Maximilian-Winter / llama-cpp-agent

The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). It allows users to chat with LLM models, execute structured function calls, and get structured output. It also works with models that are not fine-tuned for JSON output and function calls.

Crash, when setting top_k, top_p, or repeat_penalty #59

Closed. woheller69 closed this issue 1 month ago.

woheller69 commented 2 months ago

I updated my GUI to your new 0.2.2 version. It now works as long as I do not set top_p, top_k, or repeat_penalty.

Setting any of these gives, for example:

    llama_cpp.llama_sample_top_p(
    ctypes.ArgumentError: argument 3: TypeError: wrong type

My settings code:

    self.provider = LlamaCppPythonProvider(self.main_model)
    self.settings = self.provider.get_provider_default_settings()
    self.settings.max_tokens = 2000
    self.settings.temperature = 0.65
    self.settings.top_k=40,
    self.settings.top_p=0.4,
    self.settings.repeat_penalty=1.18,
    self.settings.stream=True,
woheller69 commented 2 months ago

And where do I set repeat_last_n=64 now?

woheller69 commented 2 months ago

Or do I have to set these parameters now when defining the model, like this?

Llama(
            model_path = self.model_path,
            n_gpu_layers = 0,
            f16_kv = True,
            top_k = 40,
            top_p = 0.4,
            repeat_penalty = 1.18,
            ...
        )
Maximilian-Winter commented 2 months ago

@woheller69 You have to remove the trailing commas from the lines assigning top_p, top_k, repeat_penalty, and stream.
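
In Python, a trailing comma in a plain assignment turns the value into a one-element tuple, which is why the ctypes bindings then reject the argument. A minimal sketch (not from the issue, just illustrating the language behaviour):

    # With a trailing comma, Python builds a tuple instead of a float:
    top_p = 0.4,
    print(type(top_p))  # <class 'tuple'> -> llama-cpp-python raises ctypes.ArgumentError

    # Without the comma, the value has the expected type:
    top_p = 0.4
    print(type(top_p))  # <class 'float'>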

You set repeat_last_n=64 at the initialization of the Llama class.

But there is still a problem with generation in llama-cpp-python: it inserts strange symbols into the text. I will look into this.

woheller69 commented 2 months ago

Thanks, stupid error on my side :-)

So I moved repeat_last_n to Llama(...) and removed the commas.

        self.main_model = Llama(
            model_path = self.model_path,
            n_gpu_layers = 0,
            f16_kv = True,
            repeat_last_n = 64,
            use_mmap = True,
            use_mlock = False,
            embedding = False,
            n_threads = self.threads,
            n_batch = 128,
            n_ctx = self.context,
            offload_kqv = True,
            last_n_tokens_size = 1024,
            verbose = True,
            seed = -1,
        )
        self.provider = LlamaCppPythonProvider(self.main_model)
        self.settings = self.provider.get_provider_default_settings()
        self.settings.max_tokens = 2000
        self.settings.temperature = 0.65
        self.settings.top_k = 40
        self.settings.top_p = 0.4
        self.settings.repeat_penalty = 1.18
        self.settings.stream = True

save_messages is not available anymore. Is there a replacement?

Maximilian-Winter commented 2 months ago

@woheller69 The agent now uses a chat history class that implements the handling of messages. The BasicChatHistory class has a message store that handles the storing of messages. You can access it by calling agent.chat_history.message_store.save_to_json and giving it a filename. You load it with load_from_json.
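
For example (the filename is just an illustration):

    # Save the current chat history to a JSON file:
    agent.chat_history.message_store.save_to_json("messages.json")

    # Restore it later from the same file:
    agent.chat_history.message_store.load_from_json("messages.json")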

I have to add that to documentation. Thank you for pointing that out!

Maximilian-Winter commented 2 months ago

@woheller69 I think llama-cpp-python is broken; the following code generates garbage at the beginning of generation or crashes the script:

    from llama_cpp import Llama

    llama_model = Llama(r"C:\AI\Agents\gguf-models\mistral-7b-instruct-v0.2.Q6_K.gguf", n_batch=1024, n_threads=10, n_ctx=8192)

    for t in llama_model.create_completion("[INST] Hello! [/INST]", stream=True):
        print(t["choices"][0]["text"], end="")
Maximilian-Winter commented 2 months ago

Can you close this if you have no further questions?

woheller69 commented 2 months ago

I have tried several models and do not get garbage. llama-cpp-python 0.2.74, updated yesterday.

woheller69 commented 2 months ago

Trying to save messages using

    self.llama_cpp_agent.chat_history.message_store.save_to_json("msg.txt")

gives

    TypeError: Object of type Roles is not JSON serializable
Maximilian-Winter commented 2 months ago

Sorry, I will update the package later today. Will inform you here!

Maximilian-Winter commented 2 months ago

@woheller69 Thank you for the information on llama-cpp-python.

Maximilian-Winter commented 2 months ago

@woheller69 Fixed everything and published a new version; let me know if it works for you. I also added a function to get the message store of the chat history. You can save and load like this:

    agent.chat_history.get_message_store().load_from_json("test.json")
    agent.chat_history.get_message_store().save_to_json("test.json")
woheller69 commented 2 months ago

Saving messages now works, but using it I find that adding a message does not work anymore. When interrupting inference manually (see #47), I add the partial message to the history with

self.llama_cpp_agent.add_message(self.model_reply, "assistant")

This worked with the "old" version. Now it has no effect: when I save the messages after add_message, the added message is not there.

woheller69 commented 2 months ago

I found I can add it with

            self.llama_cpp_agent.chat_history.get_message_store().add_assistant_message(self.model_reply)

But will it then be used in the follow-up conversation?

woheller69 commented 2 months ago

Another thing: the prompt_suffix works nicely, but it is not stored as part of the assistant's message. I think it should be.

E.g. using "Sure thing!" as prompt_suffix will eliminate refusals from Llama 3 :-) But when the conversation is saved, "Sure thing!" is missing.
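
For now I can probably prepend the suffix myself before storing the reply (an untested sketch using the message-store call from above; self.model_reply is the partial reply from my code):

    # Sketch: store the prompt_suffix together with the generated reply so the
    # saved conversation matches what the model actually continued from.
    prompt_suffix = "Sure thing!"
    self.llama_cpp_agent.chat_history.get_message_store().add_assistant_message(
        prompt_suffix + self.model_reply
    )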