Maximilian-Winter / llama-cpp-agent

The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). It allows users to chat with LLMs, execute structured function calls, and get structured output. It also works with models that are not fine-tuned for JSON output and function calls.

Fix GEMMA_2, LLAMA_3, and PHI_3 MessagesFormatter #73

Open woheller69 opened 2 weeks ago

woheller69 commented 2 weeks ago

Gemma needs another \n in the prompt template to avoid slow processing of follow-up prompts; see #54.

[Screenshot from 2024-07-07 19-16-28]

As can be seen in the attached screenshot, prompt processing is much faster. Before: the new user prompt plus the model's previous answer were processed (42 tokens). After: only the new user prompt is processed (18 tokens).

The longer the model's previous answer, the more relevant this becomes.

There are probably similar issues in all the other prompt templates.
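
For GEMMA_2, the fix boils down to keeping the trailing newline after `<end_of_turn>` in the end markers, so the text appended to the history is byte-identical to what the model actually produced. A minimal sketch of what the corrected markers could look like, reusing the PromptMarkers structure from this repo (the exact strings in the actual patch may differ, and Gemma 2 has no dedicated system role, so I leave it out here):

    # Sketch only: the trailing \n after <end_of_turn> keeps the stored turn
    # byte-identical to the model's raw output, so a follow-up prompt reuses
    # the already-processed prefix instead of re-evaluating the old answer.
    gemma_2_prompt_markers = {
        Roles.user: PromptMarkers("""<start_of_turn>user\n""", """<end_of_turn>\n"""),
        Roles.assistant: PromptMarkers("""<start_of_turn>model\n""", """<end_of_turn>\n"""),
    }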

pabl-o-ce commented 2 weeks ago

Great patch. I would check all of them to see if any is missing a typo or newline.

Thank you!

pabl-o-ce commented 2 weeks ago

It seems some newlines are still missing in there. Do you want to add the Llama 3 fix in this PR? The correct version would be like this:

llama_3_prompt_markers = {
    Roles.system: PromptMarkers("""<|start_header_id|>system<|end_header_id|>\n\n""", """<|eot_id|>"""),
    Roles.user: PromptMarkers("""<|start_header_id|>user<|end_header_id|>\n\n""", """<|eot_id|>"""),
    Roles.assistant: PromptMarkers("""<|start_header_id|>assistant<|end_header_id|>\n\n""", """<|eot_id|>"""),
    Roles.tool: PromptMarkers("""<|start_header_id|>function_calling_results<|end_header_id|>\n""", """<|eot_id|>"""),
}
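
For reference, with these markers a single exchange would render to the standard Llama 3 layout, with the `\n\n` after each header appearing as a blank line (example text, newlines shown literally):

    <|start_header_id|>system<|end_header_id|>

    You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

    Hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>

    Hi there!<|eot_id|>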

I don't know if the tool role would need another \n newline; I'm not sure.

On Gemma, I'm not sure a newline is required. I have to test it.

And on Phi you are correct :)

pabl-o-ce commented 2 weeks ago

I saw the issue you filed for Gemma, thanks.

If you want to add the missing Llama 3 fix, that would be awesome.

Thank you for the contribution.

woheller69 commented 2 weeks ago

I don't use tools, so I don't know if another \n is missing there. But Llama 3 is also fixed in this PR.

But I think a better long-term fix in general would be my proposal in #54, i.e. to keep the leading and trailing newlines (and spaces?) produced by the model and store them in the history. I don't know the code (and I am a Python noob), but it feels like they are being stripped.
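
A minimal illustration of the suspected failure mode (assuming the history does strip whitespace; the variable names here are hypothetical, not the framework's actual code):

    # What the model actually produced, including its trailing newline:
    answer_raw = "The answer is 42.\n"
    # What a whitespace-stripping history would store (assumed behavior):
    answer_stored = answer_raw.strip()

    # The next prompt is rebuilt from history. If the stored answer differs
    # from the raw one by even a single newline, the rebuilt prompt no longer
    # matches the tokens already in the model's context, so the previous
    # answer has to be re-processed from scratch.
    rebuilt_with_raw = answer_raw + "<end_of_turn>\n"
    rebuilt_with_stored = answer_stored + "<end_of_turn>\n"
    assert rebuilt_with_raw != rebuilt_with_stored  # cache miss on the old turn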

woheller69 commented 2 weeks ago

It seems some newlines are still missing in there. Do you want to add the Llama 3 fix in this PR? The correct version would be like this:


llama_3_prompt_markers = {
    Roles.system: PromptMarkers("""<|start_header_id|>system<|end_header_id|>\n\n""", """<|eot_id|>"""),
    Roles.user: PromptMarkers("""<|start_header_id|>user<|end_header_id|>\n\n""", """<|eot_id|>"""),
    Roles.assistant: PromptMarkers("""<|start_header_id|>assistant<|end_header_id|>\n\n""", """<|eot_id|>"""),
    Roles.tool: PromptMarkers("""<|start_header_id|>function_calling_results<|end_header_id|>\n""", """<|eot_id|>"""),
}

We only need one more \n in Roles.assistant, not in Roles.user (there it should not matter). We just have to make sure that we append the latest model answer exactly as the model produced it, so that it finds the tokens already processed in memory.
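
Conceptually, the reuse works like llama.cpp's prompt caching: the tokens already evaluated are compared against the new prompt, and only the non-matching tail is processed. A rough sketch of that matching (a conceptual illustration, not the actual llama.cpp code):

    def common_prefix_len(cached_tokens, new_tokens):
        """Number of leading tokens the new prompt shares with the cache."""
        n = 0
        for a, b in zip(cached_tokens, new_tokens):
            if a != b:
                break
            n += 1
        return n

    # If the stored assistant turn is byte-identical to the model's output,
    # the whole previous conversation matches and only the new user prompt
    # (18 tokens in the screenshot) needs evaluation. A single missing \n
    # shifts every later token and forces re-processing (42 tokens).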

woheller69 commented 2 days ago

Is there any reason not to merge this?