LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Update Llama-3.json #1150

Closed · BBC-Esq closed 1 month ago

BBC-Esq commented 1 month ago

Probably doesn't matter too much, but Llama 3 doesn't have the extra newlines while Llama 3.1 does. Also, I added the appropriate newlines.

Will send the one for Llama 3.1 and 3.2 next.

Pyroserenus commented 1 month ago

```jinja
{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' + message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}
```

This is from Llama 3.0 8B Instruct. It has double newlines after the end headers; your changes do not. There are also no newlines after <|eot_id|>. The existing template is correct.
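
For reference, a minimal sketch (not from the thread) that renders the quoted 3.0 template with jinja2, making the newline placement visible in the output:

```python
from jinja2 import Template

# The 3.0 template quoted above, unescaped. The "\n\n" inside the Jinja string
# literal is where the double newline after each end-header comes from.
LLAMA3_TEMPLATE = (
    "{% set loop_messages = messages %}"
    "{% for message in loop_messages %}"
    "{% set content = '<|start_header_id|>' + message['role'] "
    "+ '<|end_header_id|>\n\n' + message['content'] | trim + '<|eot_id|>' %}"
    "{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}"
    "{{ content }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}"
    "{% endif %}"
)

rendered = Template(LLAMA3_TEMPLATE).render(
    messages=[{"role": "user", "content": "Hi"}],
    bos_token="<|begin_of_text|>",
    add_generation_prompt=True,
)
print(repr(rendered))
# (output wrapped for readability)
# '<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHi<|eot_id|>
#  <|start_header_id|>assistant<|end_header_id|>\n\n'
```

Note the double newline after each <|end_header_id|> and the absence of any newline after <|eot_id|>.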

BBC-Esq commented 1 month ago

I'm only seeing one newline after '<|end_header_id|>', not two. Am I mistaken? I do see that there are no newlines after '<|eot_id|>' in the template, though.

BBC-Esq commented 1 month ago

NM, I see it now! Thanks.

BBC-Esq commented 1 month ago

What about the "<|begin_of_text|>" though, is that something that should be included or does Kobold automatically handle that?

Pyroserenus commented 1 month ago

> What about the "<|begin_of_text|>" though, is that something that should be included or does Kobold automatically handle that?

BOS tokens are automatically added.
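
For illustration, a hedged sketch of why that matters (the function below is hypothetical, not KoboldCpp's actual code): if the backend prepends a BOS token itself at tokenization time, a template that also hard-codes `<|begin_of_text|>` would double it, so a frontend can drop it from the rendered prompt.

```python
# Hypothetical sketch, not KoboldCpp's actual code: strip a template-level BOS
# when the backend will prepend one itself, to avoid a doubled BOS token.
BOS = "<|begin_of_text|>"

def strip_template_bos(rendered_prompt: str, backend_adds_bos: bool = True) -> str:
    """Drop a template-level BOS when the backend adds one automatically."""
    if backend_adds_bos and rendered_prompt.startswith(BOS):
        return rendered_prompt[len(BOS):]
    return rendered_prompt
```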

LostRuins commented 1 month ago

I think the existing template works correctly. Also, L3, 3.1 and 3.2 all work with the same template.

BBC-Esq commented 1 month ago

Thanks for chiming in... but now that I've been told I'm wrong wrong wrong, multiple times, I'd like some certainty as to whether I'm actually wrong regarding the differences between the chat templates themselves. For example, the Jinja templates for Llama 3 and 3.1 don't include the knowledge cutoff date stuff, yet the instructions/examples use it... The Jinja template for Llama 3.2 includes it as mandatory, stuff like that. ;-)
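
For context on the cutoff-date point, here is a rough paraphrase of the system header the Llama 3.2 Jinja template builds; the exact wording and date format below are assumptions based on the public tokenizer_config.json, not the literal template:

```python
from datetime import date

# Illustrative paraphrase only: approximates the system header that the
# Llama 3.2 chat template emits; wording and date format are assumptions.
def llama32_system_header(today: date, system_message: str = "",
                          cutoff: str = "December 2023") -> str:
    return (
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"Cutting Knowledge Date: {cutoff}\n"
        f"Today Date: {today.strftime('%d %b %Y')}\n\n"
        f"{system_message}<|eot_id|>"
    )

print(llama32_system_header(date(2024, 10, 1)))
```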

LostRuins commented 1 month ago

I think "correctness" is a very vague term. What is more important is functionality - does a change improve/help users or not? I'm not really too concerned about academic accuracy.

BBC-Esq commented 1 month ago

I'm interested because I was called out on newline characters as if I were wrong, when, based on what you say, what matters is functionality. I mentioned functionality in my post regarding the Llama 3.2 suggested changes, so I fully understand that. @Pyroserenus Am I correct that Kobold does not adhere to the proper Jinja template for Llama 3.2, in that the template utilizes the knowledge cutoff date etcetera but Kobold doesn't? Also, am I correct that the Jinja templates within the tokenizer_config.json files for Llama 3 and 3.1 are in conflict in that regard? Thanks.

For your reference:

https://github.com/LostRuins/koboldcpp/pull/1152