Closed lastrosade closed 6 months ago
Could be nice to just return all the gguf metadata in one go?
I need this too. Currently, the problem is that we cannot access the metadata outside of llama_model_loader
(please correct me if I'm wrong)
There are functions in the llama.h API to read the metadata. It should work with any non-array metadata.
@slaren Perfect, thanks. That's exactly what I was missing in https://github.com/ggerganov/llama.cpp/pull/5425
I'm not sure how we can decode the template inside the cpp code. Including some kind of "official" parser would be far more complicated.
The idea I have in mind is to hard-code a few template patterns to detect which type of template it is. In practice, we will mostly see either the llama2 format ([INST]) or chatml (<|im_start|>).
Yes, exactly. Some simple heuristic checks to detect the most common templates would be great. Should be something very basic and easy to reuse - no need to over-engineer it.
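A minimal sketch of what such a heuristic check could look like, assuming the raw template string has already been read from the model metadata. The names `ChatTemplateKind` and `detect_chat_template` are hypothetical and only for illustration; they are not part of llama.h.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum for illustration; not an existing llama.cpp type.
enum class ChatTemplateKind { Llama2, ChatML, Unknown };

// Guess the template family by searching the raw Jinja source for a
// marker that is characteristic of each format. No Jinja parsing is done.
inline ChatTemplateKind detect_chat_template(const std::string & tmpl) {
    if (tmpl.find("<|im_start|>") != std::string::npos) {
        return ChatTemplateKind::ChatML;
    }
    if (tmpl.find("[INST]") != std::string::npos) {
        return ChatTemplateKind::Llama2;
    }
    // Anything else (e.g. MiniCPM-style templates) is not recognized.
    return ChatTemplateKind::Unknown;
}
```

A substring check like this only identifies the family; the exact spacing and BOS / EOS placement would still have to be hard-coded per family.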
Would that work for weirder templates like MiniCPM's <用户> <AI>?
No, not for now, but we can add support for these templates as long as we can find the jinja version.
I couldn't find this template in tokenizer_config.json of MiniCPM-V. Can you find it somewhere?
For now, we can only support templates that are included in tokenizer_config.json. The benefit is that I can run the python code and then the cpp code to check whether the cpp implementation is correct.
I took mine from here https://github.com/ggerganov/llama.cpp/issues/5447#issuecomment-1957784407
{% for message in messages %}{% if message['role'] == 'user' %}{{'<用户>' + message['content'].strip() + '<AI>'}}{% else %}{{message['content'].strip()}}{% endif %}{% endfor %}
@lastrosade Can you link to the official docs? Pay attention, because templates can differ in their newlines, spaces, and EOS / BOS tokens, which is quite confusing.
Your template will output something like: <用户>hello<AI>hi
But in reality, it may be: <用户>hello\n<AI>hi, or <用户>\nhello\n</s><s><AI>\nhi, ...
That's why it's always better to have the official template (the one used in the training process)
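To make the whitespace concern concrete, here is a rough C++ sketch that hand-renders the jinja template quoted above. `ChatMessage`, `strip`, `render_minicpm`, and the `sep` parameter are hypothetical names introduced for illustration only; `sep` stands in for whatever the official template might place between the user text and <AI>, which is exactly the unknown part.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical message type for illustration.
struct ChatMessage {
    std::string role;
    std::string content;
};

// Trim leading and trailing ASCII whitespace, approximating Jinja's strip().
static std::string strip(const std::string & s) {
    const char * ws = " \t\n\r\f\v";
    const std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) {
        return "";
    }
    const std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Mirrors the quoted template: user turns are wrapped in <用户>...<AI>,
// all other turns are emitted as stripped text. `sep` is an assumed knob,
// not something the quoted template has.
std::string render_minicpm(const std::vector<ChatMessage> & msgs,
                           const std::string & sep = "") {
    std::string out;
    for (const auto & m : msgs) {
        if (m.role == "user") {
            out += "<用户>" + strip(m.content) + sep + "<AI>";
        } else {
            out += strip(m.content);
        }
    }
    return out;
}
```

With a user message "hello" followed by an assistant message "hi", the default call produces <用户>hello<AI>hi, while passing "\n" as `sep` produces <用户>hello\n<AI>hi. Both look plausible, so only the template used in training can say which one is right.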
I don't know where to find any official docs, but looking at their repo, it seems that they do not use any special tokens in their template.
https://github.com/OpenBMB/MiniCPM/blob/b3358343cb6cc40002d92bc382ab92b98d5b8f3e/model/modeling_minicpm.py#L1326 But I think this only parses text, so idk.
@lastrosade sorry for the late response, but the current blocking point is that the gguf model does not contain a template at all, so it's impossible for the server to detect whether it should use the MiniCPM template or not.
Please join the discussion in the linked issue above
This issue was closed because it has been inactive for 14 days since being marked as stale.
Feature Description
Retrieve the "chat_template" field from the GGUF model in the /props endpoint.
Motivation
Many models incorporate a jinja2 template stored in a field called chat_template. This feature would enable users to generate appropriate templates with their scripts.
This could also probably be used on the web UI to autofill the template text boxes.