ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Get chat_template from a server endpoint. #5447

Closed lastrosade closed 6 months ago

lastrosade commented 9 months ago

Feature Description

Expose the "chat_template" field from the GGUF model metadata via the /props endpoint.

Motivation

Many models incorporate a Jinja2 template stored in a field called chat_template. Exposing it would let users build correctly templated prompts from their own scripts.

This could probably also be used in the web UI to autofill the template text boxes.
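
For illustration only, here is a rough sketch of the server-side change, assuming the server keeps building its responses with the bundled nlohmann::json header and that the raw template string has already been read from the GGUF metadata (the function name is made up; reading the metadata itself is sketched further down the thread):

```cpp
// Sketch only: attach the raw Jinja template to the /props response.
// Assumes `chat_template` was read from the GGUF metadata beforehand
// (empty string if the model ships no template).
#include <string>
#include "json.hpp" // bundled single-header nlohmann::json

using json = nlohmann::json;

json build_props(const std::string & chat_template) {
    json props;
    // ... existing /props fields would be populated here as before ...
    props["chat_template"] = chat_template; // "" when the model has none
    return props;
}
```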

Azeirah commented 9 months ago

Could it be nice to just return all of the GGUF metadata in one go?

ngxson commented 8 months ago

I need this too. Currently, the problem is that we cannot access the metadata outside of llama_model_loader (please correct me if I'm wrong).

slaren commented 8 months ago

There are functions in the llama.h API to read the metadata. It should work with any non-array metadata.

https://github.com/ggerganov/llama.cpp/blob/8084d554406b767d36b3250b3b787462d5dd626f/llama.h#L357-L367
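
For reference, a minimal sketch of reading the template with that API, assuming the metadata key is tokenizer.chat_template (the key the HF-to-GGUF conversion scripts write) and that a fixed buffer is large enough:

```cpp
// Minimal sketch: read the chat template from a loaded model's GGUF metadata
// using the llama.h metadata accessors linked above.
// Assumption: the template is stored under the key "tokenizer.chat_template".
#include <string>
#include <vector>

#include "llama.h"

static std::string get_chat_template(const llama_model * model) {
    std::vector<char> buf(16 * 1024, 0); // large enough for typical templates
    const int32_t res = llama_model_meta_val_str(
        model, "tokenizer.chat_template", buf.data(), buf.size());
    if (res < 0) {
        return ""; // key not present in this GGUF (or other failure)
    }
    // note: the result is truncated if the template is longer than the buffer
    return std::string(buf.data());
}
```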

ngxson commented 8 months ago

@slaren Perfect, thanks. That's exactly what I was missing in https://github.com/ggerganov/llama.cpp/pull/5425

I'm not sure how we can decode the template inside the cpp code. It would be far more complicated to include some kind of "official" parser.

The idea I have in mind is to hard-code some template patterns to detect which type of template it is. In reality, we will mostly have either the llama2 format ([INST]) or chatml (<|im_start|>).

ggerganov commented 8 months ago

The idea I have in mind is to hard-code some template patterns to detect which type of template it is. In reality, we will mostly have either the llama2 format ([INST]) or chatml (<|im_start|>).

Yes, exactly. Some simple heuristic checks to detect the most common templates would be great. Should be something very basic and easy to reuse - no need to over-engineer it.
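
A sketch of what such a heuristic could look like: nothing more than substring checks on the raw Jinja string (the enum and function names here are made up for illustration):

```cpp
// Illustrative only: guess the chat format from markers in the raw Jinja template.
#include <string>

enum class chat_format { UNKNOWN, LLAMA2, CHATML };

static chat_format guess_chat_format(const std::string & tmpl) {
    if (tmpl.find("<|im_start|>") != std::string::npos) {
        return chat_format::CHATML;  // ChatML-style templates
    }
    if (tmpl.find("[INST]") != std::string::npos) {
        return chat_format::LLAMA2;  // llama2-style [INST] templates
    }
    return chat_format::UNKNOWN;     // fall back to a default or reject
}
```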

lastrosade commented 8 months ago

Would that work for weirder templates like MiniCPM's

<用户>
<AI>

?

ngxson commented 8 months ago

Would that work for weirder templates like MiniCPM's

<用户>
<AI>

?

No, not for now, but we can add support for these templates as long as we can find the Jinja version.

I couldn't find this template in tokenizer_config.json of MiniCPM-V. Can you find it somewhere?

For now, we can only support templates that are included in tokenizer_config.json. The benefit is that I can run the Python code and then the cpp code to compare whether the cpp implementation is correct.

lastrosade commented 8 months ago

I couldn't find this template in tokenizer_config.json of MiniCPM-V. Can you find it somewhere?

I took mine from here https://github.com/ggerganov/llama.cpp/issues/5447#issuecomment-1957784407

{% for message in messages %}{% if message['role'] == 'user' %}{{'<用户>' + message['content'].strip() + '<AI>'}}{% else %}{{message['content'].strip()}}{% endif %}{% endfor %}
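
For reference, a rough cpp equivalent of what that template does (a sketch only, with made-up names; note that the Jinja version calls .strip() on each message, which matters for the whitespace questions below):

```cpp
// Rough cpp equivalent of the MiniCPM Jinja template above (sketch only,
// not code from the repo).
#include <string>
#include <vector>

struct chat_msg {
    std::string role;
    std::string content;
};

// Mimics Python's str.strip(): drop leading/trailing whitespace.
static std::string strip(const std::string & s) {
    const auto first = s.find_first_not_of(" \t\n\r");
    if (first == std::string::npos) {
        return "";
    }
    const auto last = s.find_last_not_of(" \t\n\r");
    return s.substr(first, last - first + 1);
}

static std::string format_minicpm(const std::vector<chat_msg> & msgs) {
    std::string out;
    for (const auto & m : msgs) {
        if (m.role == "user") {
            out += "<用户>" + strip(m.content) + "<AI>";
        } else {
            out += strip(m.content);
        }
    }
    return out;
}
```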

ngxson commented 8 months ago

@lastrosade Can you give a link to the official docs somewhere? Pay attention, because templates may differ in their structure of newlines, spaces, and EOS/BOS tokens, which can be quite confusing.

Your template will output something like: <用户>hello<AI>hi

But in reality, it may be <用户>hello\n<AI>hi, or <用户>\nhello\n</s><s><AI>\nhi, and so on.

That's why it's always better to have the official template (the one used in the training process).

lastrosade commented 8 months ago

I don't know where to find any official docs, but looking at their repo, it seems that they do not use any special tokens in their template.

https://github.com/OpenBMB/MiniCPM/blob/b3358343cb6cc40002d92bc382ab92b98d5b8f3e/model/modeling_minicpm.py#L1326 But I think this only parses text, so idk.

ngxson commented 8 months ago

@lastrosade Sorry for the late response, but the current blocking point is that the GGUF model does not have a template at all, so it's impossible for the server to detect whether it should use the MiniCPM template or not.

Please join the discussion in the linked issue above.

github-actions[bot] commented 6 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.