ggerganov / llama.cpp

LLM inference in C/C++
MIT License

llama-cli chat templates ignored? #8469

Closed 0wwafa closed 1 month ago

0wwafa commented 3 months ago

llama-cli -c 1024 -t 6 -m codegeex4-all-9b.q4_k.gguf -p "You are my assistant." -e -cnv --chat-template chatml

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to the AI.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

<|im_start|>system
You are my assistant.<|im_end|>

> Hello.
Hello! How can I assist you today?
<|im_end|>
<|im_start|>user
I'm a developer and I want [....]

The model then continues by itself, writing the next user turn. What am I missing?

0wwafa commented 3 months ago

I also tried -r "<|im_end|>", same result.

dspasyuk commented 3 months ago

Codegeex uses this prompt format: [gMASK] <|system|> {system_prompt} <|user|> {prompt} <|assistant|>

try using --chat-template zephyr
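
To make the format above concrete, here is a minimal sketch that fills in the two placeholders. The helper name `render_codegeex_prompt` is hypothetical, and the single spaces between tokens are an assumption copied from how the format is written in this comment; the template llama.cpp actually applies may use different whitespace.

```python
def render_codegeex_prompt(system_prompt: str, user_prompt: str) -> str:
    """Fill the prompt format quoted in this comment.

    Assumption: tokens are separated by single spaces, exactly as the
    format string is written above. This is not taken from llama.cpp.
    """
    return (
        "[gMASK] "
        f"<|system|> {system_prompt} "
        f"<|user|> {user_prompt} "
        "<|assistant|>"
    )


if __name__ == "__main__":
    # Values taken from the original post's session.
    print(render_codegeex_prompt("You are my assistant.", "Hello."))
```

The prompt ends with the bare `<|assistant|>` tag so that the model's next tokens are the assistant's reply rather than another user turn.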

TheLapinMalin commented 3 months ago

> Codegeex uses this prompt format: [gMASK] <|system|> {system_prompt} <|user|> {prompt} <|assistant|>
>
> try using --chat-template zephyr

Do you know why the [gMASK] token is required at the start?

The CodeGeeX README doesn't even mention it, but without it the model really doesn't work well.

dspasyuk commented 3 months ago

@TheLapinMalin AFAIK it is a BOS (beginning-of-sequence) token, which masks a set of instructions used for text generation.

TheLapinMalin commented 3 months ago

Ok, thanks for the info, I couldn't find anything describing it in their docs. :)

github-actions[bot] commented 1 month ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

sgjohnson1981 commented 1 month ago

For anyone else struggling to get codegeex4 models working: don't bother going through the wiki instructions for adding a new template, because the wiki's list of supported templates is outdated. Codegeex4 IS supported; the template just isn't named after it. Codegeex4 is a fine-tuned ChatGLM4 model and uses the same template, so "--chat-template chatglm4" actually works. It has to be that exact string, because that's the string llama.cpp checks for.
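
As a rough illustration, the invocation from the original post with only the template name swapped could be assembled as below. The helper `build_llama_cli_args` is hypothetical; every flag is copied from the first comment in this thread and is not verified here, and the model filename is the one the original poster used.

```python
import shlex


def build_llama_cli_args(model_path: str, system_prompt: str,
                         template: str = "chatglm4") -> list[str]:
    """Rebuild the original post's command with a different template name.

    Assumption: the flags (-c, -t, -m, -p, -e, -cnv, --chat-template) are
    taken verbatim from the first comment; only the template value changes.
    """
    return [
        "llama-cli",
        "-c", "1024",
        "-t", "6",
        "-m", model_path,
        "-p", system_prompt,
        "-e",
        "-cnv",
        "--chat-template", template,
    ]


if __name__ == "__main__":
    args = build_llama_cli_args("codegeex4-all-9b.q4_k.gguf",
                                "You are my assistant.")
    # shlex.join quotes the system prompt so the line can be pasted in a shell.
    print(shlex.join(args))
```

Passing the arguments as a list, for example to `subprocess.run(args)`, avoids shell quoting problems with the system prompt.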