The Issue:
The LLM output generated by the llamafile server contains the EOS token, specifically `</s>` for the Mistral model in this case:
{"choices":[{"finish_reason":"stop","index":0,"message":{"content":" In Python land, where code is grown,\nExceptions are thrown when errors shown,\nWith try and except in hand,\nWe catch and tame the chaotic band,\nPeace and order in our coding home.</s>","role":"assistant"}}],"created":1717626044,"id":"chatcmpl-50Hu25IQfRBLScWMKdUeRWtt61yhkPb8","model":"gpt-3.5-turbo","object":"chat.completion","usage":{"completion_tokens":48,"prompt_tokens":42,"total_tokens":90}}
Model: I am running the Mistral model with `./mistral-7b-instruct-v0.2.Q4_0.llamafile --nobrowser --port 1234`. The model was downloaded from the llamafile GitHub page.
More Info:
However, if I use llama.cpp with the same Mistral model, the generated output doesn't contain `</s>`.
Is there any config I am missing?
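For reference, the llama.cpp comparison can be reproduced roughly like this, assuming a separate GGUF copy of the same weights and a llama.cpp build that provides the `llama-server` binary (older builds name it `server`):

# start llama.cpp's OpenAI-compatible server on the same weights, then send the same request
./llama-server -m mistral-7b-instruct-v0.2.Q4_0.gguf --port 1235

Sending the same curl request shown below, but against port 1235, is what produces output without the trailing `</s>`.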
Version
llamafile v0.8.6
What operating system are you seeing the problem on?
Mac
Relevant log output
Steps to start the server:
1. Download the mistral-7b-instruct model from the main llamafile GitHub page.
2. `chmod +x mistral-7b-instruct-v0.2.Q4_0.llamafile`
3. Start the server:
`./mistral-7b-instruct-v0.2.Q4_0.llamafile --nobrowser --port 1234`
The curl request:
curl http://localhost:1234/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer no-key" \
-d '{
  "response_format": "yes",
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are ChatGPT, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests."
    },
    {
      "role": "user",
      "content": "Write a limerick about python exceptions"
    }
  ]
}'
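One possible request-side mitigation, sketched below, is to pass the token as an explicit stop sequence. Whether llamafile's endpoint honors the OpenAI-style `stop` parameter for this case is an assumption on my part:

# hedged sketch: ask the server to stop at </s> so it never appears in the content
curl http://localhost:1234/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer no-key" \
-d '{
  "model": "gpt-3.5-turbo",
  "stop": ["</s>"],
  "messages": [
    {"role": "user", "content": "Write a limerick about python exceptions"}
  ]
}'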
Contact Details
tybalex@gmail.com