The Issue:
The LLM output generated by the llamafile server contains the EOS token, specifically `</s>` for the Mistral model in this case:
{"choices":[{"finish_reason":"stop","index":0,"message":{"content":" In Python land, where code is grown,\nExceptions are thrown when errors shown,\nWith try and except in hand,\nWe catch and tame the chaotic band,\nPeace and order in our coding home.</s>","role":"assistant"}}],"created":1717626044,"id":"chatcmpl-50Hu25IQfRBLScWMKdUeRWtt61yhkPb8","model":"gpt-3.5-turbo","object":"chat.completion","usage":{"completion_tokens":48,"prompt_tokens":42,"total_tokens":90}}
Model: I am running the Mistral model with `./mistral-7b-instruct-v0.2.Q4_0.llamafile --nobrowser --port 1234`. The model was downloaded from the llamafile GitHub page.
More Info:
However, if I use llama.cpp with the same Mistral model, the generated output doesn't contain `</s>`.
Is there any config I am missing?
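For reference, the llama.cpp comparison can be reproduced roughly like this, assuming a separate GGUF copy of the same weights and a llama.cpp build that provides the `llama-server` binary (older builds name it `server`):

# start llama.cpp's OpenAI-compatible server on the same weights, then send the same request
./llama-server -m mistral-7b-instruct-v0.2.Q4_0.gguf --port 1235

Sending the same curl request shown below, but against port 1235, is what produces output without the trailing `</s>`.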
Version
llamafile v0.8.6
What operating system are you seeing the problem on?
Mac
Relevant log output
Steps to start the server:
1. Download the mistral-7b-instruct model from the main llamafile GitHub page.
2. `chmod +x mistral-7b-instruct-v0.2.Q4_0.llamafile`
3. Start the server:
`./mistral-7b-instruct-v0.2.Q4_0.llamafile --nobrowser --port 1234`
The curl request:
curl http://localhost:1234/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer no-key" \
-d '{
  "response_format": "yes",
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are ChatGPT, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests."
    },
    {
      "role": "user",
      "content": "Write a limerick about python exceptions"
    }
  ]
}'
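One possible request-side mitigation, sketched below, is to pass the token as an explicit stop sequence. Whether llamafile's endpoint honors the OpenAI-style `stop` parameter for this case is an assumption on my part:

# hedged sketch: ask the server to stop at </s> so it never appears in the content
curl http://localhost:1234/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer no-key" \
-d '{
  "model": "gpt-3.5-turbo",
  "stop": ["</s>"],
  "messages": [
    {"role": "user", "content": "Write a limerick about python exceptions"}
  ]
}'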
Contact Details
tybalex@gmail.com