bentoml / OpenLLM

Run any open-source LLMs, such as Llama 3.1 and Gemma, as OpenAI-compatible API endpoints in the cloud.
https://bentoml.com
Apache License 2.0

Llama 2 models giving junk output on V100 #553

Closed · bibekyess closed this issue 10 months ago

bibekyess commented 10 months ago

Hello everyone! I found that Llama models like beomi/llama-2-ko-7b are producing junk output such as \n[/INST]\n\n[/INST].... I tried multiple Llama 2 Korean models and got similarly garbled results. What could the reason be? Is it because I am running on a V100 GPU? Other models like NousResearch/llama-2-7b-chat-hf work fine. The difference between these Llama models and the others is that they use the Hugging Face fast tokenizer instead of the SentencePiece model used by the regular Llama models. Doesn't OpenLLM support Llama models that ship without a tokenizer.model file?
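
For reference, here is a minimal sketch (illustrative only; the repo and prompt are just examples) that checks which tokenizer files a Hub repo ships and round-trips a prompt through the fast tokenizer to see whether the [INST] markers survive:

```python
# Minimal sketch: check which tokenizer files a Hub repo ships, then
# round-trip a prompt through the fast tokenizer to see whether the
# [INST]/[/INST] markers survive encode/decode intact.
from huggingface_hub import list_repo_files
from transformers import AutoTokenizer

repo_id = "beomi/llama-2-ko-7b"  # one of the affected models

files = list_repo_files(repo_id)
print("tokenizer.model (SentencePiece) present:", "tokenizer.model" in files)
print("tokenizer.json (fast tokenizer) present:", "tokenizer.json" in files)

tok = AutoTokenizer.from_pretrained(repo_id)  # loads the fast tokenizer when no tokenizer.model exists
prompt = "[INST] 안녕하세요 [/INST]"  # made-up Korean prompt for illustration
ids = tok(prompt)["input_ids"]
print(tok.decode(ids, skip_special_tokens=False))
```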

I tested other GPT-NeoX models like beomi/polyglot-ko-12.8b, and they work fine. So I am wondering what the issue may be.

Thank you!

aarnphm commented 10 months ago

This has been updated on main and will be released in 0.4.
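
Once the 0.4 release is published, upgrading should pick up the fix (assuming a standard pip install of openllm):

```bash
# after the 0.4 release is published
pip install -U "openllm>=0.4"
```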