bentoml / OpenLLM

Run any open-source LLMs, such as Llama 3.1 and Gemma, as OpenAI-compatible API endpoints in the cloud.
https://bentoml.com
Apache License 2.0

Llama 2 models giving junk output on V100 #553

Closed · bibekyess closed this issue 10 months ago

bibekyess commented 10 months ago

Hello everyone! I found that Llama models like beomi/llama-2-ko-7b are producing junk output such as \n[/INST]\n\n[/INST].... I tried multiple Llama 2 Korean models and got similarly garbled results. What could the reason be? Is it because I am running on a V100 GPU? Other models like NousResearch/llama-2-7b-chat-hf work fine. The difference between these Llama models and the others is that they use the Hugging Face fast tokenizer instead of the SentencePiece model used by the regular Llama models. Doesn't OpenLLM support Llama models that ship without a tokenizer.model file?
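
For reference, here is a minimal sketch (illustrative only; the repo and prompt are just examples) that checks which tokenizer files a Hub repo ships and round-trips a prompt through the fast tokenizer to see whether the [INST] markers survive:

```python
# Minimal sketch: check which tokenizer files a Hub repo ships, then
# round-trip a prompt through the fast tokenizer to see whether the
# [INST]/[/INST] markers survive encode/decode intact.
from huggingface_hub import list_repo_files
from transformers import AutoTokenizer

repo_id = "beomi/llama-2-ko-7b"  # one of the affected models

files = list_repo_files(repo_id)
print("tokenizer.model (SentencePiece) present:", "tokenizer.model" in files)
print("tokenizer.json (fast tokenizer) present:", "tokenizer.json" in files)

tok = AutoTokenizer.from_pretrained(repo_id)  # loads the fast tokenizer when no tokenizer.model exists
prompt = "[INST] 안녕하세요 [/INST]"  # made-up Korean prompt for illustration
ids = tok(prompt)["input_ids"]
print(tok.decode(ids, skip_special_tokens=False))
```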

I tested other GPT-NeoX models like beomi/polyglot-ko-12.8b, and they work fine. So I am wondering what the issue may be.

Thank you!

aarnphm commented 10 months ago

This has been updated on main and will be released in 0.4.
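
Once the 0.4 release is published, upgrading should pick up the fix (assuming a standard pip install of openllm):

```bash
# after the 0.4 release is published
pip install -U "openllm>=0.4"
```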