Lightning-AI / litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
https://lightning.ai
Apache License 2.0

chatting with mistral generates answer with no spaces #1822

Open micrem73 opened 3 weeks ago

micrem73 commented 3 weeks ago

Bug description

Steps to reproduce the issue:

  1. litgpt chat checkpoints/mistralai/Mistral-7B-Instruct-v0.3 --max_new_tokens 2048
  2. enter any prompt

I get an answer with no spaces between words, e.g.: "Hello!I'mjustacomputerprogram,soIdon'thavefeelingslikeahumandoes.ButI'mheretohelpyouwithanyquestionsortasksyoumighthave!HowcanIassistyoutoday?"

Here is the full log:

⚡ ~ litgpt chat checkpoints/mistralai/Mistral-7B-Instruct-v0.3 --max_new_tokens 2048
{'access_token': None, 'checkpoint_dir': PosixPath('checkpoints/mistralai/Mistral-7B-Instruct-v0.3'), 'compile': False, 'max_new_tokens': 2048, 'multiline': False, 'precision': None, 'quantize': None, 'temperature': 0.8, 'top_k': 50, 'top_p': 1.0}
Now chatting with Mistral-7B-Instruct-v0.3.
To exit, press 'Enter' on an empty prompt.

Seed set to 1234

Prompt: hi, how are you?
Reply: Hello!I'mjustacomputerprogram,soIdon'thavefeelingslikeahumandoes.ButI'mheretohelpyouwithanyquestionsortasksyoumighthave!HowcanIassistyoutoday?
Time for inference: 3.55 sec total, 12.96 tokens/sec, 46 tokens

What operating system are you using?

Unknown

LitGPT Version

⚡ ~ pip show litgpt | grep Version
Version: 0.5.2
Version 2.0, January 2004
Licensed under the Apache License, Version 2.0 (the "License");

rasbt commented 3 weeks ago

Thanks for flagging this. I know Mistral is using their own tokenizer, but I could swear this worked before. Something to look into some time.
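
For anyone looking into this, one common way this symptom shows up with SentencePiece-style tokenizers is a streaming loop that decodes each token on its own. This is a guess at the failure mode, not a confirmed diagnosis of LitGPT. The toy sketch below assumes a hypothetical vocabulary and a hypothetical `decode` helper (neither is LitGPT code) in which the "▁" marker denotes a word start and the decoder drops the leading space of whatever it is given.

```python
# Toy illustration only: the vocabulary and decode() are hypothetical,
# not taken from LitGPT or from the Mistral tokenizer.
WORD_START = "\u2581"  # the "▁" word-start marker used by SentencePiece-style vocabularies

VOCAB = {0: "\u2581Hello", 1: "!", 2: "\u2581I", 3: "'m", 4: "\u2581just", 5: "\u2581a", 6: "\u2581program"}


def decode(token_ids):
    """Join pieces, turn the marker into a space, and drop one leading space."""
    text = "".join(VOCAB[t] for t in token_ids).replace(WORD_START, " ")
    return text[1:] if text.startswith(" ") else text


def stream_per_token(token_ids):
    """Naive streaming: each token is decoded alone, so its leading space is lost."""
    return "".join(decode([t]) for t in token_ids)


def stream_incremental(token_ids):
    """Decode the growing sequence and emit only the newly added suffix."""
    chunks, seen, emitted = [], [], ""
    for t in token_ids:
        seen.append(t)
        full = decode(seen)
        chunks.append(full[len(emitted):])
        emitted = full
    return "".join(chunks)


ids = [0, 1, 2, 3, 4, 5, 6]
print(stream_per_token(ids))    # spaces between words are missing
print(stream_incremental(ids))  # spaces between words are preserved
```

If the real cause is along these lines, decoding the accumulated sequence and printing only the new suffix is one possible workaround, at the cost of re-decoding the prefix at every step.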