FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

[Multi-line Chatbot] Multiple line chat answers cut off? #36

Open SoftologyPro opened 1 year ago

SoftologyPro commented 1 year ago

Are multi-line answers in the chatbot cut off? It seems like the model sometimes "has more to say", but the output is trimmed to just the first line. For example:

Human: Write a short poem about a carrot
Assistant: Here is a short poem about a carrot.
Human:

Shouldn't the actual poem appear after "Here is a short poem about a carrot."? If so, how do I edit the chatbot.py script to allow multi-line output?

Thanks.

xaedes commented 1 year ago

Token generation stops when the stop word is encountered. It is defined as "\n" (newline) in https://github.com/FMInference/FlexGen/blob/main/apps/chatbot.py#L35, which is why only the first line is generated. You can replace the "\n" with something else.
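
To illustrate the mechanism, here is a minimal sketch of the truncation step, not FlexGen's actual code; `truncate_at_stop` is a hypothetical helper standing in for the equivalent logic in chatbot.py:

```python
# Minimal sketch of stop-word truncation; names are illustrative,
# not FlexGen's actual implementation.

def truncate_at_stop(generated: str, stop: str = "\n") -> str:
    """Cut the decoded text at the first occurrence of the stop word."""
    index = generated.find(stop)
    return generated if index == -1 else generated[:index]

poem = "Here is a short poem about a carrot.\nOrange and proud,\nit grows underground."
print(truncate_at_stop(poem, stop="\n"))   # keeps only the first line
print(truncate_at_stop(poem, stop="###"))  # keeps the full multi-line answer
```

With "\n" as the stop word, everything after the first newline is discarded, which is exactly the cut-off behavior described above.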

akhilshastrinet commented 1 year ago

With a bit of fiddling I got this working; it uses ### as the stop token. I've also adjusted the prompt a bit: https://github.com/akhilshastrinet/FlexGen/blob/multi-line-chatbot/apps/chatbot.py
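
For readers who don't want to follow the link, here is a rough sketch of what a ###-delimited prompt could look like; the header wording and the `build_prompt` helper below are illustrative assumptions, not the fork's actual code:

```python
# Rough sketch of a "###"-delimited chat prompt; see the linked branch for
# the real version. All names and wording here are assumptions.

stop = "###"

HEADER = (
    "A chat between a curious human and a helpful assistant.\n"
    "### Human: Hello!\n"
    "### Assistant: Hi! How can I help you today?\n"
)

def build_prompt(history, user_input):
    """Assemble the running prompt; generation then runs until '###' appears."""
    prompt = HEADER
    for human, assistant in history:
        prompt += f"### Human: {human}\n### Assistant: {assistant}\n"
    prompt += f"### Human: {user_input}\n### Assistant:"
    return prompt

print(build_prompt([], "Write a short poem about a carrot"))
```

Because the answer itself never contains "###", newlines inside it no longer terminate generation.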

SoftologyPro commented 1 year ago

OK, I updated to the latest version. Is it even possible for chatbot.py to return multi-line answers at all? I cannot get any question to produce an answer longer than one line.