Hotohori opened 1 year ago
It most likely hits a stop token: the model thinks it's done. I don't know too much about it; unless you think it's a 4-bit issue, this isn't the right place to ask. You can try upstream in United or on Discord.
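The stop-token hypothesis above can be sketched with a toy greedy decoding loop. Everything here is illustrative: `EOS_ID` and `fake_model` are made-up stand-ins, not KoboldAI's or any real model's API, but the control flow is the same as in a real sampler. If the model ranks the end-of-sequence token highest right away, generation ends after exactly one token, which matches the symptom described in this thread.

```python
# Toy greedy decoding loop illustrating the stop-token hypothesis.
# EOS_ID and fake_model are illustrative stand-ins; a real LM returns
# one logit per vocabulary entry, but the loop structure is the same.

EOS_ID = 2

def fake_model(token_ids):
    """Stand-in for a real forward pass: returns a logit per vocab id.

    It always ranks EOS highest, mimicking a model that 'thinks it is done'.
    """
    return {0: 0.1, 1: 0.5, EOS_ID: 3.0}

def greedy_decode(prompt_ids, max_new_tokens=20):
    out = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = fake_model(out)
        next_id = max(logits, key=logits.get)  # greedy pick
        out.append(next_id)
        if next_id == EOS_ID:  # stop token reached -> generation ends here
            break
    return out[len(prompt_ids):]

print(greedy_decode([5, 6, 7]))  # -> [2]: exactly one token, the EOS id
```

The sketch shows why "1 token generated" is consistent with the model immediately emitting its stop token rather than with a crash: the loop runs correctly, it just terminates on the first step.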
In my personal experience, what is happening is that the model believes there is nothing more to say in the current scenario, which usually comes down to one of two things.
I use WizardLM-7B-uncensored-GPTQ, pygmalion-7b-4bit-128g-cuda, pygmalion-13b-4bit-128g and PygmalionCoT-7b. All are based on LLaMA, and I have the same problem with all of them:
Every 3-5 generations, only 1 token is generated.
When that happens, pressing "Submit" several times sometimes produces output, but that doesn't always help, and I then need to change something in the chat history before it generates more than 1 token. Sometimes a simple space at the end helps; often a new line helps. But sometimes the AI then generates complete garbage, copying something directly from the beginning of the context to produce its own genre tag or author's note tag instead of continuing the story.
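The manual workarounds above (editing the history, retrying) have a programmatic counterpart: ban the stop token until a minimum number of new tokens has been produced, the idea behind min-length / EOS-banning sampler settings. This is a minimal sketch under the same toy assumptions as before; `EOS_ID`, `fake_model`, and `decode_min_tokens` are illustrative names, not KoboldAI's actual API.

```python
# Sketch of the EOS-banning idea: instead of editing the chat history,
# set the EOS logit to -inf until min_new_tokens have been generated,
# forcing the sampler to keep producing text.
# EOS_ID and fake_model are illustrative stand-ins, not a real model.

EOS_ID = 2

def fake_model(token_ids):
    # Always prefers EOS; the second-best choice is token 1.
    return {0: 0.1, 1: 0.5, EOS_ID: 3.0}

def decode_min_tokens(prompt_ids, min_new_tokens=4, max_new_tokens=8):
    out = list(prompt_ids)
    produced = 0
    while produced < max_new_tokens:
        logits = dict(fake_model(out))
        if produced < min_new_tokens:
            logits[EOS_ID] = float("-inf")  # EOS banned: must keep generating
        next_id = max(logits, key=logits.get)
        out.append(next_id)
        produced += 1
        if next_id == EOS_ID:  # EOS only effective once the ban is lifted
            break
    return out[len(prompt_ids):]

print(decode_min_tokens([5, 6, 7]))  # -> [1, 1, 1, 1, 2]
```

With EOS banned for the first 4 steps, the sampler falls back to its second choice, then stops normally once the ban lifts. Whether this produces *good* text is another matter: as the report notes, forcing a model past the point where it considers itself done can yield garbage.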
When I switch to another LLaMA-based model at a position in the story where only 1 token was being generated, the other model also generates only 1 token. If I use Pygmalion-6b-4bit-128g, a model that is not based on LLaMA, it generates normally. So it looks like the problem affects LLaMA-based models only.
I have had this problem for a long time now and have already done a completely fresh KAI installation. Nothing has helped so far. I run KAI locally on Win10.