0cc4m / KoboldAI

1 token generation in story mode #49

Open Hotohori opened 1 year ago

Hotohori commented 1 year ago

I use WizardLM-7B-uncensored-GPTQ, pygmalion-7b-4bit-128g-cuda, pygmalion-13b-4bit-128g, and PygmalionCoT-7b. All are based on LLaMA, and with all of them I have the same problem:

Every 3-5 generations, only 1 token is generated.

When that happens, pressing "Submit" several times sometimes gets it to generate something, but that doesn't work every time, and then I need to change something in the chat history before it generates more than 1 token. Sometimes a simple space at the end helps; often a new line helps. But sometimes the AI generates complete crap, taking something directly from the beginning of the context to produce its own genre tag or author's note tag instead of continuing the story.

When I switch to another LLaMA-based model at a position in the story where only 1 token was generated, the other model also generates only 1 token. If I use Pygmalion-6b-4bit-128g, a model that is not based on LLaMA, it generates normally. So it looks like the problem only affects LLaMA-based models.

I have had this problem for a long time now and have already done a completely fresh KAI installation. Nothing has helped so far. I run KAI locally under Win10.

0cc4m commented 1 year ago

It most likely hits a stop token. The model thinks it's done. I don't know too much about it; unless you think it's a 4-bit issue, this isn't the right place to ask. You can try upstream in united or on Discord.
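For context, here is a minimal sketch of the mechanism 0cc4m describes and one way to work around it at the generation level. This is not KoboldAI's actual code; the model name is a placeholder, and it uses the Hugging Face transformers generate API, where an early EOS can be delayed with `min_new_tokens` or banned with `bad_words_ids`:

```python
# Illustrative sketch only, not KoboldAI's implementation: working around
# an early EOS (stop token) with the Hugging Face transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-7b"  # placeholder for any LLaMA-based model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("The story continues:", return_tensors="pt")

output = model.generate(
    **inputs,
    max_new_tokens=80,
    do_sample=True,
    # EOS is suppressed until at least this many new tokens are generated,
    # so a "the model thinks it's done" stop cannot happen on token 1.
    min_new_tokens=16,
    # Alternatively, ban the EOS token entirely so generation never stops
    # on it (can produce rambling output, since the model is forced on).
    # bad_words_ids=[[tokenizer.eos_token_id]],
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

If the single-token generations disappear with EOS suppressed, that is strong evidence the stop token, not the 4-bit quantization, is the cause.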

anyezhixie commented 1 year ago

In my personal experience, the essence of the situation is that the model believes there is nothing more to say in the current scenario, which usually comes down to one of two things:

  1. You asked it to write something it is not trained to write.
  2. It thinks that the current scene is over.

The former often requires changing the instructions or retrying several times; even though it is untrained, it may write something monotonous across multiple retries just to get by. The latter requires checking the memory layer to see whether a sentence or action is causing it to think the scene is over. Hope this helps.
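To check whether the model really has "decided the scene is over" at a given point, one diagnostic (again a sketch under the same assumptions as above, not a KoboldAI feature) is to inspect how much probability the model assigns to its EOS token right after the stuck context:

```python
# Illustrative diagnostic: measure how strongly the model wants to stop.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-7b"  # placeholder for any LLaMA-based model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The context where generation keeps stopping after 1 token.
text = "...he closed the book, turned off the light, and went to sleep."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# Distribution over the next token following the full context.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
eos_prob = next_token_probs[tokenizer.eos_token_id].item()
print(f"P(EOS as next token) = {eos_prob:.3f}")
# A high probability here means the model considers the scene finished,
# matching the behavior described in this thread.
```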