LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Request: Stop generating at new line #38

Closed. Enferlain closed this issue 1 year ago.

Enferlain commented 1 year ago

I've been trying to use koboldcpp with a 200-token limit, and I've noticed that every model falls back to generating a conversation with itself to fill the limit, even when I have multiline responses disabled. Disabling them doesn't stop the generation; it only hides the extra lines from the UI, so I still have to wait through the entire imaginary conversation. If the first line is only a few words, that is all the output I receive, even if the wait was about a minute. On top of that, the prompt (1000-2000 tokens in my case) has to be processed every time, which results in huge wait times.

I think it would be beneficial if the multiline replies option stopped the generation altogether instead of just hiding it, but I'm not sure if that's possible, so I figured I'd ask about it.

Kagamma commented 1 year ago

I think enabling streaming in chat mode should stop generation at a new line if the next 8 tokens contain your name.
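To make that heuristic concrete, here is a minimal Python sketch of the idea as described: cut the reply at a newline when the user's name shows up within the next 8 tokens. This is not koboldcpp's actual code. The function name `stream_until_name`, the token list, and the exact matching rule are all assumptions for illustration.

```python
from typing import Iterable, List

LOOKAHEAD = 8  # window size taken from the comment above


def stream_until_name(tokens: Iterable[str], user_name: str,
                      lookahead: int = LOOKAHEAD) -> str:
    """Return the text up to the first newline that is followed,
    within `lookahead` tokens, by `user_name` (hypothetical helper)."""
    # A real streamer would buffer incoming tokens; materializing the
    # list keeps this sketch short.
    toks = list(tokens)
    out: List[str] = []
    for i, tok in enumerate(toks):
        if "\n" in tok:
            window = "".join(toks[i + 1 : i + 1 + lookahead])
            if user_name in window:
                # Keep whatever precedes the newline, then stop.
                out.append(tok.split("\n", 1)[0])
                break
        out.append(tok)
    return "".join(out)


reply = stream_until_name(["Hello", " there", ".", "\n", "You", ":", " hi"], "You")
print(reply)
```

A newline that is not followed by the name is kept, so multi-paragraph replies pass through unchanged under this rule.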

LostRuins commented 1 year ago

That is correct. This is the best option for now, until the API officially accepts stopper tokens.

LostRuins commented 1 year ago

In the latest version (v1.9 at time of writing) stopper tokens have finally been added to the API. The AI should now stop generating excess tokens in chat mode. Do try it out!

notspaghetti commented 1 year ago

It's definitely still doing it as of today:

image

LostRuins commented 1 year ago

The stopper token for Chat Mode is currently the user's name as configured in the UI; it does not stop at a newline without any chat name. Future plans include a customizable stop token list.

For now you can also use @YellowRoseCx's wordstopper fork, which supports reading stopping tokens from a file, although this will not be supported via the API.

LostRuins commented 1 year ago

Update: In v1.11 you can now configure any custom stop sequence you want from the UI, under the Memory panel. To use a newline as a stopper, simply set it to "\n".
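For anyone curious how a stop sequence ends generation instead of just hiding text, here is a rough Python sketch. It is an illustration only, not the koboldcpp implementation. `generate_with_stops` and `fake_model` are hypothetical names, and the token stream is made up.

```python
from typing import Iterable, Iterator, List, Sequence


def generate_with_stops(tokens: Iterable[str],
                        stop_sequences: Sequence[str]) -> Iterator[str]:
    """Yield text from `tokens`, ending as soon as any stop sequence appears."""
    stops = [s for s in stop_sequences if s]
    # Hold back enough characters to catch a stop sequence split across tokens.
    keep = max((len(s) for s in stops), default=1) - 1
    pending = ""
    for tok in tokens:
        pending += tok
        hits = [pending.find(s) for s in stops if s in pending]
        if hits:
            cut = min(hits)
            if cut:
                yield pending[:cut]
            return  # stop pulling tokens: generation ends here
        safe = len(pending) - keep
        if safe > 0:
            yield pending[:safe]
            pending = pending[safe:]
    if pending:
        yield pending


consumed: List[str] = []


def fake_model() -> Iterator[str]:
    # Stand-in for a model; records which tokens were actually produced.
    for t in ["Hi", " there", "\n", "You:", " imaginary", " chat"]:
        consumed.append(t)
        yield t


print("".join(generate_with_stops(fake_model(), ["\n"])))
print(len(consumed))  # tokens produced before the stop triggered
```

The point of the sketch is the early `return`: once the stop sequence is seen, no further tokens are requested, so the wait for the imaginary conversation described in the original request goes away.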