LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with a KoboldAI UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

AI seems to break sometimes - ContextShift bug? #760

Open Hotohori opened 3 months ago

Hotohori commented 3 months ago

Over the last few months I have had a problem that appears from time to time, more or less often, but always after the context history reaches its limit. I always thought it was an LLM issue, until I accidentally noticed something.

I was using SillyTavern (though it has also happened inside KCCP itself, in story mode) and the AI broke again.

What do I mean by "broke"? It generates only a few tokens, and only complete garbage on every regeneration with the same context: random special characters like * and ", or a $ price, something that looks like training data, nothing that has anything to do with the context at all.

The accident was that I clicked "Impersonate" instead of "Regenerate" inside SillyTavern. I stopped the generation immediately. The next time I clicked "Regenerate", KCCP had to reprocess all tokens.

Instead of "Processing Prompt [BLAS] (1 / 1 tokens)", KCCP reprocessed nearly all 4096 tokens (the context limit I use), and after that the LLM generated a normal answer and was no longer broken, without a single change to the context.

It didn't matter which LLM model I used or what size it was.

So it looks like this must be a bug in ContextShift, which I use in KCCP.
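(For context on the reprocessing behaviour described above: koboldcpp normally fast-forwards by reusing the cached KV state for the shared prefix of the old and new prompt, and ContextShift trims the oldest tokens in place so the whole history does not need reprocessing. The sketch below is simplified pseudocode of that reuse decision, not koboldcpp's actual implementation; it only illustrates why a corrupted cache keeps producing garbage until a full reprocess is forced.)

```python
def tokens_to_reprocess(cached_tokens, new_tokens):
    """Simplified sketch of prompt-cache fast-forwarding (hypothetical
    helper, not koboldcpp's real code). Only the tail after the longest
    shared prefix gets evaluated again."""
    common = 0
    for old, new in zip(cached_tokens, new_tokens):
        if old != new:
            break
        common += 1
    # Normally this tail is tiny ("1 / 1 tokens"). If a context shift left
    # the cache in a bad state, the corrupted prefix is never re-evaluated,
    # so the output stays broken until something invalidates the whole
    # cache and forces a full reprocess of all ~4096 tokens.
    return new_tokens[common:]
```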

LostRuins commented 3 months ago

It is possible, although it may be hard to repro it without a repeatable scenario. Does it only happen in ST, or Lite too? Do you have a lite save file that can always trigger this?

You can also try disabling contextshift in the launcher.
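For anyone launching from the command line instead of the GUI: assuming the standard koboldcpp flags, ContextShift can be disabled with --noshift (the model path below is a placeholder):

```
python koboldcpp.py --model /path/to/model.gguf --contextsize 4096 --noshift
```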

inspir3dArt commented 2 months ago

I ran into the same (or a similar) problem: the LLM began to talk without any connection to our conversation after the first few replies. It turns out this happens when the Max Ctx. Tokens in the GUI settings is lower than the --contextsize koboldcpp was initialized with.

Setting the max tokens to 32768 fixed it for me. I chose that high number because it's the biggest context size I use with any model, and koboldcpp automatically clamps it down to the value set with --contextsize when processing your first message (it's shown in the terminal). This way you don't need to change the setting every time you use a different model with a different context size.
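As an illustration of the mismatch described above, here is a minimal client-side sketch against koboldcpp's KoboldAI-compatible API; the endpoint and field names follow that API as I understand it, and the prompt and port are placeholders. The point is that the max_context_length sent by the client (what the GUI setting controls) can silently truncate the history below the server's --contextsize, while an oversized value is simply clamped down by the server:

```python
import requests

API_URL = "http://localhost:5001/api/v1/generate"  # default koboldcpp port

payload = {
    "prompt": "Continue the story:",  # placeholder prompt
    "max_length": 200,                # tokens to generate
    # Send a deliberately high limit; koboldcpp clamps it down to the real
    # --contextsize (visible in the terminal), so the client never truncates
    # the history shorter than the server's actual context window.
    "max_context_length": 32768,
}

response = requests.post(API_URL, json=payload, timeout=600)
print(response.json()["results"][0]["text"])
```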