LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with a KoboldAI UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Context shifting doesn't work with edits #523

Open Azirine opened 8 months ago

Azirine commented 8 months ago

Expected Behavior

When editing the last prompt, only the part starting from the first edited word should be processed.

Current Behavior

This currently works as expected without context shifting. However, once context shifting has kicked in, the whole context is reprocessed instead.

Environment and Context

Koboldcpp 1.48.1 on macOS 10.15.7

Steps to Reproduce

  1. Run Koboldcpp 1.48.1 with Context Shifting (A.K.A. EvenSmarterContext) on.
  2. Chat until it starts shifting context.
  3. Click the gear icon, then click edit.
  4. Remove the bot's last message, and replace your last prompt with something different.
  5. Uncheck 'Allow Editing' to finish editing.
  6. Click the triangle (play) button to generate a message; the whole context is reprocessed before generation begins.
LostRuins commented 8 months ago

Context shifting does work with editing, to a certain extent. If you're only editing a bit of text at the end (aka the "new" text), it will work fine. However, if you've edited far enough back into the history, two things can happen:

  1. The story gets much longer
  2. The story gets shorter

For (1), everything is fine. For (2), however, text that has already been "shifted" out of the context and erased is suddenly needed again, because it has re-entered the context window. Since that text no longer exists, the prompt has to be reprocessed.
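A toy sketch of the mechanism (hypothetical Python purely for illustration; the real KV cache lives in the C++ backend, and `MAX_CTX`, `push`, and the token numbers here are all made up):

```python
# Hypothetical model of a shifting context window. Once tokens are
# shifted out, their KV state is discarded and cannot be rebuilt.

MAX_CTX = 8     # tiny window so the example fits on screen

kv_cache = []   # tokens whose KV state we still hold
evicted = 0     # tokens shifted out the front and permanently discarded

def push(tokens):
    """Append tokens, shifting (evicting) from the front when full."""
    global kv_cache, evicted
    kv_cache.extend(tokens)
    overflow = len(kv_cache) - MAX_CTX
    if overflow > 0:
        evicted += overflow
        kv_cache = kv_cache[overflow:]  # dropped KV state is gone for good

push(list(range(12)))   # tokens 0..11 arrive; 0..3 evicted, cache holds 4..11

# Case (1): the story only grows. The window slides forward, a few more
# tokens are shifted out, and only the new tail needs processing.

# Case (2): an edit shortens the story, say to tokens 0..9 plus one
# rewritten token (11 tokens total). The window now covers tokens 3..9
# plus the new token, but token 3 was evicted, so its KV state can only
# be recovered by reprocessing the prompt from scratch.
```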

Azirine commented 8 months ago

For (2), is it possible to add an option not to re-include text that has been shifted out of the context, so that the prompt doesn't have to be reprocessed?

LostRuins commented 8 months ago

It's not really easy to detect when something like that has happened; you can't differentiate it from a brand-new prompt.

gustrd commented 8 months ago

Is it possible to cache the text from the last request and compare it with the new one from the beginning, to find the point up to which they are equal? That way the already-processed part of the prompt could be reused.

It's just a question. I'm not sure it's possible.
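Something like this, just to sketch the idea (toy Python, not the real implementation):

```python
def common_prefix_len(old_tokens, new_tokens):
    """How many leading tokens the cached prompt and the new prompt share."""
    n = 0
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

cached   = [10, 11, 12, 13, 14]      # tokens processed last request
incoming = [10, 11, 12, 99, 98, 97]  # new prompt after the user's edit

keep = common_prefix_len(cached, incoming)  # -> 3
# Reuse the KV state for the first `keep` tokens; only incoming[keep:]
# (the edited part) would need to be processed.
```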

LostRuins commented 8 months ago

Yes, that is already done, and used for context shifting. The issue is that once text is shifted away, it's permanently lost. So if you undo a few times and try to generate something new, everything has to be reprocessed.

Azirine commented 8 months ago

The text that is lost is only a small proportion of the whole context. Say context shifting starts at 3000 tokens. I remove 100 tokens (my last message and the bot's last message) and add 50 new tokens (my rewritten last message) at the end. The 2900 tokens still in the context plus the 50 new ones are sufficient to generate new text; there is no need to retrieve the 50 lost tokens just to top the context back up to 3000 tokens and reprocess everything.
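As pseudocode, the option I'm asking for would look roughly like this (hypothetical Python; `trim_kv_after` stands in for whatever cache-truncation primitive the backend would need to expose):

```python
def common_prefix_len(old_tokens, new_tokens):
    """How many leading tokens the cached and the new prompt share."""
    n = 0
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

def trim_kv_after(n_tokens):
    """Hypothetical stand-in for a backend call that drops KV state
    for every position past n_tokens."""
    pass

def prepare_generation(cached_tokens, new_tokens):
    """Proposed policy: accept a slightly shorter context instead of
    refilling it from text that was already shifted out and discarded."""
    keep = common_prefix_len(cached_tokens, new_tokens)
    trim_kv_after(keep)          # drop KV state past the divergence point
    # Resulting context is keep + len(edited tail) tokens, e.g.
    # 2900 + 50 = 2950: slightly under the 3000-token window, which is
    # still plenty to generate from.
    return new_tokens[keep:]     # only the edited tail gets processed
```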

LostRuins commented 8 months ago

Yes, theoretically I understand what you are saying. Practically, making an implementation that works across the board is not so straightforward.

LostRuins commented 8 months ago

Anyway, try the new version 1.49; the split memory may help.

franshst commented 7 months ago

I support the notion that losing some context at the top when editing at the bottom is acceptable. I regularly have to edit my chat because the LLM misunderstands my question, and it's really frustrating when it has to reprocess the whole context for five to ten minutes. With ContextShift (really, thanks for that) it's only 20 seconds.