Azirine opened 8 months ago
Context shifting does work with editing, to a certain extent. If you're only editing a bit of text at the end (i.e. the "new" text), it works fine. However, if you've edited far enough back into the history, two things can happen:

1. The edited story ends up the same length or longer.
2. The edited story ends up shorter.

For (1) everything is fine. For (2), however, if the story gets shorter, old text that has already been shifted out of the context and erased is needed again, because it has re-entered the context window. Since that text no longer exists, the prompt has to be reprocessed.
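The mechanics above can be illustrated with a minimal sketch. This is not KoboldCpp's actual implementation, just a toy model assuming a rolling window over absolute token positions, where anything shifted out of the window is erased for good:

```python
# Toy model: a rolling context window over absolute token positions.
# Tokens shifted out of the window are permanently discarded.

WINDOW = 8  # tiny context window for illustration


def window_start(history_len, window=WINDOW):
    """Absolute position of the first token kept in the window."""
    return max(0, history_len - window)


history_len = 12                          # 12 tokens generated so far
cache_start = window_start(history_len)   # positions 0..3 already erased

# Case (2): an edit deletes the last 3 tokens of the story.
new_len = history_len - 3
needed_start = window_start(new_len)      # window now wants position 1 onward

# Positions 1..3 were shifted out and erased, so the cache can't supply
# them -> the whole prompt must be reprocessed.
assert needed_start < cache_start
```

In case (1), `needed_start` would be greater than or equal to `cache_start`, so the cached window would still cover everything needed.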
For (2), is it possible to add the option to not reinclude text that has been shifted out of the context, so that the prompt doesn't have to be reprocessed?
It's not really easy to detect when something like that has happened; you can't differentiate it from a brand-new prompt.
Is it possible to cache the text from the last request and compare it, character by character from the beginning, against the new one to find where they diverge? Then the matching part of the already-processed prompt could be reused.
It's just a question. I'm not sure it's possible.
Yes, that is already done, and used for context shifting. The issue is once text is shifted away, it's permanently lost. So if you undo a few times and try to generate something new, it will have to reprocess everything.
The text that is lost represents a small proportion of the whole context. Let's say context shifting starts at 3000 tokens. I remove 100 tokens (my last message and the bot's last message) and add 50 new tokens (a rewrite of my last message) at the end. The 2900 tokens left in the context plus the 50 new tokens are still sufficient to generate new text; there is no need to go back and retrieve the 50 lost tokens to top the context back up to 3000 and reprocess everything.
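As a sanity check, the arithmetic in that example works out (the numbers are taken directly from the scenario above; the conclusion about what would need processing is the proposal, not current behavior):

```python
context_limit = 3000     # point at which context shifting kicks in
in_cache = 3000          # tokens already processed before the edit
removed = 100            # user deletes the last two messages
added = 50               # rewritten final message

remaining = in_cache - removed     # 2900 tokens still valid in the cache
new_total = remaining + added      # 2950 tokens in the edited prompt

# The edited prompt fits without pulling back the shifted-out tokens,
# so under the proposed behavior only the 50 new tokens need processing.
assert new_total <= context_limit
```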
Yes, theoretically I understand what you are saying. Practically, making an implementation that works across the board is not so straightforward.
Anyway, try the new version 1.49; the split memory feature may help.
I support the notion that losing some context at the top when editing at the bottom is acceptable. I regularly have to edit my chat because the LLM misunderstands my question, and it's really frustrating that it then needs five to ten minutes to process the whole context. With ContextShift (really, thanks for that) it's only 20 seconds.
Expected Behavior
When editing the last prompt, only the part starting from the first edited word should be processed.
Current Behavior
This currently works without context shifting. However, once context shifting is active, the whole context is reprocessed instead.
Environment and Context
KoboldCpp 1.48.1, macOS 10.15.7
Steps to Reproduce