Open jojorne opened 7 months ago
That's natural: the UI has no idea about your intentions; it just truncates the text to fit the maximum allowance.
Yes, for a single token, backtracking the entire context is wasteful. However, what if you chose to remove 20 tokens instead? Or 2,000? At a certain point, you'd want the old context to be included in the text again.
I was tinkering with it. First I tried a page concept, but when I turned the page, the AI suddenly lost its memory because all of the context was gone. Then I compared it to a video-game camera: you generally don't want the camera to copy the player's exact position and rotation. Imagine the player climbing a staircase quickly and think about what would happen to the camera. So it's like you said: there would have to be a certain percentage of free buffer left after adding the memory and author's notes, maybe one third? The problem is that only Kobold Lite would have this support; others like SillyTavern would be without it. That's why I decided to open a ticket. Maybe someone will come up with a better idea?
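The free-buffer idea above could be sketched roughly like this. This is a minimal, hypothetical illustration, not KoboldCpp's actual code; the function name, the fixed one-third reserve, and the token lists are all assumptions for the example.

```python
# Hypothetical sketch: reserve a fraction of the context window as free
# buffer so small edits don't immediately force a full reprocess.

def trim_to_budget(tokens, max_ctx, reserve_frac=1/3):
    """Keep only the newest tokens, leaving `reserve_frac` of the
    context window empty for future generations."""
    budget = int(max_ctx * (1 - reserve_frac))
    return tokens[-budget:] if len(tokens) > budget else tokens

story = list(range(5000))            # stand-in for tokenized story text
kept = trim_to_budget(story, max_ctx=4096)
print(len(kept))                     # 2730 tokens kept, ~1366 left free
```

The point is only that truncation would stop at a fixed budget below `max_ctx`, so the window doesn't creep forward (or backward) on every small change.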
I saw two interesting pieces of news today: LLM support for Android through MediaPipe, and this in a PR from llama.cpp:

> there might be more prompt re-processing than necessary in the server example, especially if your client trims the end of the output

and

> State backtracking - Would be very useful to reduce prompt reprocessing

I'll keep my fingers crossed. 🤞
I'm having a lot of fun with KoboldCpp. I can generate and edit text. It's very fast until the context overflows with tokens.
Steps to reproduce:
After some analysis, I came to the conclusion that KoboldCpp sees the free space and tries to fit as many tokens as possible into it. This causes the entire context to shift and be reprocessed. Here, take a look. Now that we have free space, it fits as many tokens as possible from the beginning of the story:
Since nothing prevents the context from shifting backwards, the entire context is now invalid and needs to be reprocessed. There are many places where things like this happen. Note that this doesn't happen when the context is not in an overflow state, because there is no truncated text to shift the context backwards and invalidate the whole thing.
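To see why a backwards shift invalidates everything, note that a KV cache is only reusable for the longest common prefix between the cached tokens and the new context. A minimal sketch (the token windows and helper are hypothetical, not KoboldCpp internals):

```python
# Hypothetical sketch: the KV cache can only be reused for the longest
# common prefix between the cached context and the new context.

def common_prefix_len(cached, new_ctx):
    n = 0
    for a, b in zip(cached, new_ctx):
        if a != b:
            break
        n += 1
    return n

story = list(range(100))     # stand-in for the tokenized story
cached = story[20:90]        # window after overflow truncated the start

# Forward growth: the window keeps the same start, so the cache is reusable.
grown = story[20:95]
print(common_prefix_len(cached, grown))      # 70 -> only 5 new tokens to process

# Backwards shift: earlier story text re-enters at position 0, so the
# cached tokens no longer line up and almost nothing can be reused.
shifted = story[10:60]
print(common_prefix_len(cached, shifted))    # 0 -> full reprocess
```

This is the mismatch described above: as soon as truncated text from the beginning of the story slides back in, every cached position disagrees with the new context.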