LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Clicking Abort during processing of a long prompt can leave the context broken on subsequent generations #1178

Open actually-a-cat opened 3 weeks ago

actually-a-cat commented 3 weeks ago

To see this, use a prompt with a needle at the beginning followed by 1000+ tokens of padding (enough to create multiple prompt processing batches), then a request for the needle, for example:

The secret password is 71923.
.
.
.
.
.
. (repeated hundreds of times)
[INST] What is the secret password?[/INST]
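
A minimal sketch of building a prompt of this shape programmatically. The function name `build_prompt` and the default of 1200 padding lines are assumptions for illustration; tokenization varies by model, so adjust the count until the prompt exceeds 1000 tokens and spans several processing batches.

```python
NEEDLE = "The secret password is 71923."
QUESTION = "[INST] What is the secret password?[/INST]"


def build_prompt(padding_lines=1200):
    """Return the needle, then `padding_lines` lines of '.', then the question.

    padding_lines is a guess at what yields 1000+ tokens; it is not a
    measured value for any particular tokenizer.
    """
    lines = [NEEDLE] + ["."] * padding_lines + [QUESTION]
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_prompt()
    # Show only the size and the two ends, not the whole padding run.
    print(len(prompt.split("\n")), "lines")
    print(prompt[:40])
    print(prompt[-45:])
```

The resulting string can be pasted into the UI or sent through the API.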

Mistral Small 22B in Q5_K_L had no problem repeating the password when allowed to process normally. However, if I click Abort while the prompt is still being processed and then generate again, it will spam dots as if the instruction isn't there. It will continue being broken on subsequent generations until the backend is restarted or the text is changed near the beginning so that all of it is reprocessed.

(Use a slow-ish model so you have enough time to abort before the whole input is done processing)
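
The abort-then-regenerate sequence can also be scripted instead of clicking in the UI. This is a rough sketch under assumptions: the backend listens on `http://localhost:5001`, `/api/v1/generate` accepts a JSON body with `prompt` and `max_length`, and `/api/extra/abort` stops the running generation. Check these against the API documentation of your KoboldCpp build. The names `http_post`, `reproduce`, and `abort_delay` are hypothetical helpers, not part of KoboldCpp.

```python
import json
import threading
import time
import urllib.request

BASE_URL = "http://localhost:5001"  # assumption: local backend address


def http_post(path, payload):
    """POST a JSON payload to the backend and return the decoded JSON reply."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def reproduce(prompt, post=http_post, abort_delay=2.0):
    """Start a generation, abort it after abort_delay seconds, then retry.

    abort_delay must be shorter than the time the backend needs to finish
    processing the prompt, otherwise the abort arrives too late.
    """
    payload = {"prompt": prompt, "max_length": 32}

    def first_request():
        try:
            post("/api/v1/generate", payload)
        except Exception:
            pass  # the aborted request may fail or return early

    # Run the first request in the background so it can be aborted.
    worker = threading.Thread(target=first_request)
    worker.start()
    time.sleep(abort_delay)
    post("/api/extra/abort", {})
    worker.join()

    # Same prompt again: this is the generation that comes out broken.
    return post("/api/v1/generate", payload)
```

Call `reproduce(prompt)` with a long needle prompt against a running backend and inspect the returned reply to see whether the password is still recalled.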

I found this after noticing that when I load a long story, start generating, and then quickly abort to make some changes, the generations seem to ignore later parts of the context until I restart.

SerialKicked commented 3 weeks ago

Are you using "context shift" or "smart context" in your KCPP settings? If so, that might be the reason why. Try without them and see if the problem persists.