To see this, use a prompt with a needle at the beginning followed by 1000+ tokens of padding (enough to create multiple prompt processing batches), then a request for the needle, for example:
The secret password is 71923.
.
.
.
.
.
. (repeated hundreds of times)
[INST] What is the secret password?[/INST]
Mistral Small 22B in Q5_K_L had no problem repeating the password when allowed to process normally. However, if I:
Hit Generate
Hit Abort as soon as it appears
Hit Generate again
it spams dots as if the final instruction weren't there. It stays broken on subsequent generations until the backend is restarted or the text near the beginning is changed so that the whole prompt is reprocessed.
(Use a slow-ish model so you have enough time to abort before the whole input is done processing)
I started digging into this after noticing that when I load a long story and start generating, but quickly change my mind and abort to make some edits, subsequent generations seem to ignore later parts of the context until I restart.
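For anyone who wants to reproduce this without clicking manually, here is a rough sketch against the KoboldCpp HTTP API (`/api/v1/generate` and `/api/extra/abort`). The base URL, padding count, and sleep duration are assumptions for my setup; adjust them for your model speed so the abort lands mid-prompt-processing.

```python
# Hypothetical repro script for the abort-during-prompt-processing bug.
# Assumes a local KoboldCpp server at http://localhost:5001 (adjust as needed).
import json
import threading
import time
import urllib.request

API = "http://localhost:5001"  # assumed default KCPP port

def build_prompt(padding_lines: int) -> str:
    """Needle at the start, enough padding to span multiple prompt
    processing batches, then the question at the end."""
    return (
        "The secret password is 71923.\n"
        + ".\n" * padding_lines
        + "[INST] What is the secret password?[/INST]"
    )

def post(path: str, payload: dict) -> dict:
    """POST a JSON payload to the KCPP API and return the parsed response."""
    req = urllib.request.Request(
        API + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def main() -> None:
    body = {"prompt": build_prompt(800), "max_length": 60}

    # 1) Start generating, then abort while the prompt is still processing.
    t = threading.Thread(target=post, args=("/api/v1/generate", body))
    t.start()
    time.sleep(2)  # tune so the abort fires mid-prompt-processing
    post("/api/extra/abort", {})
    t.join()

    # 2) Generate again with the identical prompt: on the broken build this
    #    spams dots instead of repeating the password.
    result = post("/api/v1/generate", body)
    print(result["results"][0]["text"])

# Run main() with a local KoboldCpp server listening on API.
```

The key detail is step 1: the abort has to land while the prompt is still being batched, which is why a slow-ish model (or a long enough padding run) is needed.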
Are you using "context shift" or "smart context" in your KCPP settings? If so, that might be the reason why.
Try without them and see if the problem persists.