To see this, use a prompt with a needle at the beginning followed by 1000+ tokens of padding (enough to create multiple prompt processing batches), then a request for the needle, for example:
The secret password is 71923.
.
.
.
.
.
. (repeated hundreds of times)
[INST] What is the secret password?[/INST]
Mistral Small 22B in Q5_K_L had no problem repeating the password when allowed to process normally. However, if I:
Hit Generate
Hit Abort as soon as it appears
Hit Generate again
it spams dots as if the final instruction weren't there. It stays broken on subsequent generations until the backend is restarted or the text near the beginning is changed so that the whole prompt is reprocessed.
(Use a slow-ish model so you have enough time to abort before the whole input is done processing)
I started digging into this after noticing that when I load a long story and start generating, but quickly change my mind and abort to make some edits, subsequent generations seem to ignore later parts of the context until I restart.
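For anyone who wants to reproduce this without clicking manually, here is a rough sketch against the KoboldCpp HTTP API (`/api/v1/generate` and `/api/extra/abort`). The base URL, padding count, and sleep duration are assumptions for my setup; adjust them for your model speed so the abort lands mid-prompt-processing.

```python
# Hypothetical repro script for the abort-during-prompt-processing bug.
# Assumes a local KoboldCpp server at http://localhost:5001 (adjust as needed).
import json
import threading
import time
import urllib.request

API = "http://localhost:5001"  # assumed default KCPP port

def build_prompt(padding_lines: int) -> str:
    """Needle at the start, enough padding to span multiple prompt
    processing batches, then the question at the end."""
    return (
        "The secret password is 71923.\n"
        + ".\n" * padding_lines
        + "[INST] What is the secret password?[/INST]"
    )

def post(path: str, payload: dict) -> dict:
    """POST a JSON payload to the KCPP API and return the parsed response."""
    req = urllib.request.Request(
        API + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def main() -> None:
    body = {"prompt": build_prompt(800), "max_length": 60}

    # 1) Start generating, then abort while the prompt is still processing.
    t = threading.Thread(target=post, args=("/api/v1/generate", body))
    t.start()
    time.sleep(2)  # tune so the abort fires mid-prompt-processing
    post("/api/extra/abort", {})
    t.join()

    # 2) Generate again with the identical prompt: on the broken build this
    #    spams dots instead of repeating the password.
    result = post("/api/v1/generate", body)
    print(result["results"][0]["text"])

# Run main() with a local KoboldCpp server listening on API.
```

The key detail is step 1: the abort has to land while the prompt is still being batched, which is why a slow-ish model (or a long enough padding run) is needed.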
Are you using "context shift" or "smart context" in your KCPP settings? If so, that might be the reason why.
Try without them and see if the problem persists.