inspir3dArt opened 2 weeks ago
A few questions:
Is "Bypass Context Length" in settings disabled?
I haven't tested 0.8.0, but it was a problem back in v0.7.9g too, which is the last version I tested before this one, I think.
Yes, it is disabled. The model showed the context length correctly as 8192 in the models overview, but it was set to a lower value in the model settings by default, so I changed it to 8192 there too.
The following fields are used / enabled by the character card:
The character card also has entries in the "System Prompt" and "Jailbreak (Post history instructions)" fields that seem to be unused. I copied the System Prompt text manually to the beginning of the Description field inside the character card edit menu in ChatterUI before starting the roleplay.
I've done a few tests, and so far I have not been able to reproduce this issue. I tried cards of various field lengths, and all of them truncated properly.
This is very odd. As long as no data is inserted at the back of the context, this shifting should never fail. Did you by chance change anything about the character card or instruct formatting?
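For reference, a working shift is just a KV-cache edit with no re-evaluation. A minimal sketch, assuming the llama.cpp C API (the exact function names have changed across versions; this is not ChatterUI's actual code, and `n_keep` / `n_discard` are illustrative parameters):

```cpp
// Sketch of a llama.cpp-style context shift — not ChatterUI's actual code.
#include "llama.h"

void context_shift(llama_context * ctx, int & n_past, int n_keep, int n_discard) {
    // Drop a chunk of old tokens right after the protected prefix
    // (system prompt / character card)...
    llama_kv_cache_seq_rm (ctx, 0, n_keep, n_keep + n_discard);
    // ...then slide the remaining cache entries back so positions stay
    // contiguous. This edits the KV cache in place; nothing is
    // re-evaluated, which is why a shift normally takes milliseconds.
    llama_kv_cache_seq_add(ctx, 0, n_keep + n_discard, n_past, -n_discard);
    n_past -= n_discard;
}
```

If that in-place edit can't be applied for some reason, the usual fallback is re-evaluating the whole remaining context, which would explain a multi-minute stall on a phone.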
I enabled the 'Include Names' option; besides that, it's the default Llama 3 preset.
The context shifting doesn't really fail, it just takes a very long time when it happens. I have run a few tests: knowing the AI's response length of 260 tokens (having encouraged the AI to use it), knowing the token count of the character cards from testing in koboldcpp, and knowing that my responses are usually between 30 and 80 tokens, it happens at a total token count around the context length of 8192.
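That timing matches the usual trigger condition. A sketch of how llama.cpp-style frontends typically decide to shift (illustrative names, not ChatterUI's actual code):

```cpp
// Illustrative names; not ChatterUI's actual code.
bool needs_shift(int n_past, int n_new, int n_ctx) {
    // n_past: tokens already in the KV cache (card + chat so far)
    // n_new:  tokens about to be added (user turn + expected reply)
    return n_past + n_new > n_ctx;
}
// With n_ctx = 8192, replies of ~260 tokens, and user turns of 30-80
// tokens, this first fires once the chat total approaches 8192 — right
// where the reported slowdown appears. A healthy shift then edits the
// KV cache; a fallback to full re-evaluation reprocesses ~8k tokens.
```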
It happens to me with all Llama 3 (as in 3.0) based models.
I see a possibly related behavior in koboldcpp, where these models need to process about 260 more tokens than my message every few messages. I guess they shift after every message, or remove things like the stop tokens afterwards (see the sketch below).
I don't know if it might be something in the implementation of L3 in llama.cpp that causes reprocessing at context shifting, because it never happens with L3.1+ or Mistral / Gemini based models.
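One guess at where those extra ~260 tokens could come from: llama.cpp-style frontends reuse the cached context only up to the longest common token prefix with the new prompt, roughly like the hypothetical helper below (an assumption, not koboldcpp's actual code):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: how many cached tokens can be reused as-is.
std::size_t reusable_prefix(const std::vector<int> & cached,
                            const std::vector<int> & prompt) {
    std::size_t i = 0;
    while (i < cached.size() && i < prompt.size() && cached[i] == prompt[i]) {
        ++i;
    }
    return i;
}
// If the previous AI reply (~260 tokens) is re-templated on the next
// turn — e.g. its EOS/EOT token is stripped or the instruct wrapper
// changes — the prefix match ends before that reply, so the whole reply
// is re-evaluated on top of the new user message: "my message + ~260".
```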
I had a longer roleplay chat using 0.8.1 with a Q4_K_M GGUF quant of L3-8B-Lunar-Stheno. It worked well until message #42, where it took 19 minutes before it replied, as if it had to reprocess the entire chat. The model's context size is set to 8192.
The log doesn't show anything different than on all other replies, except the big time jump.
The device is a Samsung Galaxy S24 Ultra.