inspir3dArt opened 2 weeks ago
A few questions:
Is "Bypass Context Length" in settings disabled?
I haven't tested 0.8.0, but it was a problem back in v0.7.9g too, which is the last version I tested before this one, I think.
Yes, it is disabled. The model showed the context length correctly as 8192 in the models overview, but it was set to a lower value in the model settings by default, so I changed it to 8192 there too.
The following fields are used / enabled by the character card:
The character card also has entries in the "System Prompt" and "Jailbreak (Post history instructions)" fields that seem to be unused. I copied the System Prompt text manually to the beginning of the Description field inside the character card edit menu in ChatterUI before starting the roleplay.
I've done a few tests, and so far I have not been able to reproduce this issue. I tried cards of various field lengths, and all of them truncated properly.
This is very odd. As long as no data is inserted at the back of the context, this shifting should never fail. Did you by chance change anything about the character card or instruct formatting?
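For reference, a working shift is just a KV-cache edit with no re-evaluation. A minimal sketch, assuming the llama.cpp C API (the exact function names have changed across versions; this is not ChatterUI's actual code, and `n_keep` / `n_discard` are illustrative parameters):

```cpp
// Sketch of a llama.cpp-style context shift — not ChatterUI's actual code.
#include "llama.h"

void context_shift(llama_context * ctx, int & n_past, int n_keep, int n_discard) {
    // Drop a chunk of old tokens right after the protected prefix
    // (system prompt / character card)...
    llama_kv_cache_seq_rm (ctx, 0, n_keep, n_keep + n_discard);
    // ...then slide the remaining cache entries back so positions stay
    // contiguous. This edits the KV cache in place; nothing is
    // re-evaluated, which is why a shift normally takes milliseconds.
    llama_kv_cache_seq_add(ctx, 0, n_keep + n_discard, n_past, -n_discard);
    n_past -= n_discard;
}
```

If that in-place edit can't be applied for some reason, the usual fallback is re-evaluating the whole remaining context, which would explain a multi-minute stall on a phone.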
I enabled the 'Include Names' option; besides that, it's the default Llama 3 preset.
The context shifting doesn't really fail, it just takes a very long time when it happens. I have run a few tests: knowing the AI's response length of 260 tokens (having encouraged the AI to use it), knowing the token count of the character cards from testing in koboldcpp, and knowing that my responses are usually between 30 and 80 tokens, it happens at a total token count around the context length of 8192.
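That timing matches the usual trigger condition. A sketch of how llama.cpp-style frontends typically decide to shift (illustrative names, not ChatterUI's actual code):

```cpp
// Illustrative names; not ChatterUI's actual code.
bool needs_shift(int n_past, int n_new, int n_ctx) {
    // n_past: tokens already in the KV cache (card + chat so far)
    // n_new:  tokens about to be added (user turn + expected reply)
    return n_past + n_new > n_ctx;
}
// With n_ctx = 8192, replies of ~260 tokens, and user turns of 30-80
// tokens, this first fires once the chat total approaches 8192 — right
// where the reported slowdown appears. A healthy shift then edits the
// KV cache; a fallback to full re-evaluation reprocesses ~8k tokens.
```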
It happens to me with all Llama 3 (as in 3.0) based models.
I see a possibly related behavior in koboldcpp, where these models need to process about 260 more tokens than my message every few messages. I guess they shift after every message, or remove things like the stop tokens afterwards (see the sketch below).
I don't know if it might be something in the implementation of L3 in llama.cpp that causes reprocessing at context shifting, because it never happens with L3.1+ or Mistral / Gemini based models.
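One guess at where those extra ~260 tokens could come from: llama.cpp-style frontends reuse the cached context only up to the longest common token prefix with the new prompt, roughly like the hypothetical helper below (an assumption, not koboldcpp's actual code):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: how many cached tokens can be reused as-is.
std::size_t reusable_prefix(const std::vector<int> & cached,
                            const std::vector<int> & prompt) {
    std::size_t i = 0;
    while (i < cached.size() && i < prompt.size() && cached[i] == prompt[i]) {
        ++i;
    }
    return i;
}
// If the previous AI reply (~260 tokens) is re-templated on the next
// turn — e.g. its EOS/EOT token is stripped or the instruct wrapper
// changes — the prefix match ends before that reply, so the whole reply
// is re-evaluated on top of the new user message: "my message + ~260".
```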
I had a longer roleplay chat using 0.8.1 with a Q4_K_M GGUF quant of L3-8B-Lunar-Stheno. It worked well until message #42, where it took 19 minutes before it replied, as if it had to reprocess the entire chat. The model's context size is set to 8192.
The log doesn't show anything different than on all other replies, except the big time jump.
The device is a Samsung Galaxy S24 Ultra.