LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0
4.35k stars 312 forks

[Enhancement] Adding support for YaRN scaling #868

Open Robot1me opened 1 month ago

Robot1me commented 1 month ago

Hi! I recently tested models like Fimbulvetr and came across a helpful thread on Reddit about enabling 16k context with good quality by using YaRN scaling in llama.cpp. The model is normally limited to 4k context, so I was excited to try it out, and it turns out the results are really impressive!

Since output quality is hard to compare, I took screenshots for each scaling method and then picked the 10 "best" and "worst" outputs at 13k context tokens. These are obviously influenced by RNG and my personal preference, and they cannot replace your own experience with your favorite characters. Still, I noticed consistent patterns and quality differences between the scaling methods. I marked good parts with green lines and bad ones with red lines:

YaRN scaling:

[Screenshot: YaRN scaling outputs, 2024-05-26_090901]

I'm impressed that with YaRN scaling, the model gets things right most of the time. The improved context awareness shows in nearly every response, which makes this scaling method particularly valuable for shaping characters through the message history and example messages. It feels as if the model is barely dumbed down compared to its native 4k context.

NTK scaling:

[Screenshot: NTK scaling outputs, 2024-05-26_100333]

The good thing about NTK scaling is that it doesn't cause the model to malfunction (unlike linear scaling), and it's definitely usable. The downside, however, is that most of the time the character didn't know as much about the user. It's as if this scaling method makes the character's memory hazy (as though the early context is less accessible), so it takes multiple attempts to get the character back on track. I also had the impression of a slightly increased chance of writing-style errors, but it was acceptable.

Linear scaling:

[Screenshot: koboldcpp linear scaling outputs, 2024-05-26_093127]

Linear scaling seems worth using only if you have to. The model appeared to struggle severely to pull any information from earlier context. The writing style started to drift, the model made up new terms ("survibrant"), and it hallucinated its own context (e.g. "S.W." instead of "S.A.W."). This is problematic because character consistency is bound to degrade very quickly with these issues.
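To illustrate how differently the three methods treat positions, here is a minimal Python sketch of how each one rescales the RoPE frequencies for a 4k to 16k extension (scale factor 4). This is my own illustration based on the formulas in the YaRN paper, not code from koboldcpp or llama.cpp, and llama.cpp's actual implementation may differ in details. The head dimension (128), RoPE base (10000) and ramp bounds (alpha = 1, beta = 32) are assumed typical Llama-style values, not settings read from Fimbulvetr; all function names are hypothetical.

```python
import math


def rope_freqs(dim: int, base: float = 10000.0) -> list[float]:
    """Unscaled RoPE angular frequencies: theta_i = base^(-2i/dim), i = 0..dim/2-1."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]


def linear_scaling(dim: int, scale: float, base: float = 10000.0) -> list[float]:
    """Linear scaling (position interpolation): every frequency is divided by `scale`."""
    return [f / scale for f in rope_freqs(dim, base)]


def ntk_aware_scaling(dim: int, scale: float, base: float = 10000.0) -> list[float]:
    """NTK-aware scaling: enlarge the RoPE base instead of shrinking positions.

    The new base is chosen so the lowest frequency is divided by exactly `scale`
    while the highest frequency (i = 0) stays unchanged.
    """
    new_base = base * scale ** (dim / (dim - 2))
    return rope_freqs(dim, new_base)


def yarn_scaling(dim: int, scale: float, orig_ctx: int, base: float = 10000.0,
                 alpha: float = 1.0, beta: float = 32.0) -> tuple[list[float], float]:
    """YaRN sketch: 'NTK-by-parts' frequency interpolation plus an attention scale factor."""
    freqs = []
    for f in rope_freqs(dim, base):
        wavelength = 2 * math.pi / f
        r = orig_ctx / wavelength  # full rotations of this dimension within the original context
        if r < alpha:
            gamma = 0.0  # slow dimension: interpolate fully, same as linear scaling
        elif r > beta:
            gamma = 1.0  # fast dimension: keep the original frequency
        else:
            gamma = (r - alpha) / (beta - alpha)  # linear ramp in between
        freqs.append((1 - gamma) * f / scale + gamma * f)
    # Multiplier applied to the rotary embeddings, i.e. sqrt(1/t) in the YaRN paper.
    mscale = 0.1 * math.log(scale) + 1.0
    return freqs, mscale


if __name__ == "__main__":
    dim, orig_ctx = 128, 4096   # assumed head dimension and native context size
    scale = 16384 / orig_ctx    # 4k -> 16k, as in this issue
    plain = rope_freqs(dim)
    lin = linear_scaling(dim, scale)
    ntk = ntk_aware_scaling(dim, scale)
    yarn, mscale = yarn_scaling(dim, scale, orig_ctx)
    for i in (0, 16, 32, 48, 63):
        print(f"i={i:2d} original={plain[i]:.3e} linear={lin[i]:.3e} "
              f"ntk={ntk[i]:.3e} yarn={yarn[i]:.3e}")
    print(f"YaRN attention scale: {mscale:.4f}")
```

In this sketch, linear scaling divides every frequency by 4, including the fast-rotating dimensions that distinguish nearby tokens. NTK-aware scaling leaves the fastest dimension untouched and divides the slowest one by exactly 4, with a smooth transition in between. YaRN keeps fast dimensions unchanged, fully interpolates slow ones, blends those in between, and additionally scales the rotary embeddings by about 1.14.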

Now my question: could you consider adding YaRN scaling to koboldcpp? It would be awesome to have it built in, since this improved scaling also pairs particularly well with context shifting.

Thank you for considering and reading! 🙂