LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Request to keep `smartcontext` option #837

Open Nabokov86 opened 1 month ago

Nabokov86 commented 1 month ago

Regarding this change: 'Deprecated some old flags'.

Might it be possible to preserve the smartcontext option instead of removing it? I find it particularly useful for my workflow.

henk717 commented 1 month ago

What does smartcontext allow you to do that context shifting doesn't?

LostRuins commented 1 month ago

Adding on to what henk said: for GGUF models, context shift is a strict upgrade; smartcontext is only useful for old models that don't support it.

And context shift can be disabled with `--noshift`.

Nabokov86 commented 1 month ago

Smartcontext is significantly faster in certain scenarios.

For example, I use an 8K model with my chat assistant and store my chat history in a single JSON file. With context shifting, it would process the entire 8K context every time I start a conversation, which results in painfully slow generation. In contrast, smart context only processes a portion of the context, making it faster both during processing and generation.

Nabokov86 commented 1 month ago

I also adjusted the default `SCTruncationRatio` value so that smartcontext only processes 20% of the context. This suits my needs perfectly.

While I require the 8K context for generation, I don't want the entire 8K processed at once; with context shifting, I can't achieve that.
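
To make the numbers concrete, here is a minimal Python sketch of the trade-off being described. The keep-ratio semantics and the token counts are assumptions drawn from this thread, not from koboldcpp's actual smartcontext implementation:

```python
CTX = 8192  # model context window (the 8K model from this thread)

def startup_tokens_context_shift(history_tokens: int) -> int:
    # A fresh session has no cached KV state, so with context shift the
    # whole window's worth of history is processed before generation.
    return min(history_tokens, CTX)

def startup_tokens_smartcontext(history_tokens: int, keep_ratio: float = 0.2) -> int:
    # Smartcontext-style truncation: keep only the most recent fraction
    # of the window and process just that slice.
    return min(history_tokens, int(CTX * keep_ratio))

print(startup_tokens_context_shift(10_000))  # 8192 tokens processed up front
print(startup_tokens_smartcontext(10_000))   # 1638 tokens processed up front
```

Generation then also runs against the much shorter cached context, which is consistent with the ~1.5K figure mentioned later in the thread.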

In my opinion, smartcontext has some benefits in certain use cases.

LostRuins commented 1 month ago

But the question is: how is it preferable to context shift, which is just as good but even faster? That option allows zero reprocessing without losing any context at all.

Nabokov86 commented 1 month ago

@LostRuins Context shifting isn't faster for my use case. With context shifting, it processes the entire 8K context and then continues generating at 8K, which is much slower. In contrast, smartcontext only processes a portion of the text (the last 20% in my scenario) and continues generating at around 1.5K.

As a result, smartcontext is much faster for me, both during processing and generation, since I don't need to process the entire 8K in the first place.
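
A rough scaling argument for the generation-speed part of that claim: during decoding, each new token attends over every cached token, so per-token cost grows roughly with context length. The constant factors in this sketch are made up; only the linear scaling matters:

```python
def relative_decode_cost(ctx_tokens: int, baseline: int = 8192) -> float:
    # Per-token decode cost scales roughly linearly with the number of
    # cached tokens being attended over (ignoring constant overheads).
    return ctx_tokens / baseline

print(f"{relative_decode_cost(8192):.2f}")  # 1.00 -> context shift at a full 8K
print(f"{relative_decode_cost(1536):.2f}")  # 0.19 -> smartcontext at ~1.5K
```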

Additionally, if I remove or modify a chunk of text with context shift, the entire 8K context gets reprocessed, which is frustrating.
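
A small sketch of why an edit to old text forces that reprocessing, assuming the llama.cpp-style rule that cached state is only reusable up to the longest common token prefix (an assumption about the mechanism, not a reading of koboldcpp's code):

```python
def reusable_prefix(old_tokens: list[int], new_tokens: list[int]) -> int:
    """Length of the longest common prefix between two token sequences."""
    n = 0
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

old = list(range(8192))  # stand-in for the previously processed prompt
new = old.copy()
new[100] = -1            # edit a single token early in the history

kept = reusable_prefix(old, new)
print(f"cache reused: {kept} tokens, reprocessed: {len(new) - kept}")
# cache reused: 100 tokens, reprocessed: 8092
```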

By the way, could you explain why you decided to remove it? It's still available for some models, right? If you believe smartcontext is inferior, I understand hiding the flag from the help page or advising against its use. However, keeping the functionality available for all models seems reasonable; it would be beneficial to have a choice between the two.

henk717 commented 1 month ago

Hiding it is basically what he did; the flag should still work, at least for the moment. The issue with smartcontext is that it cuts your effective context in half: if you set it to 8K, then once your context limit is reached it really just becomes 4K, which is something most users don't want. You could experiment with just setting your context to 4K, because it should give the same effect.
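
A sketch of that halving effect, assuming smartcontext trims the window back to 50% whenever it fills (the ratio here is an assumption inferred from this comment):

```python
CTX = 8192

def effective_history(turn_tokens: int, turns: int, keep_ratio: float = 0.5) -> int:
    """Track how many tokens of history survive as chat turns accumulate."""
    held = 0
    for _ in range(turns):
        held += turn_tokens
        if held > CTX:
            # Window full: trim back to the most recent keep_ratio share.
            held = int(CTX * keep_ratio)
    return held

# The usable history saw-tooths between ~4K and 8K instead of staying at 8K.
print(effective_history(turn_tokens=512, turns=40))
```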

LostRuins commented 1 month ago

Fuck it, fine, I'll revert the smartcontext flag and add it back.

aleksusklim commented 1 month ago

Users want two things:

  1. Fast loading of old history, for which some form of cache should be implemented: https://github.com/LostRuins/koboldcpp/issues/445
  2. Reliable editing of old turns without the full reprocessing that ContextShift occasionally triggers.

For the second point, here is what you can do: