feature request: add override-kv, or at least a way to specify the pretokenizer

LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.

https://github.com/lostruins/koboldcpp

GNU Affero General Public License v3.0

feature request: add override-kv, or at least a way to specify the pretokenizer #819

Open schmorp opened 6 months ago

schmorp commented 6 months ago

llama.cpp has an --override-kv option that can be used to override, well, model kv values. This can be useful with the myriad existing GGUFs that don't have a pretokenizer specified. It would be nice if koboldcpp had such an option: either a generic override-kv option, or a way to specify/override the pretokenizer string, or both. override-kv can be useful in a variety of other ways, but is of course more effort to implement than just being able to specify the pretokenizer type.

Having a llama.cpp-compatible syntax for override-kv would also be a plus for users who could use instructions written for llama.cpp.
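To make the "llama.cpp-compatible syntax" concrete: llama.cpp takes overrides in the form KEY=TYPE:VALUE (for example tokenizer.ggml.pre=str:llama3). Below is a minimal sketch of how such a string could be parsed on the koboldcpp side. The function name parse_override_kv is hypothetical and not part of koboldcpp or llama.cpp, and the set of accepted types (int, float, bool, str) is an assumption based on what llama.cpp accepts.

```python
from typing import Tuple, Union

KVValue = Union[int, float, bool, str]


def parse_override_kv(spec: str) -> Tuple[str, KVValue]:
    """Parse one 'KEY=TYPE:VALUE' override string.

    Hypothetical helper for illustration only; it mirrors the
    llama.cpp-style syntax but is not taken from either code base.
    """
    # Split off the metadata key at the first '='.
    key, sep, typed = spec.partition("=")
    if not sep or not key:
        raise ValueError(f"expected KEY=TYPE:VALUE, got {spec!r}")

    # Split the remainder into a type tag and the raw value at the first ':'.
    type_name, sep, raw = typed.partition(":")
    if not sep:
        raise ValueError(f"missing TYPE: prefix in {spec!r}")

    if type_name == "int":
        return key, int(raw)
    if type_name == "float":
        return key, float(raw)
    if type_name == "bool":
        if raw not in ("true", "false"):
            raise ValueError(f"bool value must be true or false, got {raw!r}")
        return key, raw == "true"
    if type_name == "str":
        return key, raw
    raise ValueError(f"unknown type {type_name!r} in {spec!r}")


if __name__ == "__main__":
    print(parse_override_kv("tokenizer.ggml.pre=str:llama3"))
```

Keeping the string format identical would let an override copied from llama.cpp instructions be passed through unchanged.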

LostRuins commented 6 months ago

The CLI args for that flag are horrendous. I'm sure there must be a better way to do it.

schmorp commented 6 months ago

Well, override-kv is more of a low-level tool, but it can be very useful. For the concrete problem of setting the pre-tokenizer type, a simple "--pretokenizer llama3" or so would suffice. Less generic, but much less horrendous, I would assume.

The advantage of override-kv, other than being generic, would be compatibility with llama.cpp, as I see it used a lot.

But having any way to override the pretokenizer in koboldcpp would greatly help. I don't see hundreds or even thousands of models being requantized anytime soon, and this affects all kinds of models, not just llama3 models (deepseek, command-r, practically anything that is not llama 2).

It would be pretty much in the same vein as --contextsize or --ropeconfig, which also override model-provided kv values.
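As a rough illustration of how small the dedicated flag could be, here is a sketch using argparse. Everything in it is an assumption for discussion: the --pretokenizer flag does not exist in koboldcpp, the function names are made up, and how the resulting override would actually reach the model loader is left out. The only thing taken from GGUF itself is the metadata key tokenizer.ggml.pre.

```python
import argparse
from typing import Dict, List, Optional

# GGUF metadata key that stores the pre-tokenizer type.
PRETOKENIZER_KEY = "tokenizer.ggml.pre"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a hypothetical --pretokenizer flag."""
    parser = argparse.ArgumentParser(
        description="Sketch only; this is not koboldcpp's real CLI."
    )
    parser.add_argument(
        "--pretokenizer",
        metavar="NAME",
        default=None,
        help="override the pre-tokenizer type stored in the GGUF file",
    )
    return parser


def pretokenizer_override(args: argparse.Namespace) -> Dict[str, str]:
    """Turn the flag into a key/value override; empty if the flag is unset."""
    if args.pretokenizer is None:
        return {}
    return {PRETOKENIZER_KEY: args.pretokenizer}


def main(argv: Optional[List[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    print(pretokenizer_override(args))


if __name__ == "__main__":
    main()
```

When the flag is not given, the override dict stays empty and the value stored in the model file would be used as before.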

Rotatingxenomorph commented 6 months ago

I can't use koboldcpp anymore for command-r plus because nobody wants to requantize it when they can just use --override-kv in llama.cpp.

LostRuins commented 6 months ago

There's a fix for command-r plus coming out in the next version. In the meantime, you can try using an older version first.

Rotatingxenomorph commented 6 months ago

> There's a fix for command-r plus coming out in the next version. In the meantime, you can try using an older version first.

Cool, thank you!

Rotatingxenomorph commented 4 months ago

I'm still getting the generation quality warning in 1.68.