Open schmorp opened 6 months ago
The CLI args for that flag are horrendous. I'm sure there must be a better way to do it.
Well, override-kv is more of a low-level tool, but it can be very useful. For the concrete problem of setting the pre-tokenizer type, a simple "--pretokenizer llama3" or similar would suffice. It would be less generic, but also much less horrendous, I would assume.
The advantage of override-kv, other than being generic, would be compatibility with llama.cpp, where I see it used a lot.
But having any way to override the pretokenizer in koboldcpp would greatly help. I don't see hundreds or even thousands of models being requantized anytime soon, and this affects all kinds of models, not just llama3 models (deepseek, command-r, practically anything that is not llama 2).
It would be pretty much in the same vein as --contextsize or --ropeconfig, which also override model-provided kv values.
I can't use koboldcpp anymore for command-r plus, because nobody wants to requantize it when they can just use --override-kv in llama.cpp.
There's a fix for command-r plus coming out in the next version. In the meantime, you can try using an older version first.
Cool, thank you!
I'm still getting the generation quality warning in 1.68.
llama.cpp has an override-kv option that can be used to override, well, model kv values. This can be useful with the myriad of existing GGUFs that don't have a pretokenizer specified. It would be nice if koboldcpp had such an option (either a generic override-kv option, or a way to specify/override the pretokenizer string). Or both. override-kv can be useful in a variety of other ways, but is of course more effort to implement than just being able to specify the pretokenizer type.
Having a llama.cpp-compatible syntax for override-kv would also be a plus, since users could then follow instructions written for llama.cpp unchanged.
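For reference, llama.cpp's option takes the form `--override-kv KEY=TYPE:VALUE` with types int, float, bool and str, so the pretokenizer case looks like `--override-kv tokenizer.ggml.pre=str:llama3`. Below is a rough Python sketch of how a launcher could parse that same syntax. The function name `parse_override_kv` is made up for illustration and is not part of koboldcpp or llama.cpp; it only shows the string format, not how the value would be passed to the model loader.

```python
from typing import Tuple, Union

OverrideValue = Union[int, float, bool, str]


def parse_override_kv(arg: str) -> Tuple[str, OverrideValue]:
    """Parse a llama.cpp-style 'KEY=TYPE:VALUE' string (hypothetical helper)."""
    # Split on the first '=' only, so a str value may itself contain '='.
    key, sep, rest = arg.partition("=")
    if not sep or not key:
        raise ValueError(f"expected KEY=TYPE:VALUE, got {arg!r}")

    # Split on the first ':' only, so a str value may itself contain ':'.
    type_name, sep, raw = rest.partition(":")
    if not sep:
        raise ValueError(f"missing TYPE: prefix in {arg!r}")

    if type_name == "int":
        return key, int(raw)
    if type_name == "float":
        return key, float(raw)
    if type_name == "bool":
        if raw not in ("true", "false"):
            raise ValueError(f"bool value must be true or false, got {raw!r}")
        return key, raw == "true"
    if type_name == "str":
        return key, raw
    raise ValueError(f"unknown type {type_name!r} (expected int, float, bool or str)")


if __name__ == "__main__":
    print(parse_override_kv("tokenizer.ggml.pre=str:llama3"))
    print(parse_override_kv("tokenizer.ggml.add_bos_token=bool:false"))
```

A dedicated pretokenizer flag could then be a thin wrapper that builds the `tokenizer.ggml.pre=str:...` entry internally, so both options would share one code path.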