LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Format change. Just delete this if you're already aware. I assume you are, but just in case. #161

Closed. Innomen closed this issue 1 year ago.

Innomen commented 1 year ago

https://www.reddit.com/r/LocalLLaMA/comments/13fnyah/comment/jjxc7x2/

unknown (magic, version) combination: 67676a74, 00000002; is this really a GGML file?

"Update models for llama.cpp May 12th breaking quantisation change."

https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GGML/commit/b4b5f7e523f35306412d10ea9c4922b6f5923719
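
For anyone hitting this error: the two hex values in the message are the file's magic and format version, read from the first 8 bytes of the model file. A minimal sketch (plain Python, not part of koboldcpp) that inspects them:

```python
import struct
import sys

# The GGML/GGJT header starts with a little-endian uint32 magic followed by
# a uint32 version. 0x67676a74 spells "ggjt" in ASCII; version 2 is the
# May 12th breaking quantisation change referenced above.
GGJT_MAGIC = 0x67676A74

with open(sys.argv[1], "rb") as f:
    magic, version = struct.unpack("<II", f.read(8))

print(f"magic: {magic:08x}, version: {version}")
if (magic, version) == (GGJT_MAGIC, 2):
    print("GGJT v2 file: requires a loader updated for the new format.")
```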

LostRuins commented 1 year ago

Yes, I am aware. It is a major change and I am cranking out a fix as fast as I can.

Innomen commented 1 year ago

Ok I assumed as much, sorry for the distraction. Thanks for answering :)

LostRuins commented 1 year ago

I've released a new beta version for the new formats! It should also work for the old ones. Do let me know if it works.

Innomen commented 1 year ago

It's generating my first response now :) My machine is slow.

This command caused it to implode, but normal drag and drop worked.

```
koboldcpp.exe --highpriority --smartcontext --useclblast 0 0 gpt4-x-vicuna-13B.ggml.q8_0.bin
```

I don't know what those arguments do; I just copied them from advice on Reddit.

It's generating very slowly, but I don't know if that's the model or something else. I'm not in a big hurry. I'll let you know if I have more issues. Thanks much for what you've made here. I wouldn't be able to use any of this without kobold.
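
For readers wondering about those flags: per koboldcpp's command-line help, --highpriority raises the process's OS scheduling priority, --smartcontext reserves part of the context to reduce prompt reprocessing between generations, and --useclblast takes an OpenCL platform index and device index for CLBlast GPU acceleration. A hedged Python sketch of the same launch, with the flags annotated (descriptions paraphrased, not the project's exact help text):

```python
import subprocess

# Launch koboldcpp with the flags from the command above.
subprocess.run([
    "koboldcpp.exe",
    "--highpriority",          # raise the process's scheduling priority
    "--smartcontext",          # reuse part of the prompt between generations
    "--useclblast", "0", "0",  # CLBlast on OpenCL platform 0, device 0
    "gpt4-x-vicuna-13B.ggml.q8_0.bin",
])
```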

bubbabug commented 1 year ago

In my experience, on my device, --highpriority was not compatible with --smartcontext and caused the full prompt to be reprocessed on each generation.
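
For context, --smartcontext exists to avoid re-evaluating the whole prompt on every generation. A toy prefix-reuse sketch in Python illustrates the general idea (a generic illustration only, not koboldcpp's actual implementation, which works differently):

```python
def common_prefix_len(a: list[int], b: list[int]) -> int:
    """Length of the shared leading run of two token lists."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

class PrefixCache:
    """Toy cache: only tokens past the shared prefix need re-evaluation."""

    def __init__(self) -> None:
        self.cached: list[int] = []

    def tokens_to_evaluate(self, prompt: list[int]) -> list[int]:
        keep = common_prefix_len(self.cached, prompt)
        self.cached = prompt
        # If caching is disabled (or defeated, e.g. by an incompatible
        # flag), keep == 0 and the full prompt is evaluated every time,
        # which matches the slowdown described above.
        return prompt[keep:]
```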