bigcode-project / starcoder.cpp

C++ implementation for 💫StarCoder

Why are models quantized by starcoder.cpp and by the ggml example different? #11

Closed: s-kostyaev closed this issue 1 year ago

s-kostyaev commented 1 year ago

It looks like quantization, inference, and the model format all differ between starcoder.cpp and upstream ggml. Why? And why are the models incompatible? For example, if I use starcoder.cpp to run inference on a model quantized by the ggml example code, I get a segmentation fault. Maybe the code needs to be updated to be compatible with upstream?

aseok commented 1 year ago

Would you please provide the steps you followed and the resulting output?

s-kostyaev commented 1 year ago

Instructions: step 1, step 2:

%  ./main -t 8 -m path/to/starcoder-ggml-q4_1.bin -p 'def sieve_of_eratosthenes(n):'
main: seed = 1684941190
starcoder_model_load: loading model from '../text-generation-webui/models/starcoder-ggml-q4_1.bin'
starcoder_model_load: n_vocab = 49152
starcoder_model_load: n_ctx   = 8192
starcoder_model_load: n_embd  = 6144
starcoder_model_load: n_head  = 48
starcoder_model_load: n_layer = 40
starcoder_model_load: ftype   = 2003
starcoder_model_load: qntvr   = 2
starcoder_model_load: ggml ctx size = 28956.47 MB
starcoder_model_load: memory size = 15360.00 MB, n_mem = 327680
zsh: segmentation fault  ./main -t 8 -m path/to/starcoder-ggml-q4_1.bin -p
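The `ftype = 2003` and `qntvr = 2` lines in the log are consistent with the packing convention used by ggml's example loaders, where the quantization format version shares a header field with the file type. The C++ sketch below decodes that value. The factor of 1000 and the reading of base ftype 3 as "mostly Q4_1" are assumptions based on ggml's examples and `ggml_ftype` enum; `decode_ftype` is a hypothetical helper, not a function from starcoder.cpp or ggml.

```cpp
#include <cstdint>

// Assumption: mirrors the convention in ggml's example loaders, where the
// header's ftype field packs the quantization version as
// raw_ftype = ftype + qntvr * 1000.
constexpr int32_t kQntVersionFactor = 1000;

struct DecodedFtype {
    int32_t qntvr;       // quantization format version
    int32_t base_ftype;  // file type; 3 is assumed to mean "mostly Q4_1"
};

// Hypothetical helper: split the raw header value into its two parts.
constexpr DecodedFtype decode_ftype(int32_t raw_ftype) {
    return DecodedFtype{raw_ftype / kQntVersionFactor,
                        raw_ftype % kQntVersionFactor};
}

// The loader in the log above printed ftype = 2003 and qntvr = 2.
static_assert(decode_ftype(2003).qntvr == 2, "");
static_assert(decode_ftype(2003).base_ftype == 3, "");
```

Under this reading, the file is a Q4_1 model written with quantization format version 2, which a loader built against a different ggml revision may lay out differently in memory.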

Also, the hash sums differ between models quantized by ggml and by starcoder.cpp.

s-kostyaev commented 1 year ago

If I quantize a model with starcoder.cpp and then run it with starcoder.cpp, everything works fine. But if I quantize a model with the latest ggml and then try to run it with starcoder.cpp, I get a segfault.

s-kostyaev commented 1 year ago

Models quantized by the latest version of ggml can be found here: https://huggingface.co/NeoDim

NouamaneTazi commented 1 year ago

> Also hash sums are different between models quantized by ggml and by starcoder.cpp

The hash sum indicates the ggml version used to build your checkpoint. If your checkpoint's hash differs from the library's, it is expected that it won't run properly. I suggest using the same library to both convert and run the model you want.
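One way a loader could enforce "convert and run with the same library" is to fail fast on a version mismatch instead of crashing later. The following is a minimal C++ sketch under stated assumptions: the header packs raw_ftype = ftype + qntvr * 1000 as in ggml's examples, and base ftypes 0 and 1 are the unquantized F32/F16 types from ggml's enum. `check_header`, `HeaderCheck`, and `loader_qntvr` are hypothetical names; neither starcoder.cpp nor ggml is claimed to provide this check.

```cpp
#include <cstdint>

// Assumption: same packing as ggml's examples, raw = ftype + qntvr * 1000.
constexpr int32_t kQntVersionFactor = 1000;

// Assumption: base ftypes 0 (all F32) and 1 (mostly F16) follow ggml's enum;
// such files contain no quantized blocks, so the version does not matter.
constexpr bool is_unquantized(int32_t base_ftype) {
    return base_ftype == 0 || base_ftype == 1;
}

enum class HeaderCheck { kOk, kInvalidFtype, kQuantVersionMismatch };

// Hypothetical pre-load guard: reject a quantized file whose quantization
// version differs from the one this loader was built for, rather than
// reading its tensors with the wrong block layout.
constexpr HeaderCheck check_header(int32_t raw_ftype, int32_t loader_qntvr) {
    if (raw_ftype < 0) {
        return HeaderCheck::kInvalidFtype;
    }
    const int32_t qntvr = raw_ftype / kQntVersionFactor;
    const int32_t base_ftype = raw_ftype % kQntVersionFactor;
    if (is_unquantized(base_ftype)) {
        return HeaderCheck::kOk;  // nothing quantized to misread
    }
    return qntvr == loader_qntvr ? HeaderCheck::kOk
                                 : HeaderCheck::kQuantVersionMismatch;
}

// A file reporting ftype = 2003 is rejected by a loader built for a
// different quantization version (0 is used here purely for illustration).
static_assert(check_header(2003, 0) == HeaderCheck::kQuantVersionMismatch, "");
static_assert(check_header(2003, 2) == HeaderCheck::kOk, "");
```

In a real loader, such a check would sit right after the hyperparameters are read, and a mismatch would abort with a message naming both versions.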

As for why starcoder.cpp's hash differs from ggml's: the two don't necessarily support the same features. Hope that answers your questions.