Closed s-kostyaev closed 1 year ago
Would you please provide instructions and the resulting output?

Instructions:

Step 1: quantize the model with the latest ggml.
Step 2: run the quantized model with starcoder.cpp:
```
% ./main -t 8 -m path/to/starcoder-ggml-q4_1.bin -p 'def sieve_of_eratosthenes(n):'
main: seed = 1684941190
starcoder_model_load: loading model from '../text-generation-webui/models/starcoder-ggml-q4_1.bin'
starcoder_model_load: n_vocab = 49152
starcoder_model_load: n_ctx   = 8192
starcoder_model_load: n_embd  = 6144
starcoder_model_load: n_head  = 48
starcoder_model_load: n_layer = 40
starcoder_model_load: ftype   = 2003
starcoder_model_load: qntvr   = 2
starcoder_model_load: ggml ctx size = 28956.47 MB
starcoder_model_load: memory size = 15360.00 MB, n_mem = 327680
zsh: segmentation fault  ./main -t 8 -m path/to/starcoder-ggml-q4_1.bin -p
```
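The `ftype = 2003` / `qntvr = 2` pair in the log above is actually one combined field on disk: upstream ggml packs the quantization-format version into the high digits of `ftype`. A minimal sketch of the decoding, assuming ggml's `GGML_QNT_VERSION_FACTOR` of 1000:

```python
# Sketch of how ggml splits the on-disk ftype field.
# GGML_QNT_VERSION_FACTOR = 1000 is taken from upstream ggml and is an
# assumption about the version in use here.
GGML_QNT_VERSION_FACTOR = 1000

def decode_ftype(ftype_raw: int) -> tuple[int, int]:
    """Split the raw ftype into (quantization version, base ftype)."""
    qntvr = ftype_raw // GGML_QNT_VERSION_FACTOR
    ftype = ftype_raw % GGML_QNT_VERSION_FACTOR
    return qntvr, ftype

# The log reports ftype = 2003, qntvr = 2:
print(decode_ftype(2003))  # (2, 3) -- base ftype 3 corresponds to q4_1
```

A loader that does not know about this factor (or expects a different `qntvr`) will misinterpret the quantized tensor data, which is consistent with the segfault above.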
Also, hash sums differ between models quantized by ggml and by starcoder.cpp.
If I quantize the model with starcoder.cpp and then run it with starcoder.cpp, everything works fine. But if I quantize the model with the latest ggml and then try to run it with starcoder.cpp, I get a segfault. Models quantized with the latest version of ggml can be found here: https://huggingface.co/NeoDim
> Also hash sums are different between models quantized by ggml and by starcoder.cpp
The hash sum reflects the ggml version used to build your checkpoint. It's normal that a checkpoint whose hash differs from what the library produces won't run properly. I suggest you use the same library to convert and run the model you want.

As for why starcoder.cpp's hash differs from ggml's: they don't necessarily support the same features. Hope that answers your questions.
It looks like quantization, inference, and the model format all differ between starcoder.cpp and upstream ggml. Why? And why are the models incompatible? For example, if I run starcoder.cpp inference on a model quantized by the ggml example code, I get a segmentation fault. Maybe the code needs to be updated for compatibility with upstream?