abdeladim-s / pyllamacpp

Python bindings for llama.cpp
https://abdeladim-s.github.io/pyllamacpp/
MIT License

Can't import vicuna models : `(bad f16 value 5)` #7

Closed mh4ckt3mh4ckt1c4s closed 1 year ago

mh4ckt3mh4ckt1c4s commented 1 year ago

When I try to load the Vicuna models downloaded from this page, I get the following error:

# pyllamacpp /models/ggml-vicuna-7b-1.1-q4_2.bin 

██████╗ ██╗   ██╗██╗     ██╗      █████╗ ███╗   ███╗ █████╗  ██████╗██████╗ ██████╗ 
██╔══██╗╚██╗ ██╔╝██║     ██║     ██╔══██╗████╗ ████║██╔══██╗██╔════╝██╔══██╗██╔══██╗
██████╔╝ ╚████╔╝ ██║     ██║     ███████║██╔████╔██║███████║██║     ██████╔╝██████╔╝
██╔═══╝   ╚██╔╝  ██║     ██║     ██╔══██║██║╚██╔╝██║██╔══██║██║     ██╔═══╝ ██╔═══╝ 
██║        ██║   ███████╗███████╗██║  ██║██║ ╚═╝ ██║██║  ██║╚██████╗██║     ██║     
╚═╝        ╚═╝   ╚══════╝╚══════╝╚═╝  ╚═╝╚═╝     ╚═╝╚═╝  ╚═╝ ╚═════╝╚═╝     ╚═╝     

PyLLaMACpp
A simple Command Line Interface to test the package
Version: 2.1.3 

=========================================================================================

[+] Running model `/models/ggml-vicuna-7b-1.1-q4_2.bin`
[+] LLaMA context params: `{}`
[+] GPT params: `{}`
llama_model_load: loading model from '/models/ggml-vicuna-7b-1.1-q4_2.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 5
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: invalid model file '/models/ggml-vicuna-7b-1.1-q4_2.bin' (bad f16 value 5)
llama_init_from_file: failed to load model
Segmentation fault (core dumped)

I do not have this problem with the gpt4all models, and running the Vicuna models with the latest version of llama.cpp works just fine.
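
For context, the `bad f16 value 5` message comes straight from the model header: the older llama.cpp loader bundled with pyllamacpp only recognizes the original f16/ftype codes, and `5` (the `q4_2` quantization this file was converted with) is not among them. Below is a minimal sketch for inspecting that field yourself; it assumes the legacy GGML/GGJT header layout (a uint32 magic, an optional uint32 version, then seven little-endian int32 hyperparameters ending with the ftype code), so treat the constants and field order as assumptions rather than a spec.

```python
import struct
import sys

# Hedged inspection helper, assuming the legacy llama.cpp header layout:
# uint32 magic, optional uint32 version (for 'ggmf'/'ggjt' files), then
# seven int32 hyperparameters ending with the ftype/f16 code.
GGML_MAGIC = 0x67676D6C  # 'ggml' (unversioned)
GGMF_MAGIC = 0x67676D66  # 'ggmf' (versioned)
GGJT_MAGIC = 0x67676A74  # 'ggjt' (versioned, mmap-able)

def read_ftype(path):
    with open(path, "rb") as f:
        magic = struct.unpack("<I", f.read(4))[0]
        version = None
        if magic in (GGMF_MAGIC, GGJT_MAGIC):
            version = struct.unpack("<I", f.read(4))[0]
        # assumed order: n_vocab, n_embd, n_mult, n_head, n_layer, n_rot, ftype
        hparams = struct.unpack("<7i", f.read(28))
        return magic, version, hparams[-1]

if __name__ == "__main__":
    magic, version, ftype = read_ftype(sys.argv[1])
    print(f"magic=0x{magic:08x} version={version} ftype/f16={ftype}")
```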

pajoma commented 1 year ago

Same for me with the model Pi3141/alpaca-native-7B-ggml.

Output from llama.cpp

llama.cpp: loading model from ./models/ggml-model-q5_1.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  68,20 KB
llama_model_load_internal: mem required  = 6612,58 MB (+ 1026,00 MB per state)
llama_init_from_file: kv self size  = 1024,00 MB

Output from pyllamacpp

[+] Running model `models/ggml-model-q5_1.bin`
[+] LLaMA context params: `{'n_ctx': 2048}`
[+] GPT params: `{}`
llama_model_load: loading model from 'models/ggml-model-q5_1.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 2048
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 9
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: invalid model file 'models/ggml-model-q5_1.bin' (bad f16 value 9)
llama_init_from_file: failed to load model
Segmentation fault
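
Both logs show the same pattern as the original report: a file that current llama.cpp loads fine (ftype 9, i.e. Q5_1) is rejected by the older loader inside pyllamacpp, which predates the newer quantization codes. As a rough reference, here is the code-to-format mapping as it appeared in the llama_ftype enum of llama.cpp around that time; treat it as an assumption rather than an authoritative table, especially since q4_2 and q4_3 were later removed.

```python
# Rough mapping of ftype/f16 codes to quantization formats, based on the
# llama_ftype enum in llama.cpp around this period (assumed, not authoritative).
LLAMA_FTYPE_NAMES = {
    0: "all F32",
    1: "mostly F16",
    2: "mostly Q4_0",
    3: "mostly Q4_1",
    4: "mostly Q4_1, some F16",
    5: "mostly Q4_2 (since removed from llama.cpp)",
    6: "mostly Q4_3 (since removed from llama.cpp)",
    7: "mostly Q8_0",
    8: "mostly Q5_0",
    9: "mostly Q5_1",
}

for code in (5, 9):  # the two values rejected in this thread
    print(f"ftype {code}: {LLAMA_FTYPE_NAMES.get(code, 'unknown')}")
```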

mh4ckt3mh4ckt1c4s commented 1 year ago

The original page has been archived, but the links are still available here: https://github.com/nomic-ai/gpt4all/tree/main/gpt4all-chat

abdeladim-s commented 1 year ago

Thanks @mh4ckt3mh4ckt1c4s for reporting the issue. There may be new changes on the llama.cpp side; I'll try to sync the repo once I get some time.

abdeladim-s commented 1 year ago

Hi guys, I pushed a new release, v2.2.0. Could you please give it a try? I tested it with Vicuna and Alpaca and both seem to be working on my end.

mh4ckt3mh4ckt1c4s commented 1 year ago

Hello, I tested with Vicuna and it works with 2.2.0 but not with the latest 2.3.0. Is that normal?

abdeladim-s commented 1 year ago

> Hello, I tested with Vicuna and it works with 2.2.0 but not with the latest 2.3.0. Is that normal?

@mh4ckt3mh4ckt1c4s Yes, that is expected: recent llama.cpp changes broke compatibility with older model files, so you will need to re-quantize the old models to work with the new update.
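
For readers who need to do that, re-quantization typically means going back to the original weights, converting them to an f16 GGML file with a current llama.cpp checkout, and then running its quantize tool. The sketch below shells out to those tools from Python; the script name, output filenames, and the accepted quantization argument vary between llama.cpp versions, so every path and argument here is an assumption to adapt, not an exact recipe.

```python
import subprocess

# Hedged sketch: re-quantize an old model with a recent llama.cpp checkout.
# All paths below are hypothetical; adjust them to your setup.
LLAMA_CPP_DIR = "./llama.cpp"  # assumed local clone of llama.cpp
MODEL_DIR = "models/7B"        # assumed weights directory inside that clone

# 1) Convert the original weights to an f16 GGML file (llama.cpp's convert
#    script; older checkouts ship convert-pth-to-ggml.py instead of convert.py).
subprocess.run(["python3", "convert.py", MODEL_DIR], cwd=LLAMA_CPP_DIR, check=True)

# 2) Quantize the f16 file to the desired format. Newer quantize builds accept a
#    name like "q4_0"; older ones expect a numeric code instead (assumption).
subprocess.run(
    [
        "./quantize",
        f"{MODEL_DIR}/ggml-model-f16.bin",
        f"{MODEL_DIR}/ggml-model-q4_0.bin",
        "q4_0",
    ],
    cwd=LLAMA_CPP_DIR,
    check=True,
)
```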

mh4ckt3mh4ckt1c4s commented 1 year ago

Okay, so from my point of view this issue can be closed. Thanks for your work!