continuedev / ggml-server-example

An example of running local models with GGML

Cannot run. Seems something is wrong with the model? #4

Open star7810 opened 10 months ago

star7810 commented 10 months ago

```
(ggml) [lzb@VKF-NLP-GPU-01 ggml-server-example]$ python3 -m llama_cpp.server --model models/wizardLM-7B.ggmlv3.q4_0.bin
gguf_init_from_file: invalid magic number 67676a74
error loading model: llama_model_loader: failed to load model from models/wizardLM-7B.ggmlv3.q4_0.bin
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
  File "/home/lzb/.conda/envs/ggml/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/lzb/.conda/envs/ggml/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/lzb/.conda/envs/ggml/lib/python3.8/site-packages/llama_cpp/server/__main__.py", line 96, in <module>
    app = create_app(settings=settings)
  File "/home/lzb/.conda/envs/ggml/lib/python3.8/site-packages/llama_cpp/server/app.py", line 337, in create_app
    llama = llama_cpp.Llama(
  File "/home/lzb/.conda/envs/ggml/lib/python3.8/site-packages/llama_cpp/llama.py", line 340, in __init__
    assert self.model is not None
AssertionError
```

shawnzhu commented 9 months ago

Your model file is in the GGML format, but the log shows the loader trying to read it as GGUF (`gguf_init_from_file` rejects the file's magic number). I suspect you've installed a newer llama-cpp-python[server], which has dropped support for the GGML format.
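You can confirm which format a file is in by reading its first four bytes: the `67676a74` in your log is exactly the GGML v3 (`ggjt`) magic. A minimal sketch, using the path from the report above (adjust as needed):

```python
import struct

# The first 4 bytes of the file identify the container format.
MAGICS = {
    0x46554747: "GGUF (needs llama-cpp-python >= 0.1.79)",
    0x67676a74: "GGML v3 / ggjt (pre-GGUF; convert, or pin llama-cpp-python <= 0.1.78)",
    0x67676d66: "GGMF (pre-GGUF)",
    0x67676d6c: "GGML, unversioned (pre-GGUF)",
}

with open("models/wizardLM-7B.ggmlv3.q4_0.bin", "rb") as f:
    (magic,) = struct.unpack("<I", f.read(4))

# 0x67676a74 is the "invalid magic number" from the traceback above.
print(f"magic 0x{magic:08x}: {MAGICS.get(magic, 'unknown format')}")
```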

Solution

Either convert your model to the GGUF format, or install an earlier version of llama-cpp-python that still supports GGML, e.g.:

```
pip3 install llama-cpp-python[server]==0.1.78
```

See the details at https://pypi.org/project/llama-cpp-python/:

> [!WARNING]
> Starting with version 0.1.79 the model format has changed from ggmlv3 to gguf. Old model files can be converted using the `convert-llama-ggmlv3-to-gguf.py` script in llama.cpp.
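For the conversion route, a sketch assuming a llama.cpp checkout from a revision that still ships that script (its flags have varied across revisions, so check `--help` first; the output filename here is just an example):

```
python3 convert-llama-ggmlv3-to-gguf.py \
    --input models/wizardLM-7B.ggmlv3.q4_0.bin \
    --output models/wizardLM-7B.q4_0.gguf

# then point the server at the converted file
python3 -m llama_cpp.server --model models/wizardLM-7B.q4_0.gguf
```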