ggerganov / llama.cpp

LLM inference in C/C++

Bug: Cannot load GGUF file, it asks if it is GGML. #8094

Closed takosalad closed 4 days ago

takosalad commented 4 days ago

What happened?

I just checked out the git repo and compiled it:

```
cmake .. -DLLAMA_CUDA=ON -DLLAMA_BLAS_VENDOR=OpenBLAS
cmake --build . --config Release
```

Then I tried to run a GGUF file and got this error:

```
llama.cpp: loading model from models/WizardLM-2-7B-Q8_0-imat.gguf
error loading model: unknown (magic, version) combination: 46554747, 00000003; is this really a GGML file?
```

Shouldn't llama.cpp load GGUF files just fine?

Name and Version

```
$ ./llama-cli --version
version: 3215 (d62e4aaa)
built with cc (GCC) 14.1.1 20240522 for x86_64-pc-linux-gnu
```

What operating system are you seeing the problem on?

Linux

Relevant log output

```
$ ./main -m models/WizardLM-2-7B-Q8_0-imat.gguf
main: build = 843 (6e7cca4)
main: seed  = 1719233708
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 2080 Ti, compute capability 7.5
llama.cpp: loading model from models/WizardLM-2-7B-Q8_0-imat.gguf
error loading model: unknown (magic, version) combination: 46554747, 00000003; is this really a GGML file?
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'models/WizardLM-2-7B-Q8_0-imat.gguf'
main: error: unable to load model
```
ngxson commented 4 days ago

> main: build = 843 (6e7cca4)

Your build version is from https://github.com/ggerganov/llama.cpp/commit/6e7cca4, which is a very old build. Note that your `./llama-cli --version` reports build 3215, so the `./main` you actually ran is a stale binary left over from an earlier build. Try again with the latest build.
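
A side note on the error text itself: `46554747` is just the ASCII bytes `'G' 'G' 'U' 'F'` read as a little-endian uint32, and `00000003` is GGUF version 3, so the file is a valid GGUF; the old binary simply predates the format and misreports it as unknown. A minimal sketch of a standalone header check (`magic_check.c` is a hypothetical helper, not part of llama.cpp):

```c
// magic_check.c -- hypothetical standalone checker, not part of llama.cpp.
// Reads the first 8 bytes of a model file: 4 ASCII magic bytes 'G','G','U','F'
// followed by a little-endian uint32 format version.
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <model-file>\n", argv[0]);
        return 1;
    }

    FILE *f = fopen(argv[1], "rb");
    if (!f) {
        perror("fopen");
        return 1;
    }

    unsigned char header[8];
    if (fread(header, 1, sizeof(header), f) != sizeof(header)) {
        fprintf(stderr, "file too short to be GGUF\n");
        fclose(f);
        return 1;
    }
    fclose(f);

    uint32_t magic, version;
    memcpy(&magic,   header,     4); // memcpy avoids strict-aliasing issues
    memcpy(&version, header + 4, 4);

    if (memcmp(header, "GGUF", 4) == 0) {
        // On a little-endian host (x86_64, as in this issue) this prints
        // magic = 0x46554747 -- the exact value from the error message.
        printf("GGUF file: magic = 0x%08X, version = %u\n", magic, version);
    } else {
        printf("not GGUF: first bytes = %02X %02X %02X %02X\n",
               header[0], header[1], header[2], header[3]);
    }
    return 0;
}
```

Compiled with `cc magic_check.c -o magic_check` and run against the model, it should confirm that the file itself is fine and only the binary needs updating.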

Also, the command should be `./llama-cli`, not `./main`.