ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Bug: Failed to load model #8516

Closed Fulgurance closed 1 month ago

Fulgurance commented 1 month ago

What happened?

Hi guys. I've got a problem after compiling llama.cpp on my machine. It built properly, but when I try to run it, it looks for a file that doesn't even exist (a model).

Is that normal?

Name and Version

version: 0 (unknown) built with cc (Gentoo Hardened 14.1.1_p20240622 p2) 14.1.1 20240622 for x86_64-pc-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

zohran@alienware-m17-r3 ~/Downloads/llama.cpp-b3400 $ ./examples/chat.sh
Log start
main: build = 0 (unknown)
main: built with cc (Gentoo Hardened 14.1.1_p20240622 p2) 14.1.1 20240622 for x86_64-pc-linux-gnu
main: seed  = 1721142929
llama_model_load: error loading model: llama_model_loader: failed to load model from ./models/llama-7b/ggml-model-q4_0.gguf

llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model './models/llama-7b/ggml-model-q4_0.gguf'
main: error: unable to load model

zohran@alienware-m17-r3 ~/Downloads/llama.cpp-b3400 $ ls 
AUTHORS                        llama-convert-llama2c-to-ggml  llama-simple
build                          llama-cvector-generator        llama-speculative
ci                             llama-embedding                llama-tokenize
cmake                          llama-eval-callback            llama-train-text-from-scratch
CMakeLists.txt                 llama-export-lora              llama-vdot
CMakePresets.json              llama-finetune                 main
common                         llama-gbnf-validator           main.log
CONTRIBUTING.md                llama-gguf                     Makefile
convert_hf_to_gguf.py          llama-gguf-hash                media
convert_hf_to_gguf_update.py   llama-gguf-split               models
convert_llama_ggml_to_gguf.py  llama-gritlm                   mypy.ini
convert_lora_to_gguf.py        llama-imatrix                  Package.swift
docs                           llama-infill                   pocs
examples                       llama-llava-cli                poetry.lock
flake.lock                     llama-lookahead                prompts
flake.nix                      llama-lookup                   pyproject.toml
ggml                           llama-lookup-create            pyrightconfig.json
gguf-py                        llama-lookup-merge             README.md
grammars                       llama-lookup-stats             requirements
include                        llama-parallel                 requirements.txt
libllava.a                     llama-passkey                  scripts
LICENSE                        llama-perplexity               SECURITY.md
llama-baby-llama               llama-q8dot                    server
llama-batched                  llama-quantize                 spm-headers
llama-batched-bench            llama-quantize-stats           src
llama-bench                    llama-retrieval                tests
llama-benchmark-matmult        llama-save-load-state
llama-cli                      llama-server
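For reference, examples/chat.sh hardcodes a default model path (the ./models/llama-7b/ggml-model-q4_0.gguf seen in the log). In the upstream script that default can usually be overridden through the MODEL environment variable; this is an assumption, so check the script itself:

```shell
# Sketch, assuming chat.sh reads MODEL as in the upstream script,
# i.e. MODEL="${MODEL:-./models/llama-7b/ggml-model-q4_0.gguf}".
# The .gguf path below is illustrative; point it at a model you actually have.
MODEL=./models/my-model.gguf ./examples/chat.sh
```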
MartinRepo commented 1 month ago

You need to download a model first, mate.

If you check your models directory, you probably won't find llama-7b there.

Fulgurance commented 1 month ago

Sorry, I'm just getting started with llama.cpp. I cloned this model, for example: https://huggingface.co/THUDM/glm-4-9b

But I don't see any .gguf file. I guess I have to generate it? How am I supposed to do that?

MartinRepo commented 1 month ago

For this kind of model you need to convert it to GGUF. Use convert_hf_to_gguf.py to do it; you can find the details in the documentation.

However, if you just want a quick start, try a ready-quantized model, like this one: https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF
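Both routes can be sketched as shell commands. The paths and output filenames here are illustrative, and converting GLM-4 only works if your llama.cpp build supports that architecture:

```shell
# Route 1: convert a Hugging Face checkpoint to GGUF, then quantize.
# Run from the llama.cpp checkout; paths and filenames are illustrative.
pip install -r requirements.txt
python convert_hf_to_gguf.py /path/to/glm-4-9b --outfile glm-4-9b-f16.gguf
./llama-quantize glm-4-9b-f16.gguf glm-4-9b-Q4_K_M.gguf Q4_K_M

# Route 2: download a ready-quantized GGUF (needs `pip install huggingface_hub`).
huggingface-cli download TheBloke/Llama-2-13B-chat-GGUF \
    llama-2-13b-chat.Q4_K_M.gguf --local-dir ./models
./llama-cli -m ./models/llama-2-13b-chat.Q4_K_M.gguf -p "How are you?" -n 128
```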

Fulgurance commented 1 month ago

So I tried to run the model you suggested, but I get an error again.

zohran@alienware-m17-r3 ~/Downloads/llama.cpp-b3400 $ ./llama-cli -m /home/zohran/Downloads/Llama-2-13B-chat-GGUF/llama-2-13b-chat.Q8_0.gguf  -p "How are you?"
Log start
main: build = 0 (unknown)
main: built with cc (Gentoo Hardened 14.1.1_p20240622 p2) 14.1.1 20240622 for x86_64-pc-linux-gnu
main: seed  = 1721145009
gguf_init_from_file: invalid magic characters 'vers'
llama_model_load: error loading model: llama_model_loader: failed to load model from /home/zohran/Downloads/Llama-2-13B-chat-GGUF/llama-2-13b-chat.Q8_0.gguf

llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/home/zohran/Downloads/Llama-2-13B-chat-GGUF/llama-2-13b-chat.Q8_0.gguf'
main: error: unable to load model
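A note on this error: a GGUF model file begins with the four magic bytes `GGUF`, while `vers` is how `version https://git-lfs.github.com/spec/v1` begins, the header of a Git LFS pointer file. It usually means the repository was cloned without `git lfs`, so the .gguf on disk is a tiny pointer instead of the real weights. A minimal check, as a hypothetical helper that is not part of llama.cpp:

```python
# gguf_check.py - diagnose why llama.cpp rejects a ".gguf" file.
# Hypothetical helper for illustration; not part of llama.cpp.

def diagnose(path: str) -> str:
    """Read the first four bytes and guess what the file actually is."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == b"GGUF":
        return "valid GGUF magic"
    if magic == b"vers":
        # "vers" is how "version https://git-lfs.github.com/spec/v1" begins.
        return "git-lfs pointer file (run 'git lfs pull' or re-download)"
    return f"unknown magic: {magic!r}"

if __name__ == "__main__":
    import sys
    print(diagnose(sys.argv[1]))
```

Running it on the failing file shows immediately whether you got a pointer file (typically ~134 bytes) instead of a multi-gigabyte model.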
MartinRepo commented 1 month ago

On Hugging Face there is demo code for llama.cpp (under the "Use this model" button at the top-right corner). Maybe give this a try: ./llama-cli --hf-repo "TheBloke/Llama-2-13B-chat-GGUF" -m llama-2-13b-chat.Q2_K.gguf -p "How are you?" -n 128

Fulgurance commented 1 month ago

Okay, I understand how to download models now.

So I have a question: which model do you recommend? I would like to integrate an AI assistant into the Linux distribution I am making, and to teach the assistant how to manage the system with my tools. Do you think the one you gave me is good for that? Basically, I would like this assistant to be able to run commands.

I would also like the AI to stop talking so much when it answers xD. How can I allow the AI to run some bash commands on my system?

Fulgurance commented 1 month ago

The terminal always shows extra text, and I want to avoid that (the <|im_end|> marker and similar tokens):

> Hello
Hi there! How can I help you today?
<|im_end|>

In this example, the <|im_end|> token is printed after the answer, and I want to get rid of it.
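One way to cut the trailing marker is to stop generation on it. A sketch, assuming a recent llama-cli build; flag availability varies by version (check ./llama-cli --help), and the model path is illustrative:

```shell
# -r / --reverse-prompt halts generation when the given string is produced,
# so "<|im_end|>" never reaches the terminal; -cnv enables conversation mode,
# in which -p is treated as the system prompt.
./llama-cli -m ./models/llama-2-13b-chat.Q8_0.gguf -cnv \
    -r "<|im_end|>" -p "You are a concise assistant."
```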
oldgithubman commented 1 month ago

> Okay, I understand how to download models now.
>
> So I have a question: which model do you recommend? I would like to integrate an AI assistant into the Linux distribution I am making, and to teach the assistant how to manage the system with my tools. Do you think the one you gave me is good for that? Basically, I would like this assistant to be able to run commands.
>
> I would also like the AI to stop talking so much when it answers xD. How can I allow the AI to run some bash commands on my system?

TBH, I don't think you're going to get a good answer to that question here. You're clearly new to this stuff and have a lot of homework to do. What you want to do is extremely complicated and probably well out of reach of your current skill level. My advice is to do a lot of research and attempt far easier projects first. You're asking how to design a car when you don't know how to drive. Also, this is not the appropriate place for these questions. I figured this comment would be more helpful to you than silence. Good luck.

Fulgurance commented 1 month ago

I understand how to do it now, no worries 👍 I asked somewhere else, with people better able to explain. It's not that complicated...

oldgithubman commented 1 month ago

> I understand how to do it now, no worries 👍 I asked somewhere else, with people better able to explain. It's not that complicated...

Let me know when you're done so I can check it out!