ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Bug: Model isn't loading #9563

Open iladshyan opened 1 month ago

iladshyan commented 1 month ago

What happened?

llama3.1 isn't loading at all. I get the following in the terminal and then the program just quits:

./llama-cli -m "C:\<path>\llama3.1.gguf" -p "The world is a place where"
build: 3787 (6026da52) with MSVC 19.29.30154.0 for x64
main: llama backend init
main: load the model and apply lora adapter, if any

Name and Version

./llama-cli --version
version: 3787 (6026da52) built with MSVC 19.29.30154.0 for x64

What operating system are you seeing the problem on?

Windows

Relevant log output

build: 3787 (6026da52) with MSVC 19.29.30154.0 for x64
main: llama backend init
main: load the model and apply lora adapter, if any
ninadakolekar commented 1 month ago

I'm also facing the same issue. I tried updating to 3803, but it didn't help.

iladshyan commented 1 month ago

> I'm also facing the same issue. I tried updating to 3803, but it didn't help.

By any chance, are you migrating from Ollama? If not, where did you get your model files?

ninadakolekar commented 1 month ago

I downloaded GGUF from huggingface: https://huggingface.co/bartowski/Phi-3.5-mini-instruct-GGUF

salocinrevenge commented 2 days ago

I tried to run exactly this model: https://huggingface.co/bartowski/Phi-3.5-mini-instruct-GGUF/blob/main/Phi-3.5-mini-instruct-IQ2_M.gguf on Ubuntu and I hit a memory error:

ggml/src/ggml.c:438: fatal error ggml_aligned_malloc: insufficient memory (attempted to allocate 49152,00 MB)

Probably the memory allocator is "over-allocating" memory.

I tried a model with more parameters, https://huggingface.co/TheBloke/Llama-2-13B-GGUF/blob/main/llama-2-13b.Q2_K.gguf, and it ran normally with almost no extra memory.
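[Editor's note] The 49152 MB figure in the error above is consistent with allocating an f16 KV cache for the model's entire 128k training context, rather than the allocator misbehaving. A back-of-envelope sketch (the layer count and hidden size below are assumed from the published Phi-3.5-mini config, not taken from the log):

```python
# Rough KV-cache size estimate: 2 tensors (K and V) per layer, each of
# shape [n_ctx, n_embd], stored as f16 (2 bytes per element).
# n_layer=32 and n_embd=3072 are assumed Phi-3.5-mini dimensions.
def kv_cache_bytes(n_layer: int, n_ctx: int, n_embd: int,
                   bytes_per_elem: int = 2) -> int:
    return 2 * n_layer * n_ctx * n_embd * bytes_per_elem

full  = kv_cache_bytes(n_layer=32, n_ctx=131072, n_embd=3072)  # full 128k context
small = kv_cache_bytes(n_layer=32, n_ctx=1024,  n_embd=3072)   # reduced context

print(full  // (1024 * 1024))  # 49152 MiB -- matches the error message
print(small // (1024 * 1024))  # 384 MiB
```

Under these assumptions, the default of using the model's full context explains the 48 GiB allocation, and shrinking the context with `-c` brings it down to a few hundred MiB.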

slaren commented 2 days ago

@salocinrevenge add -c 1024 to the command line to use a smaller context size.

@ggerganov we keep getting bug reports because people don't realize that they cannot use the full context of current models. Should we revert this change?
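[Editor's note] Applied to the report above, the suggested workaround is just the original command with a context-size flag added (the model filename is taken from the earlier comments; adjust the path for your setup):

```shell
# Limit the context to 1024 tokens so the KV cache stays small,
# instead of allocating for the model's full 128k training context.
./llama-cli -m Phi-3.5-mini-instruct-IQ2_M.gguf \
    -c 1024 \
    -p "The world is a place where"
```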