cocktailpeanut / dalai

The simplest way to run LLaMA on your local machine
https://cocktailpeanut.github.io/dalai

main: failed to load model from 'models/7B/ggml-model-q4_0.bin' #290

Open pdavis68 opened 1 year ago

pdavis68 commented 1 year ago

I've spent hours struggling to get all this to work. I would really appreciate any help anyone can offer.

I'm running in a Windows 10 environment.

I've tried running npx dalai llama install 7B --home F:\LLM\dalai

It mostly installs, but the post-processing of the model doesn't seem to work. The ggml-model-q4_0.bin file is empty, and the return code from the quantize step suggests that an illegal instruction is being executed (I was running as admin, and I ran quantize manually to check the errorlevel).

The quantize "usage" suggests that it wants a model-f32.bin, but a -f16 file is what's produced during the post processing.

I got the latest code from alpaca.cpp and built it. I generated an -f32 file using its convert-pth-to-ggml.py, then used its quantize executable to produce a ggml-model-q4_0.bin.
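For reference, the convert-and-quantize steps look roughly like this in llama.cpp/alpaca.cpp builds from around that time (script names, arguments, and the f16/f32 choice vary between versions, so treat this as an illustrative sketch rather than the exact dalai invocation):

# convert the PyTorch checkpoint in models/7B/ to a ggml f16 file
# (the trailing "1" selects f16 output; "0" would produce f32)
python convert-pth-to-ggml.py models/7B/ 1
# quantize the f16 file down to q4_0 (type 2 in quantize builds of that era)
quantize models/7B/ggml-model-f16.bin models/7B/ggml-model-q4_0.bin 2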

Another issue I had is that dalai tries to run the executable "main" (which doesn't exist) from the "build\Release" directory, which is empty because everything got built to "build\bin\Release".

Based on digging through other issues, I surmised that llama.exe is what needs to be renamed to main.exe. But since I built everything from the latest alpaca.cpp, I instead took its chat.exe, renamed it to main.exe, and copied it into the "build\Release" folder.
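Concretely, that copy was just something along these lines (both paths are illustrative, not the exact ones on my machine; adjust to wherever your alpaca.cpp checkout and your dalai --home directory actually live):

REM copy the freshly built chat.exe into the folder dalai looks in, renaming it to main.exe
REM (example paths only: alpaca.cpp checkout on the left, dalai home F:\LLM\dalai on the right)
copy C:\src\alpaca.cpp\build\bin\Release\chat.exe F:\LLM\dalai\llama\build\Release\main.exe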

Still, I had no success. Looking at the console, I saw the command that was being run, so I ran it from the command line manually. This is the command and the result I get:

main --seed -1 --threads 4 --n_predict 200 --model models/7B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p "The expected response for a highly intelligent chatbot to `""Are you working`"" is \n"
main: seed = 1679870158
llama_model_load: loading model from 'models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: failed to open 'models/7B/ggml-model-q4_0.bin'
main: failed to load model from 'models/7B/ggml-model-q4_0.bin'

I'm not sure what else to do at this point. I hope that some of the issues I've noticed can lead to fixes in the install. But in the meantime, if anyone has any ideas for how I can get this working, I'd appreciate it.

Update: I built an f16 file and generated the q4_0 file from that. Same problem. I would think that since I used convert-pth-to-ggml.py, quantize, and main (chat.exe) from the latest alpaca.cpp, main should be able to read the model quantized by that code.

Update 2: Got the latest code from llama.cpp and built everything. Its quantize exits with code -1073741795 (0xC000001D). The quantize from alpaca.cpp generates a q4_0 file that's only 296K. I don't know whether that's correct, but it's the one that doesn't work with the chat.exe (main) shown above.
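For anyone else who hits this: -1073741795 is NTSTATUS 0xC000001D, STATUS_ILLEGAL_INSTRUCTION, which usually means the binary was built with CPU instructions (e.g. AVX2) that the processor doesn't support. As a sketch (the CMake option names below are the ones llama.cpp used around spring 2023; check your checkout's CMakeLists.txt), you can confirm the exit code and try rebuilding without those instructions:

REM in cmd.exe, print the exit code of the last command (run quantize by hand first)
echo %errorlevel%
REM rebuild llama.cpp without AVX2/FMA so quantize can run on older CPUs
cmake -B build -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF
cmake --build build --config Release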

Update 3: Ran the install in a Linux VM and everything went much more smoothly. The main executable runs and responds, but not from the dalai web UI. In the console it LOOKS like the command is being executed correctly and in the proper directory, but it just hangs and nothing happens. Running the same command in a terminal works just fine.

I would prefer to run it in Windows, but this will suffice for the time being.

suoko commented 1 year ago

LLaMA works fine, but I have problems with Alpaca. This is what I see when running from the web UI with debug enabled:

main: seed = 1683269634
llama_model_load: loading model from 'models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: invalid model file 'models/7B/ggml-model-q4_0.bin' (bad magic)
main: failed to load model from 'models/7B/ggml-model-q4_0.bin'
user@pc03: ~/dalai/alpaca
user@pc03:~/dalai/alpaca$ exit
exit
shasaur commented 1 year ago

Any updates on this? I'm experiencing the same issue and the same error: main: failed to load model from 'models/7B/ggml-model-q4_0.bin'

The full error shown in the web interface with debug enabled:

/root/dalai/alpaca/main --seed -1 --threads 4 --n_predict 200 --model models/7B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p "Testing 1 2 "
exit
root@25f131e6438c:~/dalai/alpaca# /root/dalai/alpaca/main --seed -1 --threads 4 --n_predict 200 --model models/7B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p "Testing 1 2 "

main: seed = 1683271858

llama_model_load: loading model from 'models/7B/ggml-model-q4_0.bin' - please wait ...

llama_model_load: invalid model file 'models/7B/ggml-model-q4_0.bin' (bad magic)

main: failed to load model from 'models/7B/ggml-model-q4_0.bin'

root@25f131e6438c:~/dalai/alpaca# exit

exit

When you say you're running it on Windows, I assume you're using Docker. If so, and you were able to get it working in a Linux VM, could it be that the Linux version used in the dalai Docker image should be different? Or is it just a problem with Docker, and you used the same Linux version in the VM?
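For anyone debugging the "bad magic" message: it means the first four bytes of the model file aren't what this build of main expects, i.e. the model was produced for a different (older or newer) GGML file format than the binary was built for. One way to see what you actually have is to dump those bytes; the magic values below are the ones llama.cpp used around that time, so treat them as a rough guide:

# print the first four bytes of the model file
xxd -l 4 models/7B/ggml-model-q4_0.bin
# on little-endian machines the magic shows up byte-reversed in the ASCII column:
#   "lmgg" -> old unversioned ggml format
#   "fmgg" -> ggmf (versioned) format
#   "tjgg" -> ggjt (mmap-able) format used by newer llama.cpp builds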

natethinks commented 1 year ago

Getting the same error on an M1 Macbook Pro.

wip-abramson commented 1 year ago

Running into the same problem on Ubuntu.

mrtsm commented 1 year ago

Same on an Ubuntu Google Cloud VM.

phou34 commented 1 year ago

Running into the exact same issue on Windows. Anyone with thoughts on how to resolve?

main: seed = 1684118299
llama_model_load: loading model from 'models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: invalid model file 'models/7B/ggml-model-q4_0.bin' (bad magic)      
main: failed to load model from 'models/7B/ggml-model-q4_0.bin'
CyberRide commented 1 year ago

Running into the exact same issue on Windows. Anyone with thoughts on how to resolve?

main: seed = 1684118299
llama_model_load: loading model from 'models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: invalid model file 'models/7B/ggml-model-q4_0.bin' (bad magic)      
main: failed to load model from 'models/7B/ggml-model-q4_0.bin'

No, up till now I'm still having problems running it.

BernLeWal commented 1 year ago

Same for me, on Ubuntu + Docker and on WSL2 + Ubuntu + Docker.

fabiomb commented 1 year ago

Same problem on Windows 11

llama_model_load: loading model from 'models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: failed to open 'models/7B/ggml-model-q4_0.bin'
main: failed to load model from 'models/7B/ggml-model-q4_0.bin'
DaveInchy commented 1 year ago

Same on a VPS (6-core Intel CPU, 16 GB RAM). Same on my custom "minimized" Arch install on WSL2 under Windows 11. Same on Windows natively, with Git Bash and PowerShell, on an RTX 2060. Also no luck on a Hyper-V VM with GPU passthrough; that one is also an Arch installation with open-source GPU drivers and a GNOME desktop, though I doubt that matters.

Tried to run 30B Alpaca; there wasn't enough RAM so it core dumped, but there were no errors reading the file.

main: seed = 1684838005
llama_model_load: loading model from 'models/30B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: ggml ctx size = 25631.50 MB
Segmentation fault (core dumped)

Tried to run 13B Alpaca

llama_model_load: loading model from 'models/13B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: invalid model file 'models/13B/ggml-model-q4_0.bin' (bad magic)
main: failed to load model from 'models/13B/ggml-model-q4_0.bin'
root@ubuntu: ~/dalai/alpaca
root@ubuntu: ~/dalai/alpaca# exit

Tried the "would work def." 7B Alpaca model once more

main: seed = 1684838095
llama_model_load: loading model from 'models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: invalid model file 'models/7B/ggml-model-q4_0.bin' (bad magic)
main: failed to load model from 'models/7B/ggml-model-q4_0.bin'
root@ubuntu: ~/dalai/alpaca
root@ubuntu: ~/dalai/alpaca# exit

exit

I don't know which models you're all talking about: LLaMA or Alpaca? Self-quantized or pre-quantized? A modified version?

I'm pretty sure LLaMA works fine for me; I tested up to 13B, and I don't have the hardware for 30B and 65B.

I hope this helps

EDIT

I just tried all the LLaMA models; they all seem to be fine, and the binary files load normally. However, when I run both models I get this as a result: [screenshot]

josmac69 commented 1 year ago

I installed the Docker version on Ubuntu 22, and the 30B model is working, but 7B and 13B are failing even though I've installed them repeatedly. It looks like the files are broken at the source.

RockyNiu commented 1 year ago

Marking this. Same issue here. Will try a previous version.

RockyNiu commented 1 year ago

You might try using commit 66bc9af0f5c0a9ff386f20a8b2f351b47eed25a5 (the last published version before May 19th; see https://www.reddit.com/r/LocalLLaMA/comments/13md90j/another_new_llamacpp_ggml_breaking_change/).
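If you want to try that, pinning a checkout to that commit is straightforward (sketch only, and assuming the commit is from llama.cpp as the Reddit thread suggests; build steps depend on your platform):

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# check out the last commit before the May 2023 file-format change
git checkout 66bc9af0f5c0a9ff386f20a8b2f351b47eed25a5
make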

BernLeWal commented 1 year ago

I used the already-quantized models downloaded via the link from https://github.com/ItsPi3141/alpaca.cpp/tree/master

RockyNiu commented 1 year ago

Resolved it by downloading the model with curl -o ggml-model-q4_0.bin -C - https://ipfs.io/ipfs/QmQ1bf2BTnYxq73MFJWu1B7bQ2UD6qG7D7YDCxhTndVkPC. Please make sure it ends up in the directory models/7B.

It works. I'm not sure about the source, though, since it's from IPFS; here is the reference: https://github.com/ItsPi3141/alpaca.cpp/tree/master
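In other words, assuming the default dalai Alpaca layout seen in the console output earlier in this thread:

# download the pre-quantized 7B file directly into the folder dalai reads from
cd ~/dalai/alpaca/models/7B
curl -o ggml-model-q4_0.bin -C - https://ipfs.io/ipfs/QmQ1bf2BTnYxq73MFJWu1B7bQ2UD6qG7D7YDCxhTndVkPC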

my-umd commented 1 year ago

Hmm. Not working for me after downloading with the curl command here.

my-umd commented 1 year ago

Update: I still cannot get the downloaded model (from Hugging Face) to work with llama.cpp on an AWS EC2 instance (t2.xlarge). I had to follow the llama.cpp instructions to build it from the original model weights, and that worked. Maybe worth a try if you cannot use the pre-made ggml q4_0 model(s). On the EC2 instance, it took me about 2-3 hours to make 7B, 13B, 30B, and 65B, from downloading through converting.
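A quick sanity check after converting, in case it helps others (the sizes are rough figures from memory, not exact): the output files should be several gigabytes, so a q4_0 file measured in kilobytes, like the 296K one mentioned above, means the conversion or quantization failed.

ls -lh models/7B/
# rough expectations for 7B:
#   ggml-model-f16.bin   ~13 GB
#   ggml-model-q4_0.bin  ~4 GB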