pdavis68 opened this issue 1 year ago
LLaMA works fine, but I have problems with alpaca. This is what I see when running from the website with debug activated:
main: seed = 1683269634
llama_model_load: loading model from 'models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: invalid model file 'models/7B/ggml-model-q4_0.bin' (bad magic)
main: failed to load model from 'models/7B/ggml-model-q4_0.bin'
user@pc03: ~/dalai/alpaca
user@pc03:~/dalai/alpaca$ exit
exit
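For anyone else hitting this: "bad magic" means the first few bytes of the model file don't match the format identifier the binary expects. In practice that usually means either a truncated/corrupted download or a ggml format version mismatch (older binaries only know the original "ggml" magic, while newer converters write "ggmf" or "ggjt" headers, and vice versa). A quick sanity check, assuming the default dalai path and that xxd is installed:

ls -lh models/7B/ggml-model-q4_0.bin    # a 7B q4_0 file should be roughly 4 GB, not a few KB
xxd -l 4 models/7B/ggml-model-q4_0.bin  # dumps the 4-byte magic at the start of the file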
Any updates on this? I'm experiencing the same issue and error: main: failed to load model from 'models/7B/ggml-model-q4_0.bin'
The full error shown in the web interface with debug enabled:
/root/dalai/alpaca/main --seed -1 --threads 4 --n_predict 200 --model models/7B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p "Testing 1 2 "
exit
root@25f131e6438c:~/dalai/alpaca# /root/dalai/alpaca/main --seed -1 --threads 4 --n_predict 200 --model models/7B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p "Testing 1 2 "
main: seed = 1683271858
llama_model_load: loading model from 'models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: invalid model file 'models/7B/ggml-model-q4_0.bin' (bad magic)
main: failed to load model from 'models/7B/ggml-model-q4_0.bin'
root@25f131e6438c:~/dalai/alpaca# exit
exit
When you say you are running it in Windows, I assume you are using Docker. If so, and you were able to get it working in a Linux VM, could it be that the Linux version used in the dalai Docker image should be different? Or is it just a problem with Docker, and you used the same Linux version?
Getting the same error on an M1 MacBook Pro.
Running into the same problem on Ubuntu.
Same on Ubuntu Google VM
Running into the exact same issue on Windows. Anyone with thoughts on how to resolve?
main: seed = 1684118299
llama_model_load: loading model from 'models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: invalid model file 'models/7B/ggml-model-q4_0.bin' (bad magic)
main: failed to load model from 'models/7B/ggml-model-q4_0.bin'
No, up till now I'm still having problems running it.
Same for me, on Ubuntu + Docker and on WSL2 + Ubuntu + Docker.
Same problem on Windows 11
llama_model_load: loading model from 'models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: failed to open 'models/7B/ggml-model-q4_0.bin'
main: failed to load model from 'models/7B/ggml-model-q4_0.bin'
Same on a VPS with a 6-core Intel CPU and 16 GB of memory. Same on my custom "minimized" Arch install on WSL2 under Windows 11. Same natively on Windows with Git Bash and PowerShell on an RTX 2060. Also no luck on a Hyper-V VM with GPU passthrough; that one is also an Arch installation with open-source GPU drivers and a GNOME desktop, which probably doesn't matter.
main: seed = 1684838005
llama_model_load: loading model from 'models/30B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: ggml ctx size = 25631.50 MB
Segmentation fault (core dumped)
llama_model_load: loading model from 'models/13B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: invalid model file 'models/13B/ggml-model-q4_0.bin' (bad magic)
main: failed to load model from 'models/13B/ggml-model-q4_0.bin'
root@ubuntu: ~/dalai/alpaca
root@ubuntu: ~/dalai/alpaca# exit
main: seed = 1684838095
llama_model_load: loading model from 'models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: invalid model file 'models/7B/ggml-model-q4_0.bin' (bad magic)
main: failed to load model from 'models/7B/ggml-model-q4_0.bin'
root@ubuntu: ~/dalai/alpaca
root@ubuntu: ~/dalai/alpaca# exit
exit
I don't know which models you are all talking about: llama or alpaca, self-quantized, pre-quantized, or a modified version? I'm pretty sure llama works fine for me; I tested up to 13B, and I don't have the hardware for 30B or 65B.
I hope this helps.
I just tried all the llama models and they all seem to be fine; the binary files load like normal. However, when I run both models I get this as the result:
I installed the Docker version on Ubuntu 22 and the 30B model is working, but 7B and 13B are failing, even though I reinstalled them repeatedly. It looks like the files are broken at the source.
Mark. Same issue here. Will try a previous version.
You might try using commit 66bc9af0f5c0a9ff386f20a8b2f351b47eed25a5
(the last published version before May 19th; reference: https://www.reddit.com/r/LocalLLaMA/comments/13md90j/another_new_llamacpp_ggml_breaking_change/)
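If you want to try that, a rough sketch (this assumes the hash refers to a llama.cpp commit, as the linked Reddit thread implies; adjust if it belongs to another repo):

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout 66bc9af0f5c0a9ff386f20a8b2f351b47eed25a5
make   # builds main and quantize from before the breaking ggml format change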
I used the already-quantized models downloaded from the link at https://github.com/ItsPi3141/alpaca.cpp/tree/master
Resolved it by downloading:
curl -o ggml-model-q4_0.bin -C - https://ipfs.io/ipfs/QmQ1bf2BTnYxq73MFJWu1B7bQ2UD6qG7D7YDCxhTndVkPC
Make sure it ends up in the models/7B directory.
It works. I'm not sure about the source, though, since it's from IPFS; here is the reference: https://github.com/ItsPi3141/alpaca.cpp/tree/master
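For reference, a sketch of pulling the file straight into the directory dalai expects (run from the alpaca root, e.g. ~/dalai/alpaca; the size check is just a sanity check):

mkdir -p models/7B
curl -L -C - -o models/7B/ggml-model-q4_0.bin https://ipfs.io/ipfs/QmQ1bf2BTnYxq73MFJWu1B7bQ2UD6qG7D7YDCxhTndVkPC
ls -lh models/7B/ggml-model-q4_0.bin   # roughly 4 GB for a 7B q4_0 model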
Update: I still cannot get the downloaded model (from Hugging Face) to work with llama.cpp on an AWS EC2 instance (t2.xlarge). I had to follow the llama.cpp instructions to build it from the model weights, and that worked. It may be worth a try if you cannot use the pre-made ggml q4_0 model(s). On the EC2 instance it took me about 2-3 hours to produce 7B, 13B, 30B, and 65B, from downloading to converting.
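For anyone wanting to try the same route, the flow being described is roughly the following. Script names and arguments have changed between llama.cpp versions, so treat this as a sketch and follow the README of the checkout you actually build:

# run from the llama.cpp checkout, with the original weights under models/7B/
python3 convert-pth-to-ggml.py models/7B/ 1                                # 1 = f16 output
./quantize models/7B/ggml-model-f16.bin models/7B/ggml-model-q4_0.bin 2    # 2 = q4_0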
I've spent hours struggling to get all this to work. I would really appreciate any help anyone can offer.
I'm running in a Windows 10 environment.
I've tried running
npx dalai llama install 7B --home F:\LLM\dalai
It mostly installs, but the post-processing of the model doesn't seem to work. The ggml-model-q4_0.bin is empty, and the return code from the quantize step suggests that an illegal instruction is being executed (I was running it as admin, and I ran it manually to check the errorlevel).
The quantize "usage" suggests that it wants a model-f32.bin, but a -f16 file is what's produced during the post processing.
I got the latest code from alpaca.cpp and built it. I generated an f32 file using its convert-pth-to-ggml.py, then used its quantize executable to produce a ggml-model-q4_0.bin.
Another issue I had is that dalai tries to run the executable "main" (which doesn't exist) from the "build\Release" directory, which is empty, because everything got built to "build\bin\Release".
Based on digging through issues, I surmised that llama.exe is what needed to be renamed to main.exe. But since I built everything from the latest alpaca.cpp, I instead took its chat.exe, renamed it main.exe, and copied it into the "build\Release" folder.
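In other words, something like this (Git Bash syntax; build/bin/Release is where the CMake Release build put things, as noted above):

mkdir -p build/Release
cp build/bin/Release/chat.exe build/Release/main.exe   # or llama.exe -> main.exe for a llama.cpp build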
Still, I had no success. Looking at the console, I saw the command that was being run, so I ran it from the command line manually.
This is the command and the results I get:
I'm not sure what else to do at this point. I hope some of the issues I've noticed can lead to fixes in the install. But in the meantime, if anyone has any ideas for how I can get this working, I'd appreciate it.
Update: I built an f16 file and generated the q4_0 file from that. Same problem. I would think that since I used convert-pth-to-ggml.py, quantize, and main (chat.exe) from the latest alpaca.cpp, main should be able to read the model quantized by that code.
Update 2: Got the latest code from llama.cpp, then built everything. Its quantize returns an exit code of -1073741795 (0xC000001D, illegal instruction). The one from alpaca.cpp generates a q4_0 file that's 296 KB; I don't know if that's correct or not, but it's the one that doesn't work with chat.exe (main), as shown above.
Update 3: I ran the install in a Linux VM and everything went much more smoothly. The main executable runs and responds, but not from the dalai web site. In the console it looks like it's executing correctly and in the proper directory, but it just hangs and nothing happens. Running the same command in a terminal works just fine.
I would prefer to run it in Windows, but this will suffice for the time being.