Illegal Instruction (core dumped) even after disabling AVX2 and FMA

CyberSinister commented 1 year ago

Hi, I'm very new to all of this and pyllamacpp so I'm sorry in advance if the details provided in this issue aren't good enough or up to par but I've been having some issues when doing: python -c 'from pyllamacpp.model import Model'

I know this has something do with my CPU and I've also followed this guide exactly: https://github.com/nomic-ai/pygpt4all/issues/71. I have an older server machine with 2 Intel Xeon X5670.

How do I figure out what's going on and how do I fix it?

abdeladim-s commented 1 year ago

Hi @CyberSinister, It's ok, I know this illegal instruction error is very annoying!

First, What OS are you using ?
how did you install the package ? through pip or did you build it yourself ?
Does llama.cpp work without issues ?

CyberSinister commented 1 year ago

Hey @abdeladim-s , thank you so much for such a quick and understanding response, I really appreciate it. Here are the details you asked for:

I'm using Ubuntu 22.04
I tried both - I first installed it through pip which gave me the same error and then I uninstalled it, cloned the git repo and edited the CMakeLists.txt (I turned AVX, AVX2 and FMA off) and then ran the following command: pip install -e .

I later thought I should leave AVX2 and FMA off and keep only AVX turned on but that didn't work either.

I'm really sorry I don't know how to test llama.cpp itself but I did go ahead and make the same changes inside llama.cpp/CMakeLists.txt and then ran the make command inside the llama.cpp folder. I tried doing this both before and after running pip install -e . inside the main pyllamacpp folder.

Another thing I must add is that I'm using Ubuntu on Hyper-V. I'm sorry I didn't tell you this earlier, I failed to realize that this could be relevant.

abdeladim-s commented 1 year ago

You are welcome @CyberSinister. I hope we will find a solution together.

So you are using Windows Hyper-V, why not just use WSL ? it is more efficient ? Also have you tried to use just Windows without any VM ?

So what I want you to do is to try llama.cpp first. I will explain to you how:

Clone the repo and cd to the llama.cpp folder as you did before.
Don't make any changes to CmakeLists.txt yet, just run make.
If the compilation process went successful, an executable file called main will be generated.
run ./main -h to see how it works but basically like the following ./main -m ./path/to/ggml-model.bin -p "Once upon a time, "
if this works and the model generated something without any issues then we can move to the next step.

Let me know if you find any issues.

CyberSinister commented 1 year ago

Hey @abdeladim-s , thank you once again.

TLDR; It works without having to make any changes to the CMakeLists.txt.

Yes I'm using using Windows Hyper-V. I use Hyper-V because that's where I want to use this. I have multiple VMs running Ubuntu for different tasks and I wanted to create a Flask wrapper around GPT4All so I can create my own API with it. I've already done this a while ago and still am currently using it with OpenAI's API which works fine but it was getting too expensive so now I wanted to host something of my own and not go broke in an attempt to learn something new xD. I'm a Full Stack Developer but I'm really curious and enjoy working with LLMs. I haven't yet tried to use just Windows without a VM because I wanted to deploy it on the server directly and use it there.

Anyways, so I did what you asked and here's what I have for you sir:

I first built llama.cpp without making any changes to the CMakeLists.txt which went fine but upon trying to use it, I ran into a few roadblocks. I tried using the GPT4All GPT4All-13B-snoozy.ggmlv3.q4_0.bin model with the newly created main executable and got the following response: Invalid model file ../../../model_path/GPT4All-13B-snoozy.ggmlv3.q4_0.bin (unsupported format version 3, expected 1) After which I realized that maybe I should try using something that has v1 in it's name instead of v3 so I downloaded the ggml-gpt4all-j-v1.3-groovy.bin model which was too old apparently.

I later realized that I can't just directly use models downloaded from the gtp4all website (I might be wrong but I tried 4 different models) so I downloaded a converted a model from http://gpt4all.io/models/ggml-gpt4all-l13b-snoozy.bin which ACTUALLY worked.

But it was too slow - I gave it a small prompt "What are you?" and after 25 minutes, all it wrote was "I am an AI-language mod" and it was still trying to complete that response. I tried increasing the number of threads by passing the --threads 4 argument when executing the main file but that didn't make much of a difference.

I guess it works, I don't know how to speed it up - I'm guessing that using a smaller model will be faster but do I add a GPU if I want to work with 13B models? Thank you for getting me this far, I've learnt a bit more than I knew yesterday and wouldn't have gotten there without your help. Are we now ready for the next step?

I'm sorry if such a long response wasn't something you were expecting but I thought it would help others not make the same mistakes as me - but then again, not everyone's as dumb as me.

abdeladim-s commented 1 year ago

hey @CyberSinister,

Thank you for the detailed response, it will certainly help others running into the same issue. And you are not dumb, it just a learning curve that we all had to go through it first, just keep up the patience and the hard work and enjoy the process :smile:

So it looks like everything is good, except the speed issue, which is expected I think, because you are running it on top of a VM on top of an old CPU. The LLM inference is RAM and CPU consuming, you need to run it on raw hardware and reduce the usage of all other apps as max as possible. If you can, try run it on WSL or raw windows, to easily test and to compare with the VM.
If you are sure that your CPU does not support AVX or FMA, then I don't think it will run fast, but at least it should work.
Also try to use smaller models, the smaller is the model, the fast it will run (I don't know if gpt4all has been updated to the latest version, but you can try other models as well like WizardLM 7b or Vicuna 7b and so on, there are a myriad of finetuned LLMs now).
You don't actually need a GPU to run any model, you just need a strong CPU and enough RAM.

That being said, let us now move to the next step:

go to the top folder of pyllamacpp
don't make any changes to the top folder CMakeLists.txt as well.
build the python package with pip install -e .
if the compilation went successfully, run this command pyllamacpp path/to/ggml/model, (make sure that this model worked in the previous step), this will load the model and gives you an interaction-like chat. test it and see if it works.
if it works then copy/paste the quickstart script and run it.

If it didn't work let me know what issue you ran into ?

abdeladim-s / pyllamacpp

Illegal Instruction (core dumped) even after disabling AVX2 and FMA #18