Closed: mattsmithuk closed this 1 month ago
That sounds like the llama-cpp-python extension is failing to load because the CPU doesn't support one of the required instructions. Can you provide some information about what hardware you are using? x86 vs. ARM? Does it support AVX or FMA?
You can try building your own wheel on the docker host using the scripts in the /dist folder of this repo.
Closing because not enough info
I'm having this issue too on a Synology DS220+, 10GB DDR4, Intel Celeron J4025 (possibly x86?)
OK, a quick Google search shows that CPU does not support AVX extensions. This means we need to provide non-AVX, AVX, and AVX512 builds for x86 wheels and then pick the appropriate one at install time based on the features reported by the CPU.
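For illustration, install-time selection like that could be sketched as follows. This assumes a Linux host where `/proc/cpuinfo` exposes a `flags` line; the function and variant names are placeholders, not the integration's actual code:

```python
# Illustrative sketch: choose a wheel variant from x86 CPU feature flags.
# The variant names ("avx512", "", "noavx") are assumptions for this example.

def pick_wheel_variant(flags: set) -> str:
    """Map CPU feature flags to a wheel suffix; empty string = default build."""
    if "avx512f" in flags:
        return "avx512"
    if "avx2" in flags:
        return ""  # default build
    return "noavx"

def read_cpu_flags(cpuinfo_path: str = "/proc/cpuinfo") -> set:
    """Parse the 'flags' line of /proc/cpuinfo (Linux, x86 only)."""
    flags = set()
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                if line.startswith("flags"):
                    flags.update(line.split(":", 1)[1].split())
                    break
    except OSError:
        pass  # no flags readable; caller falls back to the safest variant
    return flags
```

A Celeron J4025, for example, reports SSE4.2 but no `avx` flag, so a scheme like this would select the `noavx` wheel for it.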
Is this hard to do and would it be worth the effort? As in, would the CPU (or NAS in general) even be able to efficiently run the LLM?
Hard? No. Llama.cpp already supports easily building those variants out of the box, and I was able to get it added to the GitHub Actions builds for this repo last night. Should be included in the next release.
Able to run on a NAS? Actually yes. You don't need a giant model to handle common household requests. I have been messing with fine-tuning the new TinyLlama 1.1B releases (will probably be Home-1B-v3) and have been able to get a model that is really accurate and responds in 3-5 seconds on a Raspberry Pi 5 using the new prompt caching feature in the integration. "Prompt caching" pre-processes your house state in the background, and Llama.cpp caches the results so that the model is ready to respond to your request when it comes in.
Also some recent changes to Llama.cpp should drastically speed up the prompt processing (~20x increase on fp16 and ~10x increase on q_8) that will get pulled into this project once they make it into llama-cpp-python.
That's awesome to hear. I was actually just about to switch my HA setup from that NAS to a Pi 5 so good to know it'll still perform well on there (hopefully even better). When do you anticipate the next release to be built and ready for download?
At least a week. Need to wait for the updates to make it into a llama-cpp-python release
Just to check, does this current model work on Pi 5 or do I need to wait for the new update that includes(?) ARM and x86 support?
Yes it already works. There was just an improvement recently that makes it faster.
Same issue here (probably the same issue). I have an i5-3470S x86 CPU (with AVX, but not AVX2). It would be awesome to use this integration, as it works perfectly with ollama, but I want to run the model on my Home Assistant server. Thanks for this awesome work!
The next integration release should have a `-noavx` wheel that will download if it detects your CPU doesn't have the correct feature set.
just a quick question. When the new version of this component is released, will I have to somehow uninstall the llama-cpp-python library because that is still not the correct version?
Hopefully not. I want to make it so that the integration will detect an old version installed and update it.
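A minimal sketch of what that detection could look like; the pinned version number and constant name below are made up for the example, not taken from the integration:

```python
# Illustrative sketch: check whether the installed llama-cpp-python matches
# the version the integration expects. REQUIRED_VERSION is a placeholder.
from importlib.metadata import PackageNotFoundError, version

REQUIRED_VERSION = "0.2.64"  # hypothetical pinned version

def needs_reinstall() -> bool:
    """True if llama-cpp-python is missing or at the wrong version."""
    try:
        return version("llama-cpp-python") != REQUIRED_VERSION
    except PackageNotFoundError:
        return True  # not installed at all
```

On startup the integration could call `needs_reinstall()` and, if it returns `True`, re-run its wheel download and install step.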
I've finally got my hands on the Pi 5, but unfortunately it seems to take a while to respond and after only a few queries it shuts my Pi down, turning the LED red. I got an aluminium heatsink to see if it was just overheating but that only seems to have increased the amount of messages I can send before it shuts down. Am I doing something wrong? (Also tell me if I should open a new issue if needed)
Hmm my best guess would be overheating or drawing too much power for the power adapter. If it keeps happening then open a different issue.
This should be fixed now. v0.2.13 provides llama-cpp-python wheels for non-AVX amd64 and i386 systems.
Hi, thanks for your work! I just tried it, and it said "installing new llama cpp version", but the integration still crashes the entire Home Assistant server with the exact same error logs as in the first post. I hope this can be fixed :) (CPU: i5-3470S. But my CPU does support AVX, so there seems to be another problem not fixed in this update.) Thanks
Please post the full Home Assistant log with debug logging enabled for the component from when the program crashes like the original reporter.
This should be it (from the fault log). Looks to me the same as the original poster's. Debug can't be enabled because the setup process crashes Home Assistant. The normal log files show nothing. (If you want me to test anything or provide anything else, let me know.) https://pastebin.com/XDZqWmxR
System: Home Assistant Supervised running on an i5-3470S with 8GB RAM
Did some more digging and apparently the default Llama.cpp build config enables AVX and AVX2 (AVX2 being the issue here). I've updated the detection code in the integration to prevent it from installing the default build unless AVX2 is detected instead of just AVX.
Can you uninstall the llama-cpp-python package manually by running `pip3 uninstall llama-cpp-python` in the Home Assistant container? After that you should be able to re-download the integration (make sure to fully remove and re-install it), and hopefully it will install the `-noavx` version. I'll look into AVX vs. AVX2 builds in the future if this ends up being confirmed as the issue.
Hi, the problem is that when installing the new wheel, I think it can't find it (I installed the new -fix version via HACS and rebooted the HA instance). Also, it still tries to install an AVX version as far as I can tell. Per Intel, this CPU has Intel® SSE4.1, Intel® SSE4.2, and Intel® AVX (so no AVX2).
Update: I saw that it tried to pull from v0.2.13, not v0.2.13-fix. I changed the version in const.py; it installed correctly this time, but obviously it does nothing since it still tries to install the default wheel.
Update 2 - It works, kind of. I realised that there are 2 problems at play here:

```python
if platform_suffix == "amd64" or platform_suffix == "i386":
```

rejects Intel CPUs, so the standard AVX wheel is always installed. I just set `instruction_extensions_suffix = "-noavx"` permanently in `utils.py` for now.

```
2024-04-26 06:07:50.439 ERROR (SyncWorker_22) [homeassistant.util.package] Unable to install package https://github.com/acon96/home-llm/releases/download/v0.2.13/llama_cpp_python-0.2.64-cp312-cp312-musllinux_1_2_x86_64.whl: ERROR: HTTP error 404 while getting https://github.com/acon96/home-llm/releases/download/v0.2.13/llama_cpp_python-0.2.64-cp312-cp312-musllinux_1_2_x86_64.whl
ERROR: Could not install requirement llama-cpp-python==0.2.64 from https://github.com/acon96/home-llm/releases/download/v0.2.13/llama_cpp_python-0.2.64-cp312-cp312-musllinux_1_2_x86_64.whl because of HTTP error 404 Client Error: Not Found for url: https://github.com/acon96/home-llm/releases/download/v0.2.13/llama_cpp_python-0.2.64-cp312-cp312-musllinux_1_2_x86_64.whl
2024-04-26 06:07:50.439 ERROR (SyncWorker_22) [custom_components.llama_conversation.utils] Error installing llama-cpp-python. Could not install the binary wheels from GitHub for platform: x86_64, python version: 3.12. Please manually build or download the wheels and place them in the `/config/custom_components/llama_conversation` directory. Make sure that you download the correct .whl file for your platform and python version from the GitHub releases page.
2024-04-26 06:07:50.440 WARNING (MainThread) [custom_components.llama_conversation.config_flow] Failed to install wheel: False
```
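For context, the 404 in that log comes from the release-asset URL the integration assembles. A simplified sketch of that assembly (the function and parameter names here are assumptions, not the real code) shows how a wrong release tag or a missing suffix ends up pointing at a file that doesn't exist:

```python
# Illustrative sketch of release-asset URL assembly; the real integration's
# code differs, but the filename pattern matches the log above.
def build_wheel_url(release_tag, lib_version, py_tag, platform, suffix=""):
    """Assemble the GitHub release download URL for a llama-cpp-python wheel."""
    name = (f"llama_cpp_python-{lib_version}-{py_tag}-{py_tag}-"
            f"{platform}{suffix}.whl")
    return (f"https://github.com/acon96/home-llm/releases/download/"
            f"{release_tag}/{name}")
```

With `release_tag="v0.2.13"` and an empty suffix, this yields exactly the URL that 404s in the log; the asset that actually exists lives under the `v0.2.13-fix` tag with the `-noavx` suffix.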
Hmmm, I'm facing the same issue running on an Intel N100.
I tried the `-noavx` wheel, tried renaming it to remove the suffix, and even patched `utils.py` to add `instruction_extensions_suffix = "-noavx"` on line 100, after it has tried to define it...
But no luck; I fear the AVX build is already installed someplace else and being used without being replaced by the ones I provide...
Thing is, I'm using SSH and am root on the OS itself, and I am very unclear as to where I should pip install it from: on the OS itself? Inside one of the containers?
(The following is not strictly needed, as the problem then occurs at pip installation; another fix would be needed in the install function for llama-cpp-python to truly install correctly.)

The workaround that works for me (no long-term fix, it just makes the integration usable; any update would probably revert back to the errors):

```shell
sudo docker exec -it homeassistant bash
# should get you into the console of your HA docker container
# (be very careful, you can mess up a lot of things here!)
pip3 uninstall llama-cpp-python
pip3 install YOUR_WHEEL_NAME.whl
```
Hmmmm... going the first route isn't going to work, it seems, as even manually installing a wheel with the `-noavx` suffix fails with this log message:

```shell
homeassistant:/config/custom_components/llama_conversation# pip3 install llama_cpp_python-0.2.64-cp312-cp312-musllinux_1_2_x86_64-noavx.whl
ERROR: llama_cpp_python-0.2.64-cp312-cp312-musllinux_1_2_x86_64-noavx.whl is not a valid wheel filename.
```

The file is there, and if I simply remove the `-noavx` suffix, the `pip3 install` works fine manually.
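As an aside, pip's rejection here is expected behaviour: wheel filenames follow PEP 427 (`name-version(-build)?-python-abi-platform.whl`), and the optional build tag must start with a digit. Appending `-noavx` makes pip parse `cp312` as a build tag, which fails that rule. A quick sketch of the parse; the `+noavx` local-version spelling shown is one standards-valid alternative, not something this project actually ships:

```python
# Split a wheel filename into its dash-separated fields (PEP 427).
def wheel_fields(filename: str) -> list:
    return filename[: -len(".whl")].split("-")

bad = "llama_cpp_python-0.2.64-cp312-cp312-musllinux_1_2_x86_64-noavx.whl"
ok = "llama_cpp_python-0.2.64+noavx-cp312-cp312-musllinux_1_2_x86_64.whl"

# `bad` parses as 6 fields, so the 3rd ("cp312") must be a build tag,
# and build tags must start with a digit, so pip rejects the file.
# `ok` folds the variant into the version as a local segment: 5 fields, valid.
```

This is why renaming the file to drop `-noavx` makes `pip3 install` succeed; the filename becomes a plain 5-field wheel name again.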
So I got the pip3 install to work, and I can go through the integration setup by choosing "existing model" then passing the full path /config/custom_components/llama_conversation/acon96/Home-3B-v3-GGUF/Home-3B-v3.q4_k_m.gguf
but still the container crashes...
I'm probably going to restart Home Assistant and change the CPU from kvm64 to host in Proxmox... It should then present the Intel N100, and I can maybe even try it with AVX ;)
Well, turns out the default CPU type of kvm64 does not play nice with the LLM ;) I have the llama integration showing up now, some llama logs in the homeassistant container, and can even select llama as the conversation agent ;) Sadly the voice assistant does not answer "what time is it"... I will try again in English, as I tried in French; we'll see... Note that I'm not seeing any llama logs in the homeassistant container when asking the assistant anything...
Thanks for the help ! If you have some directions as to where to find traces of what llama is doing or if it's having issues... Feel free to give me pointers plz ;)
For those able to do actual fixing, I think `-noavx`
Edit: seems I need better eyes :
```
2024-04-26 13:53:00.804 ERROR (SyncWorker_22) [custom_components.llama_conversation.agent] There were too many entities exposed when attempting to generate a response for Home-3B-v3.q4_k_m.gguf and it exceeded the context size for the model. Please reduce the number of entities exposed (200) or increase the model's context size (2048)
```
Looks to be clear enough will try to do one or the other ;)
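Rough back-of-the-envelope math suggests why 200 entities can overflow a 2048-token context. The per-entity and system-prompt token counts below are guesses for illustration, not measured values:

```python
# Hypothetical estimate: if each exposed entity contributes ~12 tokens of
# house-state text on top of a ~400-token system prompt, 200 entities
# overflow a 2048-token context. All three figures are assumptions.
TOKENS_PER_ENTITY = 12
SYSTEM_PROMPT_TOKENS = 400
CONTEXT_SIZE = 2048

def max_entities(context: int = CONTEXT_SIZE, reserve: int = 256) -> int:
    """Entities that fit while reserving `reserve` tokens for the reply."""
    return (context - SYSTEM_PROMPT_TOKENS - reserve) // TOKENS_PER_ENTITY
```

Under these assumptions only around 116 entities fit, so either cutting exposed entities well below 200 or raising the context size should clear the error.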
Edit 2: Logs are everywhere... I am now in a position where llama crashed the container and I have no logs, until I look at the VM screen where the OS complains that Python has an out-of-memory error... Increasing RAM and trying again...
Sadly no luck so far for me using HA. I managed to set up the integration a few times, but sometimes simply adding the integration crashes the Home Assistant container. I'm using this build on an Intel N100: https://github.com/acon96/home-llm/releases/download/v0.2.13-fix/llama_cpp_python-0.2.64-cp312-cp312-musllinux_1_2_x86_64-noavx.whl in a HAOS VM now set to CPU=host; maybe virtualization is the issue... I'll maybe try the longer route of building myself an LLM container and pointing this integration to it; that way it's probably easier to debug where the issue is.
Got this issue:

```
Error installing llama-cpp-python. Could not install the binary wheels from GitHub for platform: x86_64, python version: 3.12. Please manually build or download the wheels and place them in the /config/custom_components/llama_conversation directory. Make sure that you download the correct .whl file for your platform and python version from the GitHub releases page.
```
OK. I did more research and I think I finally have the configurations correct for the normal, `avx512`, and `noavx` versions. Please install v0.2.14 and it should upgrade llama-cpp-python to the correct version.
I'm going to close this thread since the issue no longer crashes Home Assistant. Will continue tracking any illegal instruction problems in the new issue mentioned above
Describe the bug
Fails when trying to create the integration using default settings as described in the Quick Start guide. The only error displayed is "Unknown error occurred" after pressing "Submit" on the "Configure the selected model" page. Home Assistant appears to restart and reload all integrations at this point.
Expected behaviour
Integration to install without errors
Logs
https://pastebin.com/kgQUW7E0
Setup
Home Assistant 2024.3.3 running in a Docker container