Thanks for reporting and I appreciate your building the official repo to verify! I had a quick look and can replicate the issue.
I suspect it may be a problem with the wheels for exllamav2. I will look into it further and see about building it from source in the image. The following commit may be the root of the issue: https://github.com/oobabooga/text-generation-webui/commit/bde7f00cae8306884c31d855092463ca04ce26ac.
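For reference, a rough sketch of what that could look like as a RUN step in the Dockerfile (the /venv path matches the tracebacks in this thread; pinning a release tag would be sensible):

```sh
# Sketch only: swap the prebuilt exllamav2 wheel for a source build, so the
# C++ extension gets compiled against the torch version already in the image.
. /venv/bin/activate && \
    pip uninstall -y exllamav2 && \
    pip install --no-cache-dir "git+https://github.com/turboderp/exllamav2"
```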
I don't think that is the issue; the HQQ loader did not work for me either. It has not been working for some time, but I assumed the original repo was at fault. Now I really wanted to upgrade to try out exllamav2 0.15, which has some great memory management improvements.
I will have to see when I have time to debug it properly. I do not think it is 'missing' the CUDA runtime - the step you suggested refers to setting up a conda environment, and this image uses venv. Have you successfully used the HQQ loader in the official image? If so, could you please point me to the model and settings you used? I will check that out as well when I'm looking at the exllamav2 issue in more detail.
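As a side note, a quick way to check whether the CUDA runtime is actually visible inside the container is something along these lines (the container name is illustrative):

```sh
# Verify the torch build's CUDA version and that the GPU is reachable.
docker exec -it text-generation-webui bash -c \
  '. /venv/bin/activate && python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"'
```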
Yes, I have used one successfully; this was the model: https://huggingface.co/mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-HQQ There is only one setting; I used the PyTorch backend.
Thanks very much - tried it out and got an error about flash attention:

/venv/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops15sum_IntList_out4callERKNS_6TensorEN3c1016OptionalArrayRefIlEEbSt8optionalINS5_10ScalarTypeEERS2_
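Demangling that symbol with c++filt shows it is a PyTorch operator, which suggests the flash-attn wheel was built against a different torch than the one in this image:

```sh
$ echo '_ZN2at4_ops15sum_IntList_out4callERKNS_6TensorEN3c1016OptionalArrayRefIlEEbSt8optionalINS5_10ScalarTypeEERS2_' | c++filt
at::_ops::sum_IntList_out::call(at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>, at::Tensor&)
```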
This is again some C library error; I still suspect that we are somehow missing the CUDA runtime.
Thank you for the heads up - it was a library version mismatch, and thankfully a simple fix! New stable images are building and will be up in about an hour.
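Once the new images are up, a quick smoke test like this should confirm that both extensions import cleanly against the bundled torch (container name is illustrative):

```sh
docker exec -it text-generation-webui bash -c \
  '. /venv/bin/activate && python -c "import flash_attn, exllamav2; print(\"extensions OK\")"'
```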
Thank you very much for the quick fix!
This happens for the two most recent nightly versions, and I have also built an image from the 2024-03-10 snapshot version: https://github.com/oobabooga/text-generation-webui/releases/tag/snapshot-2024-03-10 . The issue happens on both of them. This is the base-nvidia variant. When I try to load an exllamav2 model, I receive this error message:
File "/app/modules/ui_model_menu.py", line 245, in load_model_wrapper shared.model, shared.tokenizer = load_model(selected_model, loader) File "/app/modules/models.py", line 87, in load_model output = load_func_map[loader](model_name) File "/app/modules/models.py", line 378, in ExLlamav2_HF_loader from modules.exllamav2_hf import Exllamav2HF File "/app/modules/exllamav2_hf.py", line 7, in from exllamav2 import ( File "/venv/lib/python3.10/site-packages/exllamav2/init.py", line 3, in from exllamav2.model import ExLlamaV2 File "/venv/lib/python3.10/site-packages/exllamav2/model.py", line 23, in from exllamav2.config import ExLlamaV2Config File "/venv/lib/python3.10/site-packages/exllamav2/config.py", line 2, in from exllamav2.fasttensors import STFile File "/venv/lib/python3.10/site-packages/exllamav2/fasttensors.py", line 5, in from exllamav2.ext import exllamav2_ext as ext_c File "/venv/lib/python3.10/site-packages/exllamav2/ext.py", line 15, in import exllamav2_ext ImportError: /venv/lib/python3.10/site-packages/exllamav2_ext.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c107WarningC1ESt7variantIJNS0_11UserWarningENS0_18DeprecationWarningEEERKNS_14SourceLocationESsb
I built an image from the official repo as well, and that worked flawlessly. I think the issue could be this step from the official repository:
conda install -y -c "nvidia/label/cuda-12.1.1" cuda-runtime
I couldn't find this step in the Dockerfile here. Thanks for the help!
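For completeness, the rough venv analogue of that conda step would be NVIDIA's pip-packaged runtime, along the lines of the sketch below - though the torch cu121 wheels normally pull these packages in as dependencies already:

```sh
# Sketch: pip equivalent of `conda install cuda-runtime` for a venv-based image.
. /venv/bin/activate && pip install nvidia-cuda-runtime-cu12==12.1.105
```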