amida47 opened this issue 5 months ago
I think the issue is that there is currently no CUDA prebuilt wheel of the latest 0.2.78 release, and pip pulls the latest version by default. I had the same problem installing it on a local machine.
It can be worked around by pinning the previous version:
pip install --no-cache-dir llama-cpp-python==0.2.77 --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
Hopefully, the author keeps providing updated CUDA builds, though.
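Not part of the original comment, but a quick way to confirm a pinned wheel actually offloads is a minimal Python smoke test like the sketch below (model.gguf is a placeholder path, not from the thread); a CUDA build prints the detected device and an "offloaded N/N layers" line at startup.

# Minimal smoke test (sketch): verify the installed wheel offloads to the GPU.
from llama_cpp import Llama

llm = Llama(
    model_path="model.gguf",  # placeholder, point this at any local GGUF file
    n_gpu_layers=-1,          # request full GPU offload
    verbose=True,             # startup log reports whether CUDA is in use
)
print(llm("Q: What is 2+2? A:", max_tokens=8)["choices"][0]["text"])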
Thank you; this works on my CUDA 12.3 setup using this index URL instead:
pip install --no-cache-dir llama-cpp-python==0.2.90 --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu123
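As a side note (my addition, not from the thread): the cuXXX suffix must match the machine's CUDA version, and a small sketch like the one below can print the matching index URL by parsing nvcc --version, assuming nvcc is on the PATH.

# Sketch: derive the cuXXX wheel index from the local CUDA toolkit version.
import re
import subprocess

out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
m = re.search(r"release (\d+)\.(\d+)", out)  # e.g. "release 12.2" -> cu122
if m:
    print(f"https://abetlen.github.io/llama-cpp-python/whl/cu{m.group(1)}{m.group(2)}")
else:
    print("could not parse nvcc output; check nvidia-smi instead")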
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
I installed llama-cpp-python[server] using:
pip install llama-cpp-python[server] --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122
(I used the cu122 index after checking the environment's CUDA version.) When I run the server with this command:
!python3 -m llama_cpp.server --hf_model_repo_id Qwen/Qwen2-7B-Instruct-GGUF --model 'qwen2-7b-instruct-q6_k.gguf' --n_ctx 32000 --host 0.0.0.0 --port 8188 --flash_attn True --n_gpu_layers -1
passing --n_gpu_layers -1 should have loaded the model on the GPU.
Current Behavior
The model is instead loaded on the CPU.
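One way to tell whether this is a CPU-only wheel rather than a runtime problem (my addition, assuming a 0.2.x release that exposes this binding): llama_cpp surfaces llama.cpp's GPU-offload capability flag, so a CPU-only build should print False here.

# Sketch: check if the installed wheel was compiled with GPU offload support.
import llama_cpp

print("version:", llama_cpp.__version__)
print("GPU offload supported:", llama_cpp.llama_supports_gpu_offload())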
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
It is a Colab environment with a T4 GPU.
Update
installing llama-cpp-python using:
!CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python[server]
fixed the problem, but building from source takes about 18 minutes, so a prebuilt wheel is still preferred. I am not closing this issue for the time being.
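A possible middle ground (my suggestion, not tested in this thread): build the CUDA wheel once, save it to Google Drive, and reinstall from the saved wheel in later Colab sessions. The paths below are assumptions, and Drive must already be mounted.

# Sketch: cache the compiled wheel so later sessions skip the ~18 min build.
import os
import subprocess

wheel_dir = "/content/drive/MyDrive/wheels"  # assumed path; Drive already mounted
os.makedirs(wheel_dir, exist_ok=True)

env = dict(os.environ, CMAKE_ARGS="-DLLAMA_CUDA=on")
# First session only: compile and store the wheel (slow, one time).
subprocess.run(["pip", "wheel", "llama-cpp-python[server]", "-w", wheel_dir],
               env=env, check=True)
# Every later session: install straight from the cached wheel (fast).
subprocess.run(["pip", "install", "llama-cpp-python[server]", "--no-index",
                "--find-links", wheel_dir], check=True)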