Open javiagu13 opened 1 year ago
Same problem here.
Dockerfile:
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
...
RUN apt-get update && apt-get install -y python3 python3-pip cmake iproute2
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --upgrade --no-cache-dir
...
In the docker bash:
root@8c01ef13fd43:/app# find / -name "*cublas*"
/var/lib/dpkg/info/libcublas-11-8.md5sums
/var/lib/dpkg/info/libcublas-11-8.list
/usr/share/doc/libcublas-11-8
/usr/local/lib/python3.10/dist-packages/torch/lib/libcublasLt.so.11
/usr/local/lib/python3.10/dist-packages/torch/lib/libcublas.so.11
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcublasLt.so.11
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcublas.so.11
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcublas.so.11.11.3.6
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcublasLt.so.11.11.3.6
But it only uses the CPU; there is no GPU acceleration.
Changing the base image to 11.8.0-cudnn8-devel-ubuntu22.04 solves this problem.
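For reference, a minimal sketch of the corrected Dockerfile: the `runtime` image ships only the shared cuBLAS libraries, while compiling llama-cpp-python from source needs the CUDA headers and build-time libraries that only the `devel` tag includes. The packages below are taken from the original Dockerfile; nothing else is assumed.

```dockerfile
# devel (not runtime) tag: includes the CUDA headers/libs needed to compile the cuBLAS backend
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04

RUN apt-get update && apt-get install -y python3 python3-pip cmake iproute2

# Build llama-cpp-python from source with cuBLAS enabled
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
    pip install llama-cpp-python --upgrade --no-cache-dir
```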
Expected Behavior
I want llama-cpp-python to load GGUF models on the GPU inside Docker. It works when llama-cpp-python is installed interactively in the container, but not when installed from the Dockerfile. Since I work in a hospital, my aim is to be able to do it offline (using the downloaded tar.gz file of llama-cpp-python).
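An offline install from a pre-downloaded sdist might look like the sketch below; the filename and version are hypothetical, and the build flags are the same ones used elsewhere in this thread:

```shell
# Offline source build with cuBLAS; the sdist filename/version here is hypothetical
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
    pip install --no-index ./llama_cpp_python-<version>.tar.gz
```

Note that `--no-index` prevents pip from reaching out to PyPI, so any build dependencies must also be available locally.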
Environment and Context
I am working on a Windows 11 machine, and my Docker container runs Ubuntu 20.04.
Failure Information (for bugs)
Basically, while loading the model the GPU is clearly not used (BLAS = 0 and no message regarding layer offloading).
Steps to Reproduce
I downloaded the tar.gz file of llama-cpp-python on an Ubuntu 20.04 machine via
Then I ran the following inside the Dockerfile:
While trying to load the Llama model, it does not use the GPU; it works, but CPU-only.
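For comparison, loading a model with GPU offload typically looks like the sketch below (the model path and layer count are assumptions, not from this thread). With a working cuBLAS build, the startup log reports BLAS = 1 and the number of layers offloaded to the GPU.

```python
from llama_cpp import Llama

# model_path is an assumption; n_gpu_layers=-1 requests offloading all layers to the GPU
llm = Llama(model_path="/app/model.gguf", n_gpu_layers=-1, verbose=True)
```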
However, if I enter the container in interactive mode, uninstall llama_cpp_python, and run the following command, it works perfectly:
I also tried the following to check whether the environment was improperly set, and it does not load the GPU either:
It seems to me that there is some problem with cuBLAS or CMake.
Thank you for your time, Javi