abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License
7.82k stars 934 forks

Docker GPU installation works in interactive mode but not in the dockerfile #742

Open javiagu13 opened 1 year ago

javiagu13 commented 1 year ago

Expected Behavior

I want llama-cpp-python to be able to load GGUF models with GPU support inside Docker. Installation works correctly when done in the container's interactive mode, but not from the Dockerfile. Since I work in a hospital, my aim is to install it offline (using the downloaded tar.gz file of llama-cpp-python).

Environment and Context

I am working on a Windows 11 machine, and my Docker container runs Ubuntu 20.04.

Failure Information (for bugs)

Basically, while loading the model, the GPU is clearly not used (the log reports BLAS = 0 and there is no message about layer offloading).
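As context for that symptom: llama.cpp prints a `system_info` line at startup, and in a cuBLAS-enabled build it contains `BLAS = 1`, otherwise `BLAS = 0`. A minimal sketch of scanning a captured load log for this flag (the helper and the sample log lines are illustrative, not part of the library's API):

```python
def gpu_build_detected(log_text: str) -> bool:
    """Return True if llama.cpp's system_info line reports a BLAS-enabled build."""
    return any("BLAS = 1" in line for line in log_text.splitlines())

# Illustrative log excerpts, modeled on llama.cpp's startup output:
cpu_log = "system_info: n_threads = 8 | BLAS = 0 | SSE3 = 1"
gpu_log = ("llm_load_tensors: offloaded 35/35 layers to GPU\n"
           "system_info: n_threads = 8 | BLAS = 1 | SSE3 = 1")

print(gpu_build_detected(cpu_log))  # False
print(gpu_build_detected(gpu_log))  # True
```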

Steps to Reproduce

I downloaded the tar.gz file of llama-cpp-python on Ubuntu 20.04 via

pip download llama-cpp-python

Then I run the following in the Dockerfile:

RUN /bin/bash -c 'CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install /app/llama_cpp_folder/llama_cpp_python-0.2.6.tar.gz'

When I try to load the Llama model, it works, but without using the GPU.

However, if I enter the container in interactive mode, uninstall llama_cpp_python, and run the following command, it works perfectly:

CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python

I also tried setting the variable via ENV, in case the environment was not being passed to pip, but it does not load the GPU either:

ENV CMAKE_ARGS="-DLLAMA_CUBLAS=on"
RUN pip install /app/llama_cpp_folder/llama_cpp_python-0.2.6.tar.gz

It seems to me that there is some problem with cuBLAS or CMake detection during the Docker build.
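One likely cause, sketched as a quick in-container check: the CUDA `-runtime` base images ship `libcublas` but not the `nvcc` compiler, and without `nvcc` CMake quietly builds llama.cpp without the cuBLAS backend.

```shell
# Quick check inside the container: is the CUDA compiler present?
if command -v nvcc >/dev/null 2>&1; then
  echo "nvcc found: a cuBLAS build should be possible"
else
  echo "nvcc missing: use a -devel CUDA base image, not -runtime"
fi
```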

Thank you for your time, Javi

ShunL12324 commented 1 year ago

Same problem here.

Docker file:

FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
...
RUN apt-get update && apt-get install -y python3 python3-pip cmake iproute2
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --upgrade --no-cache-dir
...

Inside the container's shell:

root@8c01ef13fd43:/app# find / -name "*cublas*"
/var/lib/dpkg/info/libcublas-11-8.md5sums
/var/lib/dpkg/info/libcublas-11-8.list
/usr/share/doc/libcublas-11-8
/usr/local/lib/python3.10/dist-packages/torch/lib/libcublasLt.so.11
/usr/local/lib/python3.10/dist-packages/torch/lib/libcublas.so.11
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcublasLt.so.11
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcublas.so.11
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcublas.so.11.11.3.6
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcublasLt.so.11.11.3.6

But it only uses the CPU; there is no GPU acceleration.

ShunL12324 commented 1 year ago

Switching the base image to 11.8.0-cudnn8-devel-ubuntu22.04 solved the problem.
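For reference, a minimal Dockerfile sketch of that working configuration (the apt package list is illustrative; the key change is the `-devel` base image, which includes `nvcc`):

```dockerfile
# -devel (not -runtime) images ship nvcc, which CMake needs to compile
# the cuBLAS backend; without it the build silently falls back to CPU-only.
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04

RUN apt-get update && apt-get install -y python3 python3-pip cmake

ENV CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1
RUN pip install llama-cpp-python --upgrade --no-cache-dir
```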