abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

CUDA 12.x and llama-cpp-python 0.2.84 gcc-13/gcc-14 conflict #1643

Closed: fdutenho closed this issue 1 month ago

fdutenho commented 1 month ago

I'm running Debian 12 with an NVIDIA GeForce RTX 4090.

Here's my problem: I cannot use llama-cpp-python with CUDA.

What I do:

1) install CUDA via apt install cuda-12-5 according to https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Debian&target_version=12&target_type=deb_local. Afterwards, nvcc and nvidia-smi show the expected output (a quick check is sketched below).
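A minimal way to confirm both the toolkit and the driver are visible (standard tools, shown as a sketch):

nvcc --version   # toolkit compiler, should report release 12.5
nvidia-smi       # driver version and the GPU it can see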

2) install llama-cpp-python

Attempt 1) pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir installs llama-cpp-python, but the Python code does not use the GPU.

Attempt 2) CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir fails with "gcc versions later than 13 are not supported" (CUDA 12.x requires and installs gcc-14).

Attempt 3) NVCC_PREPEND_FLAGS="-allow-unsupported-compiler" CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose fails as well, with 20 errors during compilation...

==> I think I need a llama-cpp-python build with CMAKE_ARGS="-DGGML_CUDA=on" that works with gcc-14...
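(A possible middle path, untested here: keep the system gcc-14 but tell CMake to hand nvcc a gcc-13 host compiler explicitly. CMAKE_CUDA_HOST_COMPILER is a standard CMake variable; this sketch assumes g++-13 is installed at the given path.)

CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-13" \
  pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir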


evidence for Attempt 1)

from llama_cpp import Llama

llm = Llama(
    model_path="...open_llama_7b/ggml-q4.gguf",  # path truncated in the original
    n_gpu_layers=-1,  # -1 = offload all layers to the GPU
    verbose=True
)

results in

...
AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
...
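BLAS = 0 in that line is the giveaway that this wheel was built without GPU support. A quick check, assuming this version of the bindings exposes llama_supports_gpu_offload (part of the llama.cpp API the package wraps):

python3 -c "from llama_cpp import llama_supports_gpu_offload; print(llama_supports_gpu_offload())"
# prints False for a CPU-only build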
fdutenho commented 1 month ago

Next attempt (trying to sidestep the gcc version conflict):

1) install CUDA via apt install cuda-12-5 according to https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Debian&target_version=12&target_type=deb_local. nvcc and nvidia-smi show the expected output.

sudo apt install cuda-12-5

2) install gcc-13 and g++-13

sudo apt install -y gcc-13 g++-13

3) remove default symbolic links and set them to gcc-13 and g++-13

sudo rm -f /usr/bin/gcc
sudo rm -f /usr/bin/g++
sudo ln -s /usr/bin/gcc-13 /usr/bin/gcc
sudo ln -s /usr/bin/g++-13 /usr/bin/g++
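(An equivalent but reversible route would be update-alternatives instead of deleting the symlinks by hand; a sketch:)

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-13 130
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-13 130
sudo update-alternatives --config gcc   # switch back interactively later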

4) install llama-cpp-python with FORCE_CMAKE=1

FORCE_CMAKE=1 CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
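(Alternatively, the compiler can be pinned per build without touching /usr/bin at all; CC, CXX, and CUDAHOSTCXX are environment variables CMake honors. A sketch under that assumption:)

CC=/usr/bin/gcc-13 CXX=/usr/bin/g++-13 CUDAHOSTCXX=/usr/bin/g++-13 \
  FORCE_CMAKE=1 CMAKE_ARGS="-DGGML_CUDA=on" \
  pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir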

5) fix the numpy version conflict

sudo pip uninstall -y "numpy>2.0"
sudo pip install "numpy<2.0"
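(The conflict is presumably numpy 2.0's binary-compatibility break with extensions built against 1.x. A quick check that the downgrade took and the module imports cleanly:)

python3 -c "import numpy, llama_cpp; print(numpy.__version__, llama_cpp.__version__)"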

evidence

from llama_cpp import Llama

llm = Llama(
    model_path="...open_llama_7b/ggml-q4.gguf",
    n_gpu_layers=-1,
    verbose=True
)

results in

...
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
...
AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
...

but also in

ggml_cuda_init: failed to initialize CUDA: forward compatibility was attempted on non supported HW

An older CUDA version with gcc-13 dependencies is not available for Debian 12; CUDA 12.2 and below are only available for Debian 11...
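(That "forward compatibility" error usually means the userspace CUDA libraries are newer than the kernel driver module that is actually loaded. Comparing the two sides with standard tools can confirm it; a sketch:)

cat /proc/driver/nvidia/version    # version of the loaded kernel module
nvidia-smi --query-gpu=driver_version --format=csv,noheader    # userspace view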

fdutenho commented 1 month ago

Uninstalled everything (see https://forums.developer.nvidia.com/t/nvidia-smi-has-failed-because-it-couldnt-communicate-with-the-nvidia-driver-make-sure-that-the-latest-nvidia-driver-is-installed-and-running/197141/5) and then started over:

1) remove everything

sudo apt remove -y --purge '^nvidia-.*'
sudo apt remove -y --purge '^libnvidia-.*'
sudo apt remove -y --purge 'cuda.*'
sudo apt autoremove -y
sudo apt autoclean -y

2) Install nvidia drivers

# add the sid repository to /etc/apt/sources.list:
echo 'deb http://deb.debian.org/debian/ sid main contrib non-free non-free-firmware' | sudo tee -a /etc/apt/sources.list
sudo apt update
sudo apt install nvidia-driver firmware-misc-nonfree
# reboot ?!
sudo apt install -y linux-headers-$(uname -r)
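(A quick sanity check at this point that the kernel module is actually loaded; standard tools, a sketch:)

lsmod | grep -i nvidia    # the nvidia modules should be listed
nvidia-smi                # should report the driver and the RTX 4090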

3) install CUDA via apt install cuda-12-4 according to https://developer.nvidia.com/cuda-12-4-1-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Debian&target_version=12&target_type=deb_local. nvcc and nvidia-smi show the expected output.

sudo apt install cuda-12-4

4) install llama-cpp-python with FORCE_CMAKE=1

FORCE_CMAKE=1 CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir

5) fix the numpy version conflict

sudo pip uninstall -y "numpy>2.0"
sudo pip install "numpy<2.0"

evidence

from llama_cpp import Llama

llm = Llama(
    model_path="...open_llama_7b/ggml-q4.gguf",
    n_gpu_layers=-1,
    verbose=True
)

results in

...
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
...
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
...
AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
...

An older CUDA version with gcc-13 dependencies is not available for Debian 12; CUDA 12.2 and below are only available for Debian 11...

fdutenho commented 1 month ago

Tried to recap and replay: failed. I did the same steps and got no CUDA. Frustrating :-(

AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
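(When a replay silently falls back to CPU like this, diffing the installed package state against the working setup helps localize what changed; a sketch:)

dpkg -l | grep -Ei 'nvidia|cuda' | awk '{print $2, $3}'    # package name and version
pip show llama-cpp-python    # confirm which build of the bindings is installed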
fdutenho commented 1 month ago

Found it: do NOT(!) install the nvidia-driver package yourself (the cuda-12-4 metapackage evidently pulls in a matching driver on its own).

Uninstalled everything (see https://forums.developer.nvidia.com/t/nvidia-smi-has-failed-because-it-couldnt-communicate-with-the-nvidia-driver-make-sure-that-the-latest-nvidia-driver-is-installed-and-running/197141/5) and then started over:

1) remove everything

sudo apt remove -y --purge '^nvidia-.*'
sudo apt remove -y --purge '^libnvidia-.*'
sudo apt remove -y --purge 'cuda.*'
sudo apt autoremove -y
sudo apt autoclean -y

2) install CUDA via apt install cuda-12-4 according to https://developer.nvidia.com/cuda-12-4-1-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Debian&target_version=12&target_type=deb_local. nvcc and nvidia-smi show the expected output.

sudo apt install cuda-12-4

3) install llama-cpp-python with FORCE_CMAKE=1

FORCE_CMAKE=1 CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir

4) fix the numpy version conflict

sudo pip uninstall -y "numpy>2.0"
sudo pip install "numpy<2.0"

evidence

from llama_cpp import Llama

llm = Llama(
    model_path="...open_llama_7b/ggml-q4.gguf",
    n_gpu_layers=-1,
    verbose=True
)

results in

...
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
...
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
...
AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
...
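With all 33 layers offloaded, a minimal end-to-end smoke test (same placeholder model path as in the snippets above; llm(...) is the standard completion call in llama-cpp-python):

from llama_cpp import Llama

llm = Llama(
    model_path="...open_llama_7b/ggml-q4.gguf",  # placeholder path as above
    n_gpu_layers=-1,
    verbose=True,
)
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
# while this runs, nvidia-smi in a second terminal should show the process using GPU memory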