Closed: fdutenho closed this issue 1 month ago
Next attempt (trying to work around the gcc version conflict):
1) install CUDA via apt install cuda-12-5, following https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Debian&target_version=12&target_type=deb_local
sudo apt install cuda-12-5
Both nvcc and nvidia-smi show the expected output.
2) install gcc-13 and g++-13
sudo apt install -y gcc-13 g++-13
3) remove the default symbolic links and point them at gcc-13 and g++-13 (an update-alternatives variant is sketched after this step)
sudo rm -f /usr/bin/gcc
sudo rm -f /usr/bin/g++
sudo ln -s /usr/bin/gcc-13 /usr/bin/gcc
sudo ln -s /usr/bin/g++-13 /usr/bin/g++
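If you prefer not to overwrite the distro-managed symlinks, update-alternatives can switch compilers reversibly. A minimal sketch, assuming gcc-13/g++-13 and the stock gcc-14/g++-14 are both installed:
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-13 130 --slave /usr/bin/g++ g++ /usr/bin/g++-13
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-14 140 --slave /usr/bin/g++ g++ /usr/bin/g++-14
sudo update-alternatives --set gcc /usr/bin/gcc-13   # g++ follows via --slave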
4) install llama-cpp-python with FORCE_CMAKE=1
FORCE_CMAKE=1 CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
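Before loading a model, it can be worth verifying that the wheel was really built with CUDA. A quick check, assuming a recent llama-cpp-python version that exposes llama_supports_gpu_offload:
python3 -c "from llama_cpp import llama_supports_gpu_offload; print(llama_supports_gpu_offload())"
It should print True for a CUDA-enabled build.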
5) fix the numpy version conflict
sudo pip uninstall -y "numpy>2.0"
sudo pip install "numpy<2.0"
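To confirm the pin took effect:
python3 -c "import numpy; print(numpy.__version__)"
This should report a 1.x version.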
evidence
from llama_cpp import Llama

llm = Llama(
    model_path="...open_llama_7b/ggml-q4.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    verbose=True
)
results in
...
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
...
AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
...
but also in
ggml_cuda_init: failed to initialize CUDA: forward compatibility was attempted on non supported HW
An older CUDA version with gcc-13 dependencies is not available for Debian 12; CUDA 12.2 and below are only available for Debian 11...
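For what it's worth, the "forward compatibility" error usually means the CUDA userspace libraries are newer than the loaded kernel driver, and CUDA's forward-compatibility mode (which would bridge that gap) is only supported on data-center GPUs, not on a GeForce card. Comparing the two versions makes the mismatch visible:
nvidia-smi --query-gpu=driver_version --format=csv,noheader   # kernel driver
nvcc --version                                                # toolkit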
Uninstalling everything (see https://forums.developer.nvidia.com/t/nvidia-smi-has-failed-because-it-couldnt-communicate-with-the-nvidia-driver-make-sure-that-the-latest-nvidia-driver-is-installed-and-running/197141/5) and then
1) remove everything
sudo apt remove -y --purge '^nvidia-.*'
sudo apt remove -y --purge '^libnvidia-.*'
sudo apt remove -y --purge 'cuda.*'
sudo apt autoremove -y
sudo apt autoclean -y
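To verify the purge left nothing behind before reinstalling:
dpkg -l | grep -Ei 'nvidia|cuda'   # should print nothing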
2) Install nvidia drivers
echo 'deb http://deb.debian.org/debian/ sid main contrib non-free non-free-firmware' | sudo tee -a /etc/apt/sources.list
sudo apt update
sudo apt install nvidia-driver firmware-misc-nonfree
# reboot ?!
sudo apt install -y linux-headers-$(uname -r)
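After the reboot, a quick sanity check that the kernel module is loaded and the driver responds:
lsmod | grep nvidia
nvidia-smi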
3) install CUDA via apt install cuda-12-4, following https://developer.nvidia.com/cuda-12-4-1-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Debian&target_version=12&target_type=deb_local
sudo apt install cuda-12-4
Both nvcc and nvidia-smi show the expected output.
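The NVIDIA packages install the toolkit under /usr/local/cuda-12.4. If nvcc is not found afterwards, extending the environment (e.g. in ~/.bashrc) usually fixes it:
export PATH=/usr/local/cuda-12.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH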
4) install llama-cpp-python with FORCE_CMAKE=1
FORCE_CMAKE=1 CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
5) fix the numpy version conflict
sudo pip uninstall -y "numpy>2.0"
sudo pip install "numpy<2.0"
evidence
from llama_cpp import Llama

llm = Llama(
    model_path="...open_llama_7b/ggml-q4.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    verbose=True
)
results in
...
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
...
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
...
AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
...
An older CUDA version with gcc-13 dependencies is not available for Debian 12; CUDA 12.2 and below are only available for Debian 11...
tried to recap and replay: failed. Did the same steps and got no CUDA - frustrating :-(
AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
Found it: do NOT(!) install the nvidia-driver package yourself; the cuda-12-4 package from NVIDIA's repository pulls in a matching driver on its own.
Uninstalling everything (see https://forums.developer.nvidia.com/t/nvidia-smi-has-failed-because-it-couldnt-communicate-with-the-nvidia-driver-make-sure-that-the-latest-nvidia-driver-is-installed-and-running/197141/5) and then
1) remove everything
sudo apt remove -y --purge '^nvidia-.*'
sudo apt remove -y --purge '^libnvidia-.*'
sudo apt remove -y --purge 'cuda.*'
sudo apt autoremove -y
sudo apt autoclean -y
2) install CUDA via apt install cuda-12-4, following https://developer.nvidia.com/cuda-12-4-1-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Debian&target_version=12&target_type=deb_local
sudo apt install cuda-12-4
Both nvcc and nvidia-smi show the expected output.
3) install llama-cpp-python with FORCE_CMAKE=1
FORCE_CMAKE=1 CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
4) fix the numpy version conflict
sudo pip uninstall -y "numpy>2.0"
sudo pip install "numpy<2.0"
evidence
from llama_cpp import Llama

llm = Llama(
    model_path="...open_llama_7b/ggml-q4.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    verbose=True
)
results in
...
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
...
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
...
AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
...
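As an end-to-end check, a minimal completion call on the loaded model (the prompt and max_tokens are arbitrary):
output = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(output["choices"][0]["text"])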
I'm running Debian 12 with an NVIDIA GeForce RTX 4090.
Here's my problem: I cannot use llama-cpp-python with CUDA. What I did:
1) install CUDA via apt install cuda-12-5, following https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Debian&target_version=12&target_type=deb_local
nvcc and nvidia-smi show the expected output.
2) install llama-cpp-python
Attempt 1)
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
installs llama-cpp-python, but the Python code does not use the GPU.
Attempt 2)
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
fails with "gcc versions later than 13 are not supported" (CUDA 12.x requires and installs gcc-14).
Attempt 3)
NVCC_PREPEND_FLAGS="-allow-unsupported-compiler" CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
fails as well, with 20 errors while compiling...
==> I think I need a llama-cpp-python build with CMAKE_ARGS="-DGGML_CUDA=on" that works with gcc-14...
evidence for Attempt 1)
results in