abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Error install on Ubuntu 23.10 (CUDA) #1103

Open · celsowm opened this issue 10 months ago

celsowm commented 10 months ago


Expected Behavior

The package installs successfully.

Current Behavior

The build fails with a CMake configuration error (output below).

Environment and Context

I am trying to install llama-cpp-python on Ubuntu 23.10 using:

CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python

But I got this error:

[46 lines of output]
      *** scikit-build-core 0.7.1 using CMake 3.27.4 (wheel)
      *** Configuring CMake...
      loading initial cache file /tmp/tmpnqf71w1p/build/CMakeInit.txt
      -- The C compiler identification is GNU 13.2.0
      -- The CXX compiler identification is GNU 13.2.0
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: /usr/bin/cc - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Found Git: /usr/bin/git (found version "2.40.1")
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      -- Unable to find cuda_runtime.h in "/usr/local/include" for CUDAToolkit_INCLUDE_DIR.
      -- Unable to find cublas_v2.h in either "" or "/math_libs/include"
      -- Unable to find cudart library.
      -- Could NOT find CUDAToolkit (missing: CUDAToolkit_INCLUDE_DIR CUDA_CUDART) (found version "11.8.89")
      CMake Warning at vendor/llama.cpp/CMakeLists.txt:360 (message):
        cuBLAS not found

It's crazy, because it still says found version "11.8.89" for CUDA!
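The version string is presumably picked up from nvcc or the driver, while the actual toolkit headers and cudart library are missing from the paths CMake searches. A quick way to check what is really installed (assuming a standard /usr/local layout; adjust paths otherwise):

which nvcc
nvcc --version
ls /usr/local/cuda*/include/cuda_runtime.h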

m-from-space commented 10 months ago

My suggestion:

CUDACXX=/usr/local/cuda-12/bin/nvcc CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=native" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade
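For context: CUDACXX tells CMake exactly which nvcc to use, -DCMAKE_CUDA_ARCHITECTURES=native targets only the GPU actually present in the machine, and FORCE_CMAKE=1 together with --force-reinstall makes pip rebuild the package from source instead of reusing a cached wheel. Adjust /usr/local/cuda-12 to wherever your toolkit lives.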

celsowm commented 10 months ago

> My suggestion:
>
> CUDACXX=/usr/local/cuda-12/bin/nvcc CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=native" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade

Hi, thanks for your suggestion! Meanwhile I downgraded my gcc 13 to 12 and updated my NVIDIA CUDA toolkit from 11.8 to 12.0; the result now is:

Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [133 lines of output]
      *** scikit-build-core 0.7.1 using CMake 3.28.1 (wheel)
      *** Configuring CMake...
      loading initial cache file /tmp/tmp624g1wvw/build/CMakeInit.txt
      -- The C compiler identification is GNU 12.3.0
      -- The CXX compiler identification is GNU 12.3.0
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: /usr/bin/gcc - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/g++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Found Git: /usr/bin/git (found version "2.40.1")
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      -- Found CUDAToolkit: /usr/include (found version "12.0.140")
      -- cuBLAS found
      CMake Error at /tmp/pip-build-env-xj0kr9y6/normal/local/lib/python3.11/dist-packages/cmake/data/share/cmake-3.28/Modules/CMakeDetermineCompilerId.cmake:780 (message):
        Compiling the CUDA compiler identification source file
        "CMakeCUDACompilerId.cu" failed.

        Compiler: /usr/bin/nvcc

        Build flags:

        Id flags: --keep;--keep-dir;tmp -v

        The output was:

        2

        #$ _NVVM_BRANCH_=nvvm

        #$ _SPACE_=

        #$ _CUDART_=cudart

        #$ _HERE_=/usr/lib/nvidia-cuda-toolkit/bin

        #$ _THERE_=/usr/lib/nvidia-cuda-toolkit/bin

        #$ _TARGET_SIZE_=

        #$ _TARGET_DIR_=

        #$ _TARGET_SIZE_=64

        #$ NVVMIR_LIBRARY_DIR=/usr/lib/nvidia-cuda-toolkit/libdevice

        #$
        PATH=/usr/lib/nvidia-cuda-toolkit/bin:/tmp/pip-build-env-xj0kr9y6/overlay/local/bin:/tmp/pip-build-env-xj0kr9y6/normal/local/bin:/home/celso/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

        #$ LIBRARIES= -L/usr/lib/x86_64-linux-gnu/stubs -L/usr/lib/x86_64-linux-gnu

        #$ rm tmp/a_dlink.reg.c

        #$ gcc -D__CUDA_ARCH_LIST__=520 -E -x c++ -D__CUDACC__ -D__NVCC__
        -D__CUDACC_VER_MAJOR__=12 -D__CUDACC_VER_MINOR__=0
        -D__CUDACC_VER_BUILD__=140 -D__CUDA_API_VER_MAJOR__=12
        -D__CUDA_API_VER_MINOR__=0 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include
        "cuda_runtime.h" -m64 "CMakeCUDACompilerId.cu" -o
        "tmp/CMakeCUDACompilerId.cpp4.ii"

        #$ cudafe++ --c++17 --gnu_version=120300 --display_error_number
        --orig_src_file_name "CMakeCUDACompilerId.cu" --orig_src_path_name
        "/tmp/tmp624g1wvw/build/CMakeFiles/3.28.1/CompilerIdCUDA/CMakeCUDACompilerId.cu"
        --allow_managed --m64 --parse_templates --gen_c_file_name
        "tmp/CMakeCUDACompilerId.cudafe1.cpp" --stub_file_name
        "CMakeCUDACompilerId.cudafe1.stub.c" --gen_module_id_file
        --module_id_file_name "tmp/CMakeCUDACompilerId.module_id"
        "tmp/CMakeCUDACompilerId.cpp4.ii"

        /usr/local/include/cuda_runtime.h(654): error: the global scope has no
        "cudaMemAdvise_v2"

        /usr/local/include/cuda_runtime.h(666): error: the global scope has no
        "cudaMemPrefetchAsync_v2"

        /usr/local/include/cuda_runtime.h(2301): error: identifier "cudaKernel_t"
        is undefined

        /usr/local/include/cuda_runtime.h(2301): error: identifier "kernelPtr" is
        undefined

        /usr/local/include/cuda_runtime.h(2302): error: expected an expression

        /usr/local/include/cuda_runtime.h(2303): error: too many initializer values

        /usr/local/include/cuda_runtime.h(2304): error: expected a ";"

        CMakeCUDACompilerId.cu(453): error: identifier "info_compiler" is undefined

        8 errors detected in the compilation of "CMakeCUDACompilerId.cu".

        # --error 0x2 --

      Call Stack (most recent call first):
        /tmp/pip-build-env-xj0kr9y6/normal/local/lib/python3.11/dist-packages/cmake/data/share/cmake-3.28/Modules/CMakeDetermineCompilerId.cmake:8 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
        /tmp/pip-build-env-xj0kr9y6/normal/local/lib/python3.11/dist-packages/cmake/data/share/cmake-3.28/Modules/CMakeDetermineCompilerId.cmake:53 (__determine_compiler_id_test)
        /tmp/pip-build-env-xj0kr9y6/normal/local/lib/python3.11/dist-packages/cmake/data/share/cmake-3.28/Modules/CMakeDetermineCUDACompiler.cmake:135 (CMAKE_DETERMINE_COMPILER_ID)
        vendor/llama.cpp/CMakeLists.txt:306 (enable_language)

      -- Configuring incomplete, errors occurred!

      *** CMake configuration failed
      [end of output]
celsowm commented 10 months ago

@m-from-space your idea worked (combined with setting gcc-12 as the default; one way to do that is shown below)! Now, how can I test whether the lib is using the GPU?
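For reference, a common way to make gcc-12 the system default on Ubuntu, assuming both compiler versions are installed via apt:

sudo apt install gcc-12 g++-12
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 100
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-12 100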

yousecjoe commented 10 months ago

> @m-from-space your idea worked (combined with setting gcc-12 as the default)! Now, how can I test whether the lib is using the GPU?

Read the output, use the library, and monitor CUDA utilization.

-DLLAMA_CUBLAS=on is what enables the GPU on NVIDIA hardware.

When you run a model, the log output shows where the GPU is being used, for example layers being offloaded. You will also see non-zero CUDA utilization on the GPU while a response is being generated.
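A minimal sketch of such a check (the model path is just a placeholder; n_gpu_layers=-1 asks to offload every layer):

from llama_cpp import Llama

# verbose=True makes llama.cpp print backend and offload info at load time,
# e.g. a line like "llm_load_tensors: offloaded 33/33 layers to GPU".
llm = Llama(
    model_path="./models/model.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to the GPU
    verbose=True,
)

out = llm("Q: What is the capital of France? A:", max_tokens=16)
print(out["choices"][0]["text"])

While the completion runs, watch -n 1 nvidia-smi in another terminal should show GPU memory held by the Python process and non-zero utilization.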

cannguyen275 commented 10 months ago

Facing the same issue here in Anaconda. Trying CUDA 12+ doesn't work. Do you have any ideas on how to fix it?

Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [46 lines of output]
      *** scikit-build-core 0.7.1 using CMake 3.28.1 (wheel)
      *** Configuring CMake...
      2024-01-22 17:47:25,115 - scikit_build_core - WARNING - libdir/ldlibrary: /home/cannguyen/miniconda3/envs/env1/lib/libpython3.11.a is not a real file!
      2024-01-22 17:47:25,115 - scikit_build_core - WARNING - Can't find a Python library, got libdir=/home/cannguyen/miniconda3/envs/env1/lib, ldlibrary=libpython3.11.a, multiarch=x86_64-linux-gnu, masd=None
      loading initial cache file /tmp/tmpgi2n8624/build/CMakeInit.txt
      -- The C compiler identification is GNU 13.2.0
      -- The CXX compiler identification is GNU 13.2.0
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: /usr/bin/cc - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Found Git: /usr/bin/git (found version "2.40.1")
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      -- Unable to find cudart library.
      -- Could NOT find CUDAToolkit (missing: CUDA_CUDART) (found version "12.1.105")
      CMake Warning at vendor/llama.cpp/CMakeLists.txt:360 (message):
        cuBLAS not found

      -- CUDA host compiler is GNU
      CMake Error at vendor/llama.cpp/CMakeLists.txt:536 (get_flags):
        get_flags Function invoked with incorrect arguments for function named:
        get_flags

      -- CMAKE_SYSTEM_PROCESSOR: x86_64
      -- x86 detected
      CMake Warning (dev) at CMakeLists.txt:21 (install):
        Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
      This warning is for project developers.  Use -Wno-dev to suppress it.

      CMake Warning (dev) at CMakeLists.txt:30 (install):
        Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
      This warning is for project developers.  Use -Wno-dev to suppress it.

      -- Configuring incomplete, errors occurred!

      *** CMake configuration failed
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

I installed the library by running this command: CUDACXX=/home/cannguyen/miniconda3/envs/env1/bin/nvcc CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=native" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade

This is what I have related to CUDA in my conda environment:

> conda list | grep cuda
cuda-cudart               12.1.105                      0    nvidia
cuda-cupti                12.1.105                      0    nvidia
cuda-libraries            12.1.0                        0    nvidia
cuda-nvcc                 12.1.105                      0    nvidia/label/cuda-12.1.1
cuda-nvrtc                12.1.105                      0    nvidia
cuda-nvtx                 12.1.105                      0    nvidia
cuda-opencl               12.3.101                      0    nvidia
cuda-runtime              12.1.0                        0    nvidia
pytorch                   2.1.1           py3.11_cuda12.1_cudnn8.9.2_0    pytorch
pytorch-cuda              12.1                 ha16c6d3_5    pytorch
pytorch-mutex             1.0                        cuda    pytorch
cannguyen275 commented 10 months ago

Following up on my issue above: I got past it by running:

conda install nvidia::cuda-nvcc
conda install nvidia::cuda-toolkit
conda install gcc=12 -c conda-forge
conda install -c conda-forge gxx_linux-64

After that, the build fails at:

  [18/23] : && /usr/bin/g++ -fPIC -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/home/cannguyen/miniconda3/envs/env1/lib -Wl,-rpath-link,/home/cannguyen/miniconda3/envs/env1/lib -L/home/cannguyen/miniconda3/envs/env1/lib -shared -Wl,-soname,libggml_shared.so -o vendor/llama.cpp/libggml_shared.so vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o -L/home/cannguyen/miniconda3/envs/env1/lib/gcc/x86_64-conda-linux-gnu/12.3.0   -L/home/cannguyen/miniconda3/envs/env1/lib/gcc   -L/home/cannguyen/miniconda3/envs/env1/x86_64-conda-linux-gnu/lib   -L/home/cannguyen/miniconda3/envs/env1/x86_64-conda-linux-gnu/sysroot/lib   -L/home/cannguyen/miniconda3/envs/env1/x86_64-conda-linux-gnu/sysroot/usr/lib /home/cannguyen/miniconda3/envs/env1/lib/libcudart.so  /home/cannguyen/miniconda3/envs/env1/lib/libcublas.so  /home/cannguyen/miniconda3/envs/env1/lib/libcublasLt.so  /home/cannguyen/miniconda3/envs/env1/lib/stubs/libcuda.so  -pthread  /home/cannguyen/miniconda3/envs/env1/lib/libculibos.a  -lcudadevrt  -lcudart_static  -lrt  -lpthread  -ldl -L"/home/cannguyen/miniconda3/envs/env1/lib/stubs" -L"/home/cannguyen/miniconda3/envs/env1/lib" && :
  FAILED: vendor/llama.cpp/libggml_shared.so
  : && /usr/bin/g++ -fPIC -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/home/cannguyen/miniconda3/envs/env1/lib -Wl,-rpath-link,/home/cannguyen/miniconda3/envs/env1/lib -L/home/cannguyen/miniconda3/envs/env1/lib -shared -Wl,-soname,libggml_shared.so -o vendor/llama.cpp/libggml_shared.so vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o -L/home/cannguyen/miniconda3/envs/env1/lib/gcc/x86_64-conda-linux-gnu/12.3.0   -L/home/cannguyen/miniconda3/envs/env1/lib/gcc   -L/home/cannguyen/miniconda3/envs/env1/x86_64-conda-linux-gnu/lib   -L/home/cannguyen/miniconda3/envs/env1/x86_64-conda-linux-gnu/sysroot/lib   -L/home/cannguyen/miniconda3/envs/env1/x86_64-conda-linux-gnu/sysroot/usr/lib /home/cannguyen/miniconda3/envs/env1/lib/libcudart.so  /home/cannguyen/miniconda3/envs/env1/lib/libcublas.so  /home/cannguyen/miniconda3/envs/env1/lib/libcublasLt.so  /home/cannguyen/miniconda3/envs/env1/lib/stubs/libcuda.so  -pthread  /home/cannguyen/miniconda3/envs/env1/lib/libculibos.a  -lcudadevrt  -lcudart_static  -lrt  -lpthread  -ldl -L"/home/cannguyen/miniconda3/envs/env1/lib/stubs" -L"/home/cannguyen/miniconda3/envs/env1/lib" && :
  /usr/bin/ld: cannot find /lib64/libpthread.so.0: No such file or directory
  /usr/bin/ld: cannot find /usr/lib64/libpthread_nonshared.a: No such file or directory
  collect2: error: ld returned 1 exit status

I've searched around but cannot find any solutions; installing this library in a conda env seems pretty hard. As I want to try Llama models in LangChain, may I have your advice on an easier approach? One more thing I plan to try is sketched below.
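The failing link line invokes the system /usr/bin/g++ while passing conda's sysroot library paths, which would explain the missing /lib64/libpthread.so.0. An untested sketch, assuming the conda compiler packages (gcc_linux-64 / gxx_linux-64) put x86_64-conda-linux-gnu-gcc and x86_64-conda-linux-gnu-g++ into the env (check with ls $CONDA_PREFIX/bin | grep gnu), is to point the build at conda's own compilers so everything resolves inside the env:

CC=$CONDA_PREFIX/bin/x86_64-conda-linux-gnu-gcc \
CXX=$CONDA_PREFIX/bin/x86_64-conda-linux-gnu-g++ \
CUDACXX=$CONDA_PREFIX/bin/nvcc \
CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=native" FORCE_CMAKE=1 \
pip install llama-cpp-python --force-reinstall --no-cache-dir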

cy94 commented 2 months ago

I'm getting the same issue using LLaVA with CUDA. Was anyone able to fix this?