abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Issue in installing llama-cpp-python. #1081

Open couragelfyang opened 10 months ago

couragelfyang commented 10 months ago

I'm trying to install llama-cpp-python with CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python, but I hit this error. Any suggestions?

× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [71 lines of output]
      *** scikit-build-core 0.7.1 using CMake 3.22.1 (wheel)
      *** Configuring CMake...
      loading initial cache file /tmp/tmph92nnsrw/build/CMakeInit.txt
      -- The C compiler identification is GNU 11.4.0
      -- The CXX compiler identification is GNU 11.4.0
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: /usr/bin/cc - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Found Git: /usr/bin/git (found version "2.34.1")
      -- Looking for pthread.h
      -- Looking for pthread.h - found
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      -- Found CUDAToolkit: /usr/local/cuda/include (found version "11.6.55")
      -- cuBLAS found
      -- The CUDA compiler identification is NVIDIA 11.6.55
      -- Detecting CUDA compiler ABI info
      -- Detecting CUDA compiler ABI info - done
      -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
      -- Detecting CUDA compile features
      -- Detecting CUDA compile features - done
      -- Using CUDA architectures: 52;61;70
      -- CUDA host compiler is GNU 11.4.0

      -- CMAKE_SYSTEM_PROCESSOR: x86_64
      -- x86 detected
      INSTALL TARGETS - target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
      INSTALL TARGETS - target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
      -- Configuring done
      -- Generating done
      -- Build files have been written to: /tmp/tmph92nnsrw/build
      *** Building project with Ninja...
      [1/23] cd /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp && /usr/bin/cmake -DMSVC= -DCMAKE_C_COMPILER_VERSION=11.4.0 -DCMAKE_C_COMPILER_ID=GNU -DCMAKE_VS_PLATFORM_NAME= -DCMAKE_C_COMPILER=/usr/bin/cc -P /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/../scripts/gen-build-info-cpp.cmake
      -- Found Git: /usr/bin/git (found version "2.34.1")
      [2/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600  -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -std=gnu++11 -MD -MT vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/build-info.cpp
      [3/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -std=gnu++11 -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/console.cpp
      [4/23] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -isystem /usr/local/cuda/include -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -march=native -std=gnu11 -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml-alloc.c
      [5/23] /usr/bin/c++ -DLLAMA_BUILD -DLLAMA_SHARED -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/../../common -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -isystem /usr/local/cuda/include -O3 -DNDEBUG -fPIC -Wno-cast-qual -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/llava.cpp
      [6/23] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -isystem /usr/local/cuda/include -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -march=native -std=gnu11 -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml-backend.c
      [7/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -std=gnu++11 -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/sampling.cpp
      [8/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -std=gnu++11 -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/grammar-parser.cpp
      [9/23] /usr/bin/c++  -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/../../common -O3 -DNDEBUG -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/llava-cli.cpp
      [10/23] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -isystem=/usr/local/cuda/include -O3 -DNDEBUG --generate-code=arch=compute_52,code=[compute_52,sm_52] --generate-code=arch=compute_61,code=[compute_61,sm_61] --generate-code=arch=compute_70,code=[compute_70,sm_70] -Xcompiler=-fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -use_fast_math -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation -Wextra-semi" -march=native -std=c++11 -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o.d -x cu -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml-cuda.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o
      FAILED: vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o
      /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -isystem=/usr/local/cuda/include -O3 -DNDEBUG --generate-code=arch=compute_52,code=[compute_52,sm_52] --generate-code=arch=compute_61,code=[compute_61,sm_61] --generate-code=arch=compute_70,code=[compute_70,sm_70] -Xcompiler=-fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -use_fast_math -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation -Wextra-semi" -march=native -std=c++11 -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o.d -x cu -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml-cuda.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o
      /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml-cuda.cu(626): error: identifier "__hmax2" is undefined

      /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml-cuda.cu(5462): error: identifier "__hmax2" is undefined

      /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml-cuda.cu(5474): error: identifier "__hmax" is undefined

      /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml-cuda.cu(5481): error: identifier "__hmax" is undefined

      4 errors detected in the compilation of "/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml-cuda.cu".
      [11/23] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -isystem /usr/local/cuda/include -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -march=native -std=gnu11 -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml-quants.c
      [12/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -std=gnu++11 -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/train.cpp
      [13/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -std=gnu++11 -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/common.cpp
      [14/23] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -isystem /usr/local/cuda/include -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -march=native -std=gnu11 -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml.c
      [15/23] /usr/bin/c++ -DLLAMA_BUILD -DLLAMA_SHARED -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/../../common -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -isystem /usr/local/cuda/include -O3 -DNDEBUG -fPIC -Wno-cast-qual -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/clip.cpp
      [16/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -DLLAMA_BUILD -DLLAMA_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dllama_EXPORTS -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -isystem /usr/local/cuda/include -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -std=gnu++11 -MD -MT vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -MF vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o.d -o vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/llama.cpp
      ninja: build stopped: subcommand failed.

      *** CMake build failed
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
aniljava commented 10 months ago

Try with CUDA >= 12.

Similar issues have been reported with 11.x: https://github.com/huggingface/candle/issues/353
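
If a newer toolkit is installed alongside 11.6, you can point the build at it before reinstalling. This is only a sketch: the cuda-12.2 path is an example and depends on where your toolkit actually lives, and CUDACXX is the environment variable CMake reads to pick the CUDA compiler.

export CUDACXX=/usr/local/cuda-12.2/bin/nvcc
export PATH=/usr/local/cuda-12.2/bin:$PATH
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --no-cache-dir --force-reinstall llama-cpp-python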

MrJefter commented 10 months ago

I think it's a bug introduced by the latest minor update, because I tried compiling with llama-cpp-python==0.2.27 and it worked normally.

Full command for building with CUDA support: FORCE_CMAKE=1 CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python==0.2.27
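
If an older wheel of the same version is already installed or cached, adding --no-cache-dir and --force-reinstall makes sure pip actually rebuilds from source; these are standard pip flags, not part of the command above:

FORCE_CMAKE=1 CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --no-cache-dir --force-reinstall llama-cpp-python==0.2.27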

abetlen commented 10 months ago

@MrJefter the latest minor update only bumped the vendored llama.cpp, so unless something broke there, nothing on this side should have changed.

MrJefter commented 10 months ago

@abetlen anyway, downgrading to 0.2.27 fixes this issue.

PositivPy commented 10 months ago

We should invent or adopt a way for pip install X to fetch only stable releases rather than the bleeding edge. Minor releases should really be for the repo's devs only, since they tend to break.
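
Until something like that exists, pinning an upper bound at install time at least keeps you off the bleeding edge (the exact bound here is only an example, based on 0.2.27 working above):

CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install "llama-cpp-python>=0.2.27,<0.2.28"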

m-from-space commented 10 months ago

The error message clearly points to a llama.cpp issue...

vendor/llama.cpp/ggml-cuda.cu(626): error: identifier "__hmax2" is undefined
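
The configure log also shows nvcc 11.6.55 being picked up, which lines up with the "Try with CUDA >= 12" suggestion. A quick sanity check before retrying is to confirm which toolkits are on the machine and which nvcc the build will see (paths vary per installation):

/usr/local/cuda/bin/nvcc --version
ls -d /usr/local/cuda*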