RedHatOfficial / rhelai-dev-preview

Red Hat Enterprise Linux AI -- Developer Preview

error running `make instruct-nvidia` #19

Closed chechuironman closed 4 months ago

chechuironman commented 4 months ago

I'm running the install as described in the guide, on RHEL 9.2 on an IBM Cloud G3 instance with GPU, but when I run `make instruct-nvidia` I get this error:

Building wheels for collected packages: llama_cpp_python
  Building wheel for llama_cpp_python (pyproject.toml): started
  Building wheel for llama_cpp_python (pyproject.toml): still running...
  Building wheel for llama_cpp_python (pyproject.toml): finished with status 'error'
  error: subprocess-exited-with-error

× Building wheel for llama_cpp_python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [292 lines of output]
  scikit-build-core 0.9.4 using CMake 3.29.3 (wheel)
  Configuring CMake...
  2024-05-27 17:52:09,122 - scikit_build_core - WARNING - Can't find a Python library, got libdir=/usr/lib64, ldlibrary=libpython3.11.so, multiarch=x86_64-linux-gnu, masd=None
  loading initial cache file /tmp/tmpqf4uxcth/build/CMakeInit.txt
  -- The C compiler identification is GNU 11.4.1
  -- The CXX compiler identification is GNU 11.4.1
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Check for working C compiler: /usr/bin/cc - skipped
  -- Detecting C compile features
  -- Detecting C compile features - done
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: /usr/bin/c++ - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Found Git: /usr/bin/git (found version "2.43.0")
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
  -- Found Threads: TRUE
  CMake Warning at vendor/llama.cpp/CMakeLists.txt:390 (message):
    LLAMA_CUBLAS is deprecated and will be removed in the future.

    Use LLAMA_CUDA instead

  -- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.3.107")
  -- CUDA found
  -- The CUDA compiler identification is NVIDIA 12.3.107
  -- Detecting CUDA compiler ABI info
  -- Detecting CUDA compiler ABI info - done
  -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
  -- Detecting CUDA compile features
  -- Detecting CUDA compile features - done
  -- Using CUDA architectures: 52;61;70
  -- CUDA host compiler is GNU 11.4.1

  -- Warning: ccache not found - consider installing it for faster compilation or disable this warning with LLAMA_CCACHE=OFF
  -- CMAKE_SYSTEM_PROCESSOR: x86_64
  -- x86 detected
  CMake Warning (dev) at CMakeLists.txt:26 (install):
    Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
  This warning is for project developers.  Use -Wno-dev to suppress it.

  CMake Warning (dev) at CMakeLists.txt:35 (install):
    Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
  This warning is for project developers.  Use -Wno-dev to suppress it.

  -- Configuring done (24.3s)
  -- Generating done (0.0s)
  -- Build files have been written to: /tmp/tmpqf4uxcth/build
  *** Building project with Ninja...
  Change Dir: '/tmp/tmpqf4uxcth/build'

  Run Build Command(s): /tmp/pip-build-env-gyccnkm3/normal/lib64/python3.11/site-packages/ninja/data/bin/ninja -v
  [1/56] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_Q

...

  [21/56] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -mno-avx -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -march=native -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/ggml.c
  /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/ggml.c: In function ‘ggml_vec_mad_f16’:
  /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/ggml.c:1868:45: warning: passing argument 1 of ‘__sse_f16x4_load’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
   1868 |             ax[j] = GGML_F16_VEC_LOAD(x + i + j*GGML_F16_EPR, j);
        |                                             ^
  /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/ggml.c:1518:50: note: in definition of macro ‘GGML_F32Cx4_LOAD’
   1518 | #define GGML_F32Cx4_LOAD(x)     __sse_f16x4_load(x)
        |                                                  ^
  /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/ggml.c:1868:21: note: in expansion of macro ‘GGML_F16_VEC_LOAD’
   1868 |             ax[j] = GGML_F16_VEC_LOAD(x + i + j*GGML_F16_EPR, j);
        |                     ^~~~~~~~~~~~~~~~~
  /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/ggml.c:1493:52: note: expected ‘ggml_fp16_t *’ {aka ‘short unsigned int *’} but argument is of type ‘const ggml_fp16_t *’ {aka ‘const short unsigned int *’}
   1493 | static inline __m128 __sse_f16x4_load(ggml_fp16_t *x) {
        |                                       ~~~~~~~~~~~~~^
  [22/56] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC -use_fast_math -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Wno-pedantic -march=native" -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/scale.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/scale.cu.o.d -x cu -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/ggml-cuda/scale.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/scale.cu.o
  [23/56] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC -use_fast_math -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Wno-pedantic -march=native" -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/sumrows.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/sumrows.cu.o.d -x cu -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/ggml-cuda/sumrows.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/sumrows.cu.o
  [24/56] cd /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp && /tmp/pip-build-env-gyccnkm3/normal/lib64/python3.11/site-packages/cmake/data/bin/cmake -DMSVC= -DCMAKE_C_COMPILER_VERSION=11.4.1 -DCMAKE_C_COMPILER_ID=GNU -DCMAKE_VS_PLATFORM_NAME= -DCMAKE_C_COMPILER=/usr/bin/cc -P /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/common/../scripts/gen-build-info-cpp.cmake
  -- Found Git: /usr/bin/git (found version "2.43.0")
  [25/56] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC -use_fast_math -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Wno-pedantic -march=native" -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/tsembd.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/tsembd.cu.o.d -x cu -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/ggml-cuda/tsembd.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/tsembd.cu.o
  [26/56] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600  -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/common/build-info.cpp
  [27/56] /usr/bin/c++ -DGGML_USE_CUBLAS -DGGML_USE_CUDA -DLLAMA_BUILD -DLLAMA_SHARED -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/examples/llava/../../common -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -fPIC -Wno-cast-qual -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/examples/llava/llava.cpp
  [28/56] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DLLAMA_BUILD -DLLAMA_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dllama_EXPORTS -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/CMakeFiles/llama.dir/unicode-data.cpp.o -MF vendor/llama.cpp/CMakeFiles/llama.dir/unicode-data.cpp.o.d -o vendor/llama.cpp/CMakeFiles/llama.dir/unicode-data.cpp.o -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/unicode-data.cpp
  [29/56] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/common/. -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/common/console.cpp
  [30/56] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC -use_fast_math -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Wno-pedantic -march=native" -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/softmax.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/softmax.cu.o.d -x cu -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/ggml-cuda/softmax.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/softmax.cu.o
  [31/56] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC -use_fast_math -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Wno-pedantic -march=native" -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/upscale.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/upscale.cu.o.d -x cu -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/ggml-cuda/upscale.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/upscale.cu.o
  [32/56] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC -use_fast_math -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Wno-pedantic -march=native" -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/unary.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/unary.cu.o.d -x cu -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/ggml-cuda/unary.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/unary.cu.o
  [33/56] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/common/. -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/common/sampling.cpp
  [34/56] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/common/. -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/common/grammar-parser.cpp
  [35/56] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/sgemm.cpp.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/sgemm.cpp.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/sgemm.cpp.o -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/sgemm.cpp
  FAILED: vendor/llama.cpp/CMakeFiles/ggml.dir/sgemm.cpp.o
  /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/sgemm.cpp.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/sgemm.cpp.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/sgemm.cpp.o -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/sgemm.cpp
  /tmp/ccf8kgSy.s: Assembler messages:
  /tmp/ccf8kgSy.s:14392: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:14403: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:14414: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:14425: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:14496: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:14507: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:14516: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:14522: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:14834: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:14846: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:15210: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:15589: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:15905: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:15917: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:15978: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:16196: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:16267: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:16490: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:16514: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:16542: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:16742: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:16756: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:16782: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:16792: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:16978: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:16990: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17000: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17011: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17179: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17196: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17206: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17215: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17404: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17418: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17430: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17441: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17450: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17459: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17471: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17477: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17526: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17538: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17550: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17561: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17698: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17712: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17724: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17734: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17743: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17752: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17764: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17770: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17820: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17832: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17844: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:17855: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18022: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18037: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18048: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18223: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18237: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18249: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18258: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18270: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18276: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18320: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18332: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18345: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18461: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18475: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18487: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18497: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18509: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18515: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18558: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18570: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18583: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18727: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18742: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18895: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18910: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18923: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18932: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18967: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:18980: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:19081: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:19096: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:19109: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:19116: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:19152: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:19165: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:19291: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:19486: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:19503: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:19601: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:19612: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:19620: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:19626: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:19948: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:19966: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:19982: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:20054: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:20062: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:20067: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:20348: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:20756: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:21099: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:21152: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:21494: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:21731: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:21766: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:21797: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:22267: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:22288: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:22302: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:22316: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:22495: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:22515: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:22527: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:22537: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:22734: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:22808: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:22814: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:22868: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:22880: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:22892: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:22903: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23051: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23125: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23131: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23186: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23198: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23210: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23221: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23398: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23419: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23433: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23616: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23669: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23675: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23724: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23736: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23749: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23874: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23893: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23905: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23914: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23926: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23932: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23980: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:23992: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:24005: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:24160: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:24181: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:24344: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:24363: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:24374: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:24384: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:24423: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:24436: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:24544: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:24563: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:24574: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:24583: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:24623: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:24636: Error: unsupported instruction `vpdpbusd'
  /tmp/ccf8kgSy.s:24770: Error: unsupported instruction `vpdpbusd'
  [36/56] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/common/. -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/ngram-cache.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/ngram-cache.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/ngram-cache.cpp.o -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/common/ngram-cache.cpp
  [37/56] /usr/bin/c++ -DGGML_USE_CUBLAS -DGGML_USE_CUDA -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/common/. -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/examples/llava/../../common -O3 -DNDEBUG -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/examples/llava/llava-cli.cpp
  [38/56] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/common/. -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/common/train.cpp
  [39/56] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC -use_fast_math -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Wno-pedantic -march=native" -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o.d -x cu -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/ggml-cuda.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o
  [40/56] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DLLAMA_BUILD -DLLAMA_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dllama_EXPORTS -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/CMakeFiles/llama.dir/unicode.cpp.o -MF vendor/llama.cpp/CMakeFiles/llama.dir/unicode.cpp.o.d -o vendor/llama.cpp/CMakeFiles/llama.dir/unicode.cpp.o -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/unicode.cpp
  [41/56] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC -use_fast_math -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Wno-pedantic -march=native" -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/mmq.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/mmq.cu.o.d -x cu -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/ggml-cuda/mmq.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/mmq.cu.o
  [42/56] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/common/. -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/json-schema-to-grammar.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/json-schema-to-grammar.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/json-schema-to-grammar.cpp.o -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/common/json-schema-to-grammar.cpp
  [43/56] /usr/bin/c++ -DGGML_USE_CUBLAS -DGGML_USE_CUDA -DLLAMA_BUILD -DLLAMA_SHARED -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/examples/llava/../../common -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -fPIC -Wno-cast-qual -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/examples/llava/clip.cpp
  [44/56] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/common/. -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/common/common.cpp
  [45/56] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DLLAMA_BUILD -DLLAMA_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dllama_EXPORTS -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -MF vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o.d -o vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/llama.cpp
  [46/56] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC -use_fast_math -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Wno-pedantic -march=native" -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/fattn-vec-f16.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/fattn-vec-f16.cu.o.d -x cu -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/ggml-cuda/fattn-vec-f16.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/fattn-vec-f16.cu.o
  [47/56] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC -use_fast_math -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Wno-pedantic -march=native" -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/fattn.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/fattn.cu.o.d -x cu -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/ggml-cuda/fattn.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/fattn.cu.o
  [48/56] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC -use_fast_math -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Wno-pedantic -march=native" -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/fattn-vec-f32.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/fattn-vec-f32.cu.o.d -x cu -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/ggml-cuda/fattn-vec-f32.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/fattn-vec-f32.cu.o
  [49/56] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_CUDA -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC -use_fast_math -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Wno-pedantic -march=native" -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/mmvq.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/mmvq.cu.o.d -x cu -c /tmp/pip-install-uojkipms/llama-cpp-python_33cccd461278400b9375678f6012915d/vendor/llama.cpp/ggml-cuda/mmvq.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda/mmvq.cu.o
  ninja: build stopped: subcommand failed.

  *** CMake build failed
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama_cpp_python
Failed to build llama_cpp_python
ERROR: Could not build wheels for llama_cpp_python, which is required to install pyproject.toml-based projects

[notice] A new release of pip available: 22.3.1 -> 24.0
[notice] To update, run: pip install --upgrade pip
Error: error building at STEP "RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" CFLAGS="-mno-avx" python3.11 -m pip install -r https://raw.githubusercontent.com/instructlab/instructlab/${GIT_TAG}/requirements.txt --force-reinstall --no-cache-dir llama-cpp-python": error while running runtime: exit status 1
make[1]: *** [Makefile:19: nvidia] Error 1
make[1]: Leaving directory '/opt/rhelai-dev-preview/training/instructlab'
make: *** [Makefile:48: instruct-nvidia] Error 2
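
The failing step is [35/56], where /usr/bin/c++ builds vendor/llama.cpp/sgemm.cpp with -march=native and the system assembler rejects every vpdpbusd (VNNI dot-product) instruction GCC emits. Note also that the Containerfile step only sets CFLAGS="-mno-avx", which reaches C sources such as ggml.c but not C++ sources, so sgemm.cpp is still compiled for the host CPU's full instruction set. A quick way to check whether the CPU and the installed binutils disagree about VNNI support, as a sketch assuming shell access to the build host:

  # Does the CPU advertise the VNNI extensions that make GCC emit
  # vpdpbusd under -march=native?
  grep -o -m1 -E 'avx512_vnni|avx_vnni' /proc/cpuinfo

  # Which binutils provides the assembler? Support for the VEX-encoded
  # AVX-VNNI form of vpdpbusd only landed in binutils 2.36.
  as --version | head -n1

If the first command prints a flag while the assembler predates the extension, GCC generates instructions that as cannot encode, which is exactly the "unsupported instruction `vpdpbusd'" failure above.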

chechuironman commented 4 months ago

solved
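
For anyone landing here later: the thread does not record what the fix was. One plausible workaround, assuming the root cause is -march=native requesting VNNI instructions the stock RHEL 9.2 assembler cannot encode, is to rebuild the wheel with llama.cpp's native tuning switched off (LLAMA_NATIVE was the CMake option controlling -march=native in llama.cpp builds of this vintage):

  # Hypothetical workaround, not the confirmed fix from this issue: keep
  # cuBLAS enabled but stop llama.cpp from targeting the host's full ISA.
  CMAKE_ARGS="-DLLAMA_CUBLAS=on -DLLAMA_NATIVE=off" CFLAGS="-mno-avx" \
    python3.11 -m pip install --force-reinstall --no-cache-dir llama-cpp-python

Updating binutils so the assembler understands the instructions GCC emits would address the same mismatch from the other side.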