abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Unable to install llama-cpp-python with CUBLAS or CUDA enabled under tensorflow-gpu docker image. #1431

Open brent-halen opened 5 months ago

brent-halen commented 5 months ago

I'm attempting to install llama-cpp-python under the tensorflow-gpu Docker image (nightly build). When I do, I get the following error messages.

root@a1f1e127514b:/tf# CMAKE_ARGS="-DLLAMA_CUDA=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.2.67 --force-reinstall --upgrade --no-cache-dir
Collecting llama-cpp-python==0.2.67
  Downloading llama_cpp_python-0.2.67.tar.gz (42.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.4/42.4 MB 21.5 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0 (from llama-cpp-python==0.2.67)
  Downloading typing_extensions-4.11.0-py3-none-any.whl.metadata (3.0 kB)
Collecting numpy>=1.20.0 (from llama-cpp-python==0.2.67)
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.0/61.0 kB 212.4 MB/s eta 0:00:00
Collecting diskcache>=5.6.1 (from llama-cpp-python==0.2.67)
  Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Collecting jinja2>=2.11.3 (from llama-cpp-python==0.2.67)
  Downloading jinja2-3.1.4-py3-none-any.whl.metadata (2.6 kB)
Collecting MarkupSafe>=2.0 (from jinja2>=2.11.3->llama-cpp-python==0.2.67)
  Downloading MarkupSafe-2.1.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.0 kB)
Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.5/45.5 kB 192.3 MB/s eta 0:00:00
Downloading jinja2-3.1.4-py3-none-any.whl (133 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.3/133.3 kB 46.2 MB/s eta 0:00:00
Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.3/18.3 MB 56.5 MB/s eta 0:00:00
Downloading typing_extensions-4.11.0-py3-none-any.whl (34 kB)
Downloading MarkupSafe-2.1.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (28 kB)
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [85 lines of output]
      *** scikit-build-core 0.9.3 using CMake 3.29.2 (wheel)
      *** Configuring CMake...
      loading initial cache file /tmp/tmpkpujo6gx/build/CMakeInit.txt
      -- The C compiler identification is GNU 11.4.0
      -- The CXX compiler identification is GNU 11.4.0
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: /usr/bin/cc - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Found Git: /usr/bin/git (found version "2.34.1")
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      -- Found CUDAToolkit: /usr/local/cuda-12.3/targets/x86_64-linux/include (found version "12.3.107")
      -- CUDA found
      -- The CUDA compiler identification is NVIDIA 12.3.107
      -- Detecting CUDA compiler ABI info
      -- Detecting CUDA compiler ABI info - done
      -- Check for working CUDA compiler: /usr/local/cuda-12.3/bin/nvcc - skipped
      -- Detecting CUDA compile features
      -- Detecting CUDA compile features - done
      -- Using CUDA architectures: 52;61;70
      -- CUDA host compiler is GNU 11.4.0

      -- Warning: ccache not found - consider installing it for faster compilation or disable this warning with LLAMA_CCACHE=OFF
      -- CMAKE_SYSTEM_PROCESSOR: x86_64
      -- x86 detected
      CMake Warning (dev) at CMakeLists.txt:26 (install):
        Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
      This warning is for project developers.  Use -Wno-dev to suppress it.

      CMake Warning (dev) at CMakeLists.txt:35 (install):
        Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
      This warning is for project developers.  Use -Wno-dev to suppress it.

      -- Configuring done (3.6s)
      CMake Error at vendor/llama.cpp/CMakeLists.txt:1180 (target_link_libraries):
        Target "ggml" links to:

          CUDA::cublas

        but the target was not found.  Possible reasons include:

          * There is a typo in the target name.
          * A find_package call is missing for an IMPORTED target.
          * An ALIAS target is missing.

      CMake Error at vendor/llama.cpp/CMakeLists.txt:1187 (target_link_libraries):
        Target "ggml_shared" links to:

          CUDA::cublas

        but the target was not found.  Possible reasons include:

          * There is a typo in the target name.
          * A find_package call is missing for an IMPORTED target.
          * An ALIAS target is missing.

      CMake Error at vendor/llama.cpp/CMakeLists.txt:1204 (target_link_libraries):
        Target "llama" links to:

          CUDA::cublas

        but the target was not found.  Possible reasons include:

          * There is a typo in the target name.
          * A find_package call is missing for an IMPORTED target.
          * An ALIAS target is missing.

      -- Generating done (0.0s)
      CMake Generate step failed.  Build files cannot be regenerated correctly.

      *** CMake configuration failed
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

I made sure the CUDA bin and lib directories were in the PATH and LD_LIBRARY_PATH variables, but that doesn't seem to have mitigated the issue.

root@a1f1e127514b:/tf# printenv LD_LIBRARY_PATH
/usr/local/cuda-12.3/lib64:/usr/local/cuda/lib64:/usr/local/cuda-12.3/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
root@a1f1e127514b:/tf# printenv PATH
/usr/local/cuda-12.3/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
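
For completeness, here is a quick way to check whether the cuBLAS development files (headers and link libraries, not just the runtime) are present in the container; the paths below assume the CUDA 12.3 layout shown in the log above, so adjust if yours differs:

# nvcc is needed for CMake to enable the CUDA language at all
which nvcc
# the CUDA::cublas CMake target needs the cuBLAS header and the
# unversioned libcublas.so link, both shipped in the "dev" packages;
# a runtime-only install typically has just libcublas.so.12
ls /usr/local/cuda-12.3/include/cublas_v2.h
ls -l /usr/local/cuda-12.3/lib64/libcublas.so*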

Here's the docker command I use to start the container.

sudo docker run --runtime=nvidia --gpus all -it \
    --device=/dev/nvidia-uvm \
    --device=/dev/nvidia-uvm-tools \
    --device=/dev/nvidia-modeset \
    --device=/dev/nvidiactl \
    --device=/dev/nvidia0 \
    --device=/dev/nvidia1 \
    tensorflow/tensorflow:nightly-gpu-jupyter bash

Any advice as to what I should do differently would be greatly appreciated.

JohanAR commented 5 months ago

Not 100% sure what you've tried, but perhaps your docker image only has the CUDA runtime installed and not the CUDA development files? You could try adding a build stage that uses one of Nvidia's "devel" docker images, compile llama-cpp-python there, and then copy the result into the docker image where you want to use it. A rough sketch is below.
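
Something like this (untested; the image tags and package list are examples to adapt to your CUDA version and base image):

# Stage 1: build the wheel in an image that has the full CUDA toolkit
# (nvcc plus the cuBLAS headers/libraries that the CUDA::cublas target needs).
FROM nvidia/cuda:12.3.2-devel-ubuntu22.04 AS builder
RUN apt-get update && apt-get install -y python3 python3-pip git build-essential
# NOTE: the Python minor version here must match the final image
# (the log above shows cp311 wheels, i.e. Python 3.11), or the built
# wheel will not install there.
ENV CMAKE_ARGS="-DLLAMA_CUDA=on" FORCE_CMAKE=1
RUN pip3 wheel llama-cpp-python==0.2.67 --no-cache-dir --wheel-dir /wheels

# Stage 2: the image you actually run in; runtime-only CUDA is enough here.
FROM tensorflow/tensorflow:nightly-gpu-jupyter
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/llama_cpp_python-*.whl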

Or, if it's a good enough solution, the easiest method is probably to use a pre-compiled wheel with CUDA support, e.g. https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels (usage sketch below).
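
Typical usage for such wheels is to point pip at the prebuilt wheel index; the AVX2/cu122 path segment below is an assumption, so check that repo's README for the index URL matching your CPU features and CUDA version:

# Illustrative only: verify the exact index URL and the available
# llama-cpp-python versions against the repo's README first.
pip install llama-cpp-python --prefer-binary \
    --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu122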