bonuschild opened this issue 1 year ago
I had this issue and it was because a second NVIDIA CUDA toolkit was installed over the one provided for WSL. I had to run apt-get purge nvidia-cuda-toolkit and then reset my PATH to point at the nvcc binary located at /usr/local/cuda/bin/nvcc. Hope this helps.
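In case it helps, a minimal sketch of the PATH part (assuming the WSL toolkit lives under /usr/local/cuda; adjust to your install) is to add something like this to ~/.bashrc and open a new shell:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH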
I have the same problem with versions newer than 0.2.10
@petehunt I will try it later, Thanks!
I believe that I am experiencing the same problem. @petehunt could you please provide a little more detail on how you "reset my PATH to point at the nvcc binary located at /usr/local/cuda/bin/nvcc"? I see the nvcc file in the location that you mention, but I'm not sure how I should edit my PATH to point at it. Thanks in advance for any help.
Here are the same troubleshooting steps that @bonuschild performed, in case it's helpful.
Unlike the original post, I can start llama_cpp.server, but I do not end up with CUDA/cuBLAS support enabled.
Install with pip
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
Collecting llama-cpp-python
Downloading llama_cpp_python-0.2.16.tar.gz (7.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 31.5 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0 (from llama-cpp-python)
Downloading typing_extensions-4.8.0-py3-none-any.whl.metadata (3.0 kB)
Collecting numpy>=1.20.0 (from llama-cpp-python)
Downloading numpy-1.26.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.2/61.2 kB 150.3 MB/s eta 0:00:00
Collecting diskcache>=5.6.1 (from llama-cpp-python)
Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.5/45.5 kB 132.6 MB/s eta 0:00:00
Downloading numpy-1.26.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.2/18.2 MB 47.0 MB/s eta 0:00:00
Downloading typing_extensions-4.8.0-py3-none-any.whl (31 kB)
Building wheels for collected packages: llama-cpp-python
Building wheel for llama-cpp-python (pyproject.toml) ... done
Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.16-cp310-cp310-manylinux_2_35_x86_64.whl size=1899547 sha256=5edc3e0287a274697d04a5a24f5d4d114a22c4a0741be90082771099167a7e2f
Stored in directory: /tmp/pip-ephem-wheel-cache-oa78qcxu/wheels/5b/2d/75/aea44211650edc2984c799575c2572b6677e561dad9f969257
Successfully built llama-cpp-python
Installing collected packages: typing-extensions, numpy, diskcache, llama-cpp-python
Attempting uninstall: typing-extensions
Found existing installation: typing_extensions 4.8.0
Uninstalling typing_extensions-4.8.0:
Successfully uninstalled typing_extensions-4.8.0
Attempting uninstall: numpy
Found existing installation: numpy 1.26.1
Uninstalling numpy-1.26.1:
Successfully uninstalled numpy-1.26.1
Attempting uninstall: diskcache
Found existing installation: diskcache 5.6.3
Uninstalling diskcache-5.6.3:
Successfully uninstalled diskcache-5.6.3
Attempting uninstall: llama-cpp-python
Found existing installation: llama_cpp_python 0.2.16
Uninstalling llama_cpp_python-0.2.16:
Successfully uninstalled llama_cpp_python-0.2.16
Successfully installed diskcache-5.6.3 llama-cpp-python-0.2.16 numpy-1.26.1 typing-extensions-4.8.0
Start Server
python -m llama_cpp.server --model llama-2-7b-chat.Q5_K_M.gguf
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from llama-2-7b-chat.Q5_K_M.gguf (version GGUF V2)
...
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size = 1024.00 MB
llama_build_graph: non-view tensors processed: 740/740
llama_new_context_with_model: compute buffer total size = 162.63 MB
AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
INFO: Started server process [32769]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
Check with GGML_USE_CUBLAS
python -c "from llama_cpp import GGML_USE_CUBLAS; print(GGML_USE_CUBLAS)"
False
nvidia-smi
Fri Nov 10 11:06:35 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.01 Driver Version: 546.01 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce GTX 980 Ti On | 00000000:02:00.0 On | N/A |
| 0% 39C P2 64W / 275W | 944MiB / 6144MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
python
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.get_device_properties(0)
_CudaDeviceProperties(name='NVIDIA GeForce GTX 980 Ti', major=5, minor=2, total_memory=6143MB, multi_processor_count=22)
>>>
pip list | grep cublas
nvidia-cublas-cu12 12.1.3.1
@KevinGage that looks different; your installation did not build with CUDA. Can you share the pip logs again, but run with the --verbose flag? It should tell us where/if there's an issue finding the path to CUDA.
@petehunt and just to confirm, if you build llama.cpp standalone with cmake, it doesn't encounter this error?
I'm really sorry. I hate hijacking issues that aren't related. Let me know if you would prefer that I open a new issue for this.
As requested here is the output using the --verbose flag, and yes it confirms CUDA wasn't found. Could NOT find CUDAToolkit (missing: CUDA_CUDART) (found version "12.3.52") CMake Warning at vendor/llama.cpp/CMakeLists.txt:305 (message): cuBLAS not found
As requested, here is the output using the --verbose flag, and yes, it confirms CUDA wasn't found:
Could NOT find CUDAToolkit (missing: CUDA_CUDART) (found version "12.3.52")
CMake Warning at vendor/llama.cpp/CMakeLists.txt:305 (message): cuBLAS not found
Full output
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
Using pip 23.3.1 from /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/pip (python 3.10)
Collecting llama-cpp-python
Downloading llama_cpp_python-0.2.16.tar.gz (7.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 24.6 MB/s eta 0:00:00
Running command pip subprocess to install build dependencies
Collecting scikit-build-core>=0.5.1 (from scikit-build-core[pyproject]>=0.5.1)
Using cached scikit_build_core-0.6.1-py3-none-any.whl.metadata (17 kB)
Collecting exceptiongroup (from scikit-build-core>=0.5.1->scikit-build-core[pyproject]>=0.5.1)
Using cached exceptiongroup-1.1.3-py3-none-any.whl.metadata (6.1 kB)
Collecting packaging>=20.9 (from scikit-build-core>=0.5.1->scikit-build-core[pyproject]>=0.5.1)
Using cached packaging-23.2-py3-none-any.whl.metadata (3.2 kB)
Collecting tomli>=1.1 (from scikit-build-core>=0.5.1->scikit-build-core[pyproject]>=0.5.1)
Using cached tomli-2.0.1-py3-none-any.whl (12 kB)
Collecting pathspec>=0.10.1 (from scikit-build-core[pyproject]>=0.5.1)
Using cached pathspec-0.11.2-py3-none-any.whl.metadata (19 kB)
Collecting pyproject-metadata>=0.5 (from scikit-build-core[pyproject]>=0.5.1)
Using cached pyproject_metadata-0.7.1-py3-none-any.whl (7.4 kB)
Using cached scikit_build_core-0.6.1-py3-none-any.whl (134 kB)
Using cached packaging-23.2-py3-none-any.whl (53 kB)
Using cached pathspec-0.11.2-py3-none-any.whl (29 kB)
Using cached exceptiongroup-1.1.3-py3-none-any.whl (14 kB)
Installing collected packages: tomli, pathspec, packaging, exceptiongroup, scikit-build-core, pyproject-metadata
Successfully installed exceptiongroup-1.1.3 packaging-23.2 pathspec-0.11.2 pyproject-metadata-0.7.1 scikit-build-core-0.6.1 tomli-2.0.1
Installing build dependencies ... done
Running command Getting requirements to build wheel
Getting requirements to build wheel ... done
Running command pip subprocess to install backend dependencies
Collecting cmake>=3.21
Using cached cmake-3.27.7-py2.py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (6.7 kB)
Collecting ninja>=1.5
Using cached ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl.metadata (5.3 kB)
Using cached cmake-3.27.7-py2.py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (26.0 MB)
Using cached ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)
Installing collected packages: ninja, cmake
Successfully installed cmake-3.27.7 ninja-1.11.1.1
Installing backend dependencies ... done
Running command Preparing metadata (pyproject.toml)
*** scikit-build-core 0.6.1 using CMake 3.27.7 (metadata_wheel)
Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0 (from llama-cpp-python)
Obtaining dependency information for typing-extensions>=4.5.0 from https://files.pythonhosted.org/packages/24/21/7d397a4b7934ff4028987914ac1044d3b7d52712f30e2ac7a2ae5bc86dd0/typing_extensions-4.8.0-py3-none-any.whl.metadata
Downloading typing_extensions-4.8.0-py3-none-any.whl.metadata (3.0 kB)
Collecting numpy>=1.20.0 (from llama-cpp-python)
Obtaining dependency information for numpy>=1.20.0 from https://files.pythonhosted.org/packages/2d/5e/cb38e3d1916cc29880c84a9332a9122a8f49a7b57ec7aea63e0f678587a2/numpy-1.26.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
Downloading numpy-1.26.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.2/61.2 kB 132.5 MB/s eta 0:00:00
Collecting diskcache>=5.6.1 (from llama-cpp-python)
Obtaining dependency information for diskcache>=5.6.1 from https://files.pythonhosted.org/packages/3f/27/4570e78fc0bf5ea0ca45eb1de3818a23787af9b390c0b0a0033a1b8236f9/diskcache-5.6.3-py3-none-any.whl.metadata
Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.5/45.5 kB 120.3 MB/s eta 0:00:00
Downloading numpy-1.26.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.2/18.2 MB 60.5 MB/s eta 0:00:00
Downloading typing_extensions-4.8.0-py3-none-any.whl (31 kB)
Building wheels for collected packages: llama-cpp-python
Running command Building wheel for llama-cpp-python (pyproject.toml)
*** scikit-build-core 0.6.1 using CMake 3.27.7 (wheel)
*** Configuring CMake...
2023-11-10 11:26:24,783 - scikit_build_core - WARNING - libdir/ldlibrary: /usr/lib/x86_64-linux-gnu/libpython3.10.so is not a real file!
2023-11-10 11:26:24,783 - scikit_build_core - WARNING - Can't find a Python library, got libdir=/usr/lib/x86_64-linux-gnu, ldlibrary=libpython3.10.so, multiarch=x86_64-linux-gnu, masd=x86_64-linux-gnu
loading initial cache file /tmp/tmprlt5hbz7/build/CMakeInit.txt
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Unable to find cudart library.
-- Could NOT find CUDAToolkit (missing: CUDA_CUDART) (found version "12.3.52")
CMake Warning at vendor/llama.cpp/CMakeLists.txt:305 (message):
cuBLAS not found
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
CMake Warning (dev) at CMakeLists.txt:18 (install):
Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at CMakeLists.txt:27 (install):
Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
This warning is for project developers. Use -Wno-dev to suppress it.
-- Configuring done (3.8s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/tmprlt5hbz7/build
*** Building project with Ninja...
Change Dir: '/tmp/tmprlt5hbz7/build'
Run Build Command(s): /tmp/pip-build-env-f1vzwrup/normal/lib/python3.10/site-packages/ninja/data/bin/ninja -v
[1/22] cd /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp && /tmp/pip-build-env-f1vzwrup/normal/lib/python3.10/site-packages/cmake/data/bin/cmake -DMSVC= -DCMAKE_C_COMPILER_VERSION=11.4.0 -DCMAKE_C_COMPILER_ID=GNU -DCMAKE_VS_PLATFORM_NAME= -DCMAKE_C_COMPILER=/usr/bin/cc -P /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/../scripts/build-info.cmake
-- Found Git: /usr/bin/git (found version "2.34.1")
[2/22] /usr/bin/c++ -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/build-info.cpp
[3/22] /usr/bin/cc -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -march=native -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/ggml-backend.c
[4/22] /usr/bin/cc -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -march=native -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/ggml-alloc.c
[5/22] /usr/bin/c++ -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/console.cpp
[6/22] /usr/bin/c++ -DLLAMA_BUILD -DLLAMA_SHARED -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/../../common -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -fPIC -Wno-cast-qual -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/llava.cpp
/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/llava.cpp: In function ‘bool load_file_to_bytes(const char*, unsigned char**, long int*)’:
/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/llava.cpp:130:10: warning: ignoring return value of ‘size_t fread(void*, size_t, size_t, FILE*)’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
130 | fread(buffer, 1, fileSize, file); // Read the file into the buffer
| ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
[7/22] /usr/bin/c++ -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/sampling.cpp
[8/22] /usr/bin/c++ -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/grammar-parser.cpp
[9/22] /usr/bin/cc -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -march=native -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/ggml-quants.c
[10/22] /usr/bin/c++ -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/../../common -O3 -DNDEBUG -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/llava-cli.cpp
[11/22] /usr/bin/c++ -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/train.cpp
[12/22] /usr/bin/c++ -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/common.cpp
[13/22] /usr/bin/cc -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -march=native -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/ggml.c
[14/22] : && /tmp/pip-build-env-f1vzwrup/normal/lib/python3.10/site-packages/cmake/data/bin/cmake -E rm -f vendor/llama.cpp/libggml_static.a && /usr/bin/ar qc vendor/llama.cpp/libggml_static.a vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o && /usr/bin/ranlib vendor/llama.cpp/libggml_static.a && :
[15/22] : && /usr/bin/cc -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libggml_shared.so -o vendor/llama.cpp/libggml_shared.so vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o && :
[16/22] /usr/bin/c++ -DLLAMA_BUILD -DLLAMA_SHARED -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/../../common -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -fPIC -Wno-cast-qual -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/clip.cpp
[17/22] : && /tmp/pip-build-env-f1vzwrup/normal/lib/python3.10/site-packages/cmake/data/bin/cmake -E rm -f vendor/llama.cpp/examples/llava/libllava_static.a && /usr/bin/ar qc vendor/llama.cpp/examples/llava/libllava_static.a vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o && /usr/bin/ranlib vendor/llama.cpp/examples/llava/libllava_static.a && :
[18/22] /usr/bin/c++ -DLLAMA_BUILD -DLLAMA_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dllama_EXPORTS -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -MF vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o.d -o vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/llama.cpp
[19/22] : && /usr/bin/c++ -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libllama.so -o vendor/llama.cpp/libllama.so vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o && :
[20/22] : && /tmp/pip-build-env-f1vzwrup/normal/lib/python3.10/site-packages/cmake/data/bin/cmake -E rm -f vendor/llama.cpp/common/libcommon.a && /usr/bin/ar qc vendor/llama.cpp/common/libcommon.a vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o && /usr/bin/ranlib vendor/llama.cpp/common/libcommon.a && :
[21/22] : && /usr/bin/c++ -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libllava.so -o vendor/llama.cpp/examples/llava/libllava.so vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -Wl,-rpath,/tmp/tmprlt5hbz7/build/vendor/llama.cpp: vendor/llama.cpp/libllama.so && :
[22/22] : && /usr/bin/c++ -O3 -DNDEBUG vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o -o vendor/llama.cpp/examples/llava/llava-cli -Wl,-rpath,/tmp/tmprlt5hbz7/build/vendor/llama.cpp: vendor/llama.cpp/common/libcommon.a vendor/llama.cpp/libllama.so && :
*** Installing project into wheel...
-- Install configuration: "Release"
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/lib/libggml_shared.so
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/lib/cmake/Llama/LlamaConfig.cmake
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/lib/cmake/Llama/LlamaConfigVersion.cmake
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/include/ggml.h
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/lib/libllama.so
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/include/llama.h
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/bin/convert.py
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/bin/convert-lora-to-ggml.py
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/llama_cpp/libllama.so
-- Installing: /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/llama_cpp/libllama.so
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/lib/libllava.so
-- Set runtime path of "/tmp/tmprlt5hbz7/wheel/platlib/lib/libllava.so" to ""
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/bin/llava-cli
-- Set runtime path of "/tmp/tmprlt5hbz7/wheel/platlib/bin/llava-cli" to ""
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/llama_cpp/libllava.so
-- Set runtime path of "/tmp/tmprlt5hbz7/wheel/platlib/llama_cpp/libllava.so" to ""
-- Installing: /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/llama_cpp/libllava.so
-- Set runtime path of "/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/llama_cpp/libllava.so" to ""
*** Making wheel...
*** Created llama_cpp_python-0.2.16-cp310-cp310-manylinux_2_35_x86_64.whl...
Building wheel for llama-cpp-python (pyproject.toml) ... done
Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.16-cp310-cp310-manylinux_2_35_x86_64.whl size=1899543 sha256=b098205373e31cb0f5ab79c1df928972e31607c472e0ecee8f1a08505f9f7ca7
Stored in directory: /tmp/pip-ephem-wheel-cache-inyrp17p/wheels/5b/2d/75/aea44211650edc2984c799575c2572b6677e561dad9f969257
Successfully built llama-cpp-python
Installing collected packages: typing-extensions, numpy, diskcache, llama-cpp-python
Attempting uninstall: typing-extensions
Found existing installation: typing_extensions 4.8.0
Uninstalling typing_extensions-4.8.0:
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/__pycache__/typing_extensions.cpython-310.pyc
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/typing_extensions-4.8.0.dist-info/
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/typing_extensions.py
Successfully uninstalled typing_extensions-4.8.0
Attempting uninstall: numpy
Found existing installation: numpy 1.26.1
Uninstalling numpy-1.26.1:
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/bin/f2py
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/numpy-1.26.1.dist-info/
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/numpy.libs/
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/numpy/
Successfully uninstalled numpy-1.26.1
changing mode of /mnt/f/source/repos/LangchainTest/.venv/bin/f2py to 777
Attempting uninstall: diskcache
Found existing installation: diskcache 5.6.3
Uninstalling diskcache-5.6.3:
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/diskcache-5.6.3.dist-info/
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/diskcache/
Successfully uninstalled diskcache-5.6.3
Attempting uninstall: llama-cpp-python
Found existing installation: llama_cpp_python 0.2.16
Uninstalling llama_cpp_python-0.2.16:
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/bin/
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/include/
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/lib/
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/llama_cpp/
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/llama_cpp_python-0.2.16.dist-info/
Successfully uninstalled llama_cpp_python-0.2.16
Successfully installed diskcache-5.6.3 llama-cpp-python-0.2.16 numpy-1.26.1 typing-extensions-4.8.0
@KevinGage no problem, so it looks like the issue is
-- Unable to find cudart library.
-- Could NOT find CUDAToolkit (missing: CUDA_CUDART) (found version "12.3.52")
CMake Warning at vendor/llama.cpp/CMakeLists.txt:305 (message):
cuBLAS not found
Try searching the closed issues here or on the llama.cpp GitHub; if you can't find a solution, please open a new issue.
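For reference, a workaround reported in similar issues is to point CMake at the toolkit explicitly, for example (assuming CUDA 12.3 installed under /usr/local/cuda-12.3; adjust the path to your install):
CUDACXX=/usr/local/cuda-12.3/bin/nvcc CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCUDAToolkit_ROOT=/usr/local/cuda-12.3" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --no-cache-dir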
Thanks for helping me identify the problem. I found a similar issue here https://github.com/abetlen/llama-cpp-python/issues/627 and modified my pip install command to look like this:
CUDACXX=/usr/local/cuda-12.3/bin/nvcc CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCUDA_PATH=/usr/local/cuda-12.3 -DCUDAToolkit_ROOT=/usr/local/cuda-12.3 -DCUDAToolkit_INCLUDE_DIR=/usr/local/cuda-12.3/include -DCUDAToolkit_LIBRARY_DIR=/usr/local/cuda-12.3/lib64 -DCMAKE_CUDA_ARCHITECTURES=native" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
FINALLY, after 3 days of messing with Windows and WSL, I am able to run with CUDA!
python -c "from llama_cpp import GGML_USE_CUBLAS; print(GGML_USE_CUBLAS)"
True
I suspect the issue is with environment variables in either Windows or WSL, due to multiple Visual Studio installs and multiple CUDA installs on Windows. I could maybe prune back a couple of the flags in the pip command, but it's working and I'm happy. Again, sorry to OP for mucking up this thread; I thought I should leave my solution here for anyone who stumbles on it in the future.
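If it helps anyone debugging the same thing, a quick environment sanity check (paths below are only an example) might look like:
which nvcc                     # should resolve to the toolkit you intend to build against
nvcc --version                 # toolkit version CMake will pick up
ls -d /usr/local/cuda*         # toolkits present under /usr/local
echo "$CUDA_PATH" "$CUDACXX"   # any overrides already set in the environment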
@KevinGage But you succeeded in a way that isn't covered by the official manual, by adding more environment settings:
CUDACXX=/usr/local/cuda-12.3/bin/nvcc \
CMAKE_ARGS="-DLLAMA_CUBLAS=on \
  -DCUDA_PATH=/usr/local/cuda-12.3 \
  -DCUDAToolkit_ROOT=/usr/local/cuda-12.3 \
  -DCUDAToolkit_INCLUDE_DIR=/usr/local/cuda-12.3/include \
  -DCUDAToolkit_LIBRARY_DIR=/usr/local/cuda-12.3/lib64 \
  -DCMAKE_CUDA_ARCHITECTURES=native"
According to the official manual, ONLY LLAMA_CUBLAS=on should be required; the other settings are not mentioned there.
So should the official manual be updated, or do we all need to do a pre-check of the environment? @abetlen
@bonuschild yes, we should add that to the docs with a note in case the default installation fails
@abetlen Thanks for the hard work.
@KevinGage This solution worked for me. Thanks for sharing.
CUDACXX=/usr/local/cuda-12.3/bin/nvcc CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCUDA_PATH=/usr/local/cuda-12.3 -DCUDAToolkit_ROOT=/usr/local/cuda-12.3 -DCUDAToolkit_INCLUDE_DIR=/usr/local/cuda-12.3/include -DCUDAToolkit_LIBRARY_DIR=/usr/local/cuda-12.3/lib64 -DCMAKE_CUDA_ARCHITECTURES=native" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
Env
pip install torch torchvision torchaudio, which installs the nvidia-cuda-xxx packages as well.
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python[server] --force-reinstall --upgrade --no-cache-dir
Problem Reproduce
Execute python -m llama_cpp.server --model yarn-mistral-7b-128k.Q5_K_M.gguf, which fails with an error (output omitted).
nvidia-smi output (omitted)
CUDA verifies
with torch: (output omitted)
with pip: (output omitted)
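The verification steps referenced above were presumably along these lines (a sketch only; outputs not shown):
nvidia-smi
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
pip list | grep -i cuda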