bonuschild opened this issue 1 year ago
I had this issue and it was because a second NVIDIA CUDA toolkit was installed over the one provided for WSL. I had to run apt-get purge nvidia-cuda-toolkit and then reset my PATH to point at the nvcc binary located at /usr/local/cuda/bin/nvcc. Hope this helps.
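In case it helps, a minimal sketch of the PATH part (assuming the WSL toolkit lives under /usr/local/cuda; adjust to your install) is to add something like this to ~/.bashrc and open a new shell:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH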
I have the same problem with versions newer than 0.2.10
@petehunt I will try it later, Thanks!
I believe that I am experiencing the same problem. @petehunt could you please provide a little more detail on how you "reset my PATH to point at the nvcc binary located at /usr/local/cuda/bin/nvcc"? I see the nvcc file in the location that you mention, but I'm not sure how I should edit my PATH to point at it. Thanks in advance for any help.
Here are the same troubleshooting steps that @bonuschild performed, in case it's helpful.
Unlike the original post, I can start llama_cpp.server, but I do not end up with CUDA/cuBLAS support enabled.
Install with pip
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
Collecting llama-cpp-python
Downloading llama_cpp_python-0.2.16.tar.gz (7.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 31.5 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0 (from llama-cpp-python)
Downloading typing_extensions-4.8.0-py3-none-any.whl.metadata (3.0 kB)
Collecting numpy>=1.20.0 (from llama-cpp-python)
Downloading numpy-1.26.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.2/61.2 kB 150.3 MB/s eta 0:00:00
Collecting diskcache>=5.6.1 (from llama-cpp-python)
Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.5/45.5 kB 132.6 MB/s eta 0:00:00
Downloading numpy-1.26.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.2/18.2 MB 47.0 MB/s eta 0:00:00
Downloading typing_extensions-4.8.0-py3-none-any.whl (31 kB)
Building wheels for collected packages: llama-cpp-python
Building wheel for llama-cpp-python (pyproject.toml) ... done
Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.16-cp310-cp310-manylinux_2_35_x86_64.whl size=1899547 sha256=5edc3e0287a274697d04a5a24f5d4d114a22c4a0741be90082771099167a7e2f
Stored in directory: /tmp/pip-ephem-wheel-cache-oa78qcxu/wheels/5b/2d/75/aea44211650edc2984c799575c2572b6677e561dad9f969257
Successfully built llama-cpp-python
Installing collected packages: typing-extensions, numpy, diskcache, llama-cpp-python
Attempting uninstall: typing-extensions
Found existing installation: typing_extensions 4.8.0
Uninstalling typing_extensions-4.8.0:
Successfully uninstalled typing_extensions-4.8.0
Attempting uninstall: numpy
Found existing installation: numpy 1.26.1
Uninstalling numpy-1.26.1:
Successfully uninstalled numpy-1.26.1
Attempting uninstall: diskcache
Found existing installation: diskcache 5.6.3
Uninstalling diskcache-5.6.3:
Successfully uninstalled diskcache-5.6.3
Attempting uninstall: llama-cpp-python
Found existing installation: llama_cpp_python 0.2.16
Uninstalling llama_cpp_python-0.2.16:
Successfully uninstalled llama_cpp_python-0.2.16
Successfully installed diskcache-5.6.3 llama-cpp-python-0.2.16 numpy-1.26.1 typing-extensions-4.8.0
Start Server
python -m llama_cpp.server --model llama-2-7b-chat.Q5_K_M.gguf
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from llama-2-7b-chat.Q5_K_M.gguf (version GGUF V2)
...
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size = 1024.00 MB
llama_build_graph: non-view tensors processed: 740/740
llama_new_context_with_model: compute buffer total size = 162.63 MB
AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
INFO: Started server process [32769]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
Check with GGML_USE_CUBLAS
python -c "from llama_cpp import GGML_USE_CUBLAS; print(GGML_USE_CUBLAS)"
False
nvidia-smi
Fri Nov 10 11:06:35 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.01 Driver Version: 546.01 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce GTX 980 Ti On | 00000000:02:00.0 On | N/A |
| 0% 39C P2 64W / 275W | 944MiB / 6144MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
python
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.get_device_properties(0)
_CudaDeviceProperties(name='NVIDIA GeForce GTX 980 Ti', major=5, minor=2, total_memory=6143MB, multi_processor_count=22)
>>>
pip list | grep cublas
nvidia-cublas-cu12 12.1.3.1
@KevinGage that looks different; your installation did not build with CUDA. Can you share the pip logs again, but run with the --verbose flag? It should tell us where/if there's an issue finding the path to CUDA.
@petehunt and just to confirm, if you build llama.cpp standalone with cmake, it doesn't encounter this error?
I'm really sorry. I hate hijacking issues that aren't related. Let me know if you would prefer that I open a new issue for this.
As requested here is the output using the --verbose flag, and yes it confirms CUDA wasn't found. Could NOT find CUDAToolkit (missing: CUDA_CUDART) (found version "12.3.52") CMake Warning at vendor/llama.cpp/CMakeLists.txt:305 (message): cuBLAS not found
As requested, here is the output using the --verbose flag, and yes, it confirms CUDA wasn't found:
Could NOT find CUDAToolkit (missing: CUDA_CUDART) (found version "12.3.52")
CMake Warning at vendor/llama.cpp/CMakeLists.txt:305 (message): cuBLAS not found
Full output
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
Using pip 23.3.1 from /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/pip (python 3.10)
Collecting llama-cpp-python
Downloading llama_cpp_python-0.2.16.tar.gz (7.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 24.6 MB/s eta 0:00:00
Running command pip subprocess to install build dependencies
Collecting scikit-build-core>=0.5.1 (from scikit-build-core[pyproject]>=0.5.1)
Using cached scikit_build_core-0.6.1-py3-none-any.whl.metadata (17 kB)
Collecting exceptiongroup (from scikit-build-core>=0.5.1->scikit-build-core[pyproject]>=0.5.1)
Using cached exceptiongroup-1.1.3-py3-none-any.whl.metadata (6.1 kB)
Collecting packaging>=20.9 (from scikit-build-core>=0.5.1->scikit-build-core[pyproject]>=0.5.1)
Using cached packaging-23.2-py3-none-any.whl.metadata (3.2 kB)
Collecting tomli>=1.1 (from scikit-build-core>=0.5.1->scikit-build-core[pyproject]>=0.5.1)
Using cached tomli-2.0.1-py3-none-any.whl (12 kB)
Collecting pathspec>=0.10.1 (from scikit-build-core[pyproject]>=0.5.1)
Using cached pathspec-0.11.2-py3-none-any.whl.metadata (19 kB)
Collecting pyproject-metadata>=0.5 (from scikit-build-core[pyproject]>=0.5.1)
Using cached pyproject_metadata-0.7.1-py3-none-any.whl (7.4 kB)
Using cached scikit_build_core-0.6.1-py3-none-any.whl (134 kB)
Using cached packaging-23.2-py3-none-any.whl (53 kB)
Using cached pathspec-0.11.2-py3-none-any.whl (29 kB)
Using cached exceptiongroup-1.1.3-py3-none-any.whl (14 kB)
Installing collected packages: tomli, pathspec, packaging, exceptiongroup, scikit-build-core, pyproject-metadata
Successfully installed exceptiongroup-1.1.3 packaging-23.2 pathspec-0.11.2 pyproject-metadata-0.7.1 scikit-build-core-0.6.1 tomli-2.0.1
Installing build dependencies ... done
Running command Getting requirements to build wheel
Getting requirements to build wheel ... done
Running command pip subprocess to install backend dependencies
Collecting cmake>=3.21
Using cached cmake-3.27.7-py2.py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (6.7 kB)
Collecting ninja>=1.5
Using cached ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl.metadata (5.3 kB)
Using cached cmake-3.27.7-py2.py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (26.0 MB)
Using cached ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)
Installing collected packages: ninja, cmake
Successfully installed cmake-3.27.7 ninja-1.11.1.1
Installing backend dependencies ... done
Running command Preparing metadata (pyproject.toml)
*** scikit-build-core 0.6.1 using CMake 3.27.7 (metadata_wheel)
Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0 (from llama-cpp-python)
Obtaining dependency information for typing-extensions>=4.5.0 from https://files.pythonhosted.org/packages/24/21/7d397a4b7934ff4028987914ac1044d3b7d52712f30e2ac7a2ae5bc86dd0/typing_extensions-4.8.0-py3-none-any.whl.metadata
Downloading typing_extensions-4.8.0-py3-none-any.whl.metadata (3.0 kB)
Collecting numpy>=1.20.0 (from llama-cpp-python)
Obtaining dependency information for numpy>=1.20.0 from https://files.pythonhosted.org/packages/2d/5e/cb38e3d1916cc29880c84a9332a9122a8f49a7b57ec7aea63e0f678587a2/numpy-1.26.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
Downloading numpy-1.26.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.2/61.2 kB 132.5 MB/s eta 0:00:00
Collecting diskcache>=5.6.1 (from llama-cpp-python)
Obtaining dependency information for diskcache>=5.6.1 from https://files.pythonhosted.org/packages/3f/27/4570e78fc0bf5ea0ca45eb1de3818a23787af9b390c0b0a0033a1b8236f9/diskcache-5.6.3-py3-none-any.whl.metadata
Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.5/45.5 kB 120.3 MB/s eta 0:00:00
Downloading numpy-1.26.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.2/18.2 MB 60.5 MB/s eta 0:00:00
Downloading typing_extensions-4.8.0-py3-none-any.whl (31 kB)
Building wheels for collected packages: llama-cpp-python
Running command Building wheel for llama-cpp-python (pyproject.toml)
*** scikit-build-core 0.6.1 using CMake 3.27.7 (wheel)
*** Configuring CMake...
2023-11-10 11:26:24,783 - scikit_build_core - WARNING - libdir/ldlibrary: /usr/lib/x86_64-linux-gnu/libpython3.10.so is not a real file!
2023-11-10 11:26:24,783 - scikit_build_core - WARNING - Can't find a Python library, got libdir=/usr/lib/x86_64-linux-gnu, ldlibrary=libpython3.10.so, multiarch=x86_64-linux-gnu, masd=x86_64-linux-gnu
loading initial cache file /tmp/tmprlt5hbz7/build/CMakeInit.txt
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Unable to find cudart library.
-- Could NOT find CUDAToolkit (missing: CUDA_CUDART) (found version "12.3.52")
CMake Warning at vendor/llama.cpp/CMakeLists.txt:305 (message):
cuBLAS not found
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
CMake Warning (dev) at CMakeLists.txt:18 (install):
Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at CMakeLists.txt:27 (install):
Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
This warning is for project developers. Use -Wno-dev to suppress it.
-- Configuring done (3.8s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/tmprlt5hbz7/build
*** Building project with Ninja...
Change Dir: '/tmp/tmprlt5hbz7/build'
Run Build Command(s): /tmp/pip-build-env-f1vzwrup/normal/lib/python3.10/site-packages/ninja/data/bin/ninja -v
[1/22] cd /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp && /tmp/pip-build-env-f1vzwrup/normal/lib/python3.10/site-packages/cmake/data/bin/cmake -DMSVC= -DCMAKE_C_COMPILER_VERSION=11.4.0 -DCMAKE_C_COMPILER_ID=GNU -DCMAKE_VS_PLATFORM_NAME= -DCMAKE_C_COMPILER=/usr/bin/cc -P /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/../scripts/build-info.cmake
-- Found Git: /usr/bin/git (found version "2.34.1")
[2/22] /usr/bin/c++ -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/build-info.cpp
[3/22] /usr/bin/cc -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -march=native -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/ggml-backend.c
[4/22] /usr/bin/cc -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -march=native -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/ggml-alloc.c
[5/22] /usr/bin/c++ -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/console.cpp
[6/22] /usr/bin/c++ -DLLAMA_BUILD -DLLAMA_SHARED -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/../../common -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -fPIC -Wno-cast-qual -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/llava.cpp
/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/llava.cpp: In function ‘bool load_file_to_bytes(const char*, unsigned char**, long int*)’:
/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/llava.cpp:130:10: warning: ignoring return value of ‘size_t fread(void*, size_t, size_t, FILE*)’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
130 | fread(buffer, 1, fileSize, file); // Read the file into the buffer
| ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
[7/22] /usr/bin/c++ -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/sampling.cpp
[8/22] /usr/bin/c++ -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/grammar-parser.cpp
[9/22] /usr/bin/cc -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -march=native -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/ggml-quants.c
[10/22] /usr/bin/c++ -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/../../common -O3 -DNDEBUG -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/llava-cli.cpp
[11/22] /usr/bin/c++ -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/train.cpp
[12/22] /usr/bin/c++ -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/common/common.cpp
[13/22] /usr/bin/cc -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -march=native -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/ggml.c
[14/22] : && /tmp/pip-build-env-f1vzwrup/normal/lib/python3.10/site-packages/cmake/data/bin/cmake -E rm -f vendor/llama.cpp/libggml_static.a && /usr/bin/ar qc vendor/llama.cpp/libggml_static.a vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o && /usr/bin/ranlib vendor/llama.cpp/libggml_static.a && :
[15/22] : && /usr/bin/cc -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libggml_shared.so -o vendor/llama.cpp/libggml_shared.so vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o && :
[16/22] /usr/bin/c++ -DLLAMA_BUILD -DLLAMA_SHARED -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/../../common -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -fPIC -Wno-cast-qual -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/examples/llava/clip.cpp
[17/22] : && /tmp/pip-build-env-f1vzwrup/normal/lib/python3.10/site-packages/cmake/data/bin/cmake -E rm -f vendor/llama.cpp/examples/llava/libllava_static.a && /usr/bin/ar qc vendor/llama.cpp/examples/llava/libllava_static.a vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o && /usr/bin/ranlib vendor/llama.cpp/examples/llava/libllava_static.a && :
[18/22] /usr/bin/c++ -DLLAMA_BUILD -DLLAMA_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dllama_EXPORTS -I/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -MF vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o.d -o vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -c /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/vendor/llama.cpp/llama.cpp
[19/22] : && /usr/bin/c++ -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libllama.so -o vendor/llama.cpp/libllama.so vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o && :
[20/22] : && /tmp/pip-build-env-f1vzwrup/normal/lib/python3.10/site-packages/cmake/data/bin/cmake -E rm -f vendor/llama.cpp/common/libcommon.a && /usr/bin/ar qc vendor/llama.cpp/common/libcommon.a vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o && /usr/bin/ranlib vendor/llama.cpp/common/libcommon.a && :
[21/22] : && /usr/bin/c++ -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libllava.so -o vendor/llama.cpp/examples/llava/libllava.so vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -Wl,-rpath,/tmp/tmprlt5hbz7/build/vendor/llama.cpp: vendor/llama.cpp/libllama.so && :
[22/22] : && /usr/bin/c++ -O3 -DNDEBUG vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o -o vendor/llama.cpp/examples/llava/llava-cli -Wl,-rpath,/tmp/tmprlt5hbz7/build/vendor/llama.cpp: vendor/llama.cpp/common/libcommon.a vendor/llama.cpp/libllama.so && :
*** Installing project into wheel...
-- Install configuration: "Release"
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/lib/libggml_shared.so
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/lib/cmake/Llama/LlamaConfig.cmake
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/lib/cmake/Llama/LlamaConfigVersion.cmake
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/include/ggml.h
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/lib/libllama.so
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/include/llama.h
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/bin/convert.py
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/bin/convert-lora-to-ggml.py
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/llama_cpp/libllama.so
-- Installing: /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/llama_cpp/libllama.so
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/lib/libllava.so
-- Set runtime path of "/tmp/tmprlt5hbz7/wheel/platlib/lib/libllava.so" to ""
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/bin/llava-cli
-- Set runtime path of "/tmp/tmprlt5hbz7/wheel/platlib/bin/llava-cli" to ""
-- Installing: /tmp/tmprlt5hbz7/wheel/platlib/llama_cpp/libllava.so
-- Set runtime path of "/tmp/tmprlt5hbz7/wheel/platlib/llama_cpp/libllava.so" to ""
-- Installing: /tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/llama_cpp/libllava.so
-- Set runtime path of "/tmp/pip-install-koypzbi9/llama-cpp-python_4eb0914d5d2f4adc9780baa1fbbb1d84/llama_cpp/libllava.so" to ""
*** Making wheel...
*** Created llama_cpp_python-0.2.16-cp310-cp310-manylinux_2_35_x86_64.whl...
Building wheel for llama-cpp-python (pyproject.toml) ... done
Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.16-cp310-cp310-manylinux_2_35_x86_64.whl size=1899543 sha256=b098205373e31cb0f5ab79c1df928972e31607c472e0ecee8f1a08505f9f7ca7
Stored in directory: /tmp/pip-ephem-wheel-cache-inyrp17p/wheels/5b/2d/75/aea44211650edc2984c799575c2572b6677e561dad9f969257
Successfully built llama-cpp-python
Installing collected packages: typing-extensions, numpy, diskcache, llama-cpp-python
Attempting uninstall: typing-extensions
Found existing installation: typing_extensions 4.8.0
Uninstalling typing_extensions-4.8.0:
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/__pycache__/typing_extensions.cpython-310.pyc
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/typing_extensions-4.8.0.dist-info/
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/typing_extensions.py
Successfully uninstalled typing_extensions-4.8.0
Attempting uninstall: numpy
Found existing installation: numpy 1.26.1
Uninstalling numpy-1.26.1:
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/bin/f2py
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/numpy-1.26.1.dist-info/
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/numpy.libs/
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/numpy/
Successfully uninstalled numpy-1.26.1
changing mode of /mnt/f/source/repos/LangchainTest/.venv/bin/f2py to 777
Attempting uninstall: diskcache
Found existing installation: diskcache 5.6.3
Uninstalling diskcache-5.6.3:
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/diskcache-5.6.3.dist-info/
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/diskcache/
Successfully uninstalled diskcache-5.6.3
Attempting uninstall: llama-cpp-python
Found existing installation: llama_cpp_python 0.2.16
Uninstalling llama_cpp_python-0.2.16:
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/bin/
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/include/
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/lib/
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/llama_cpp/
Removing file or directory /mnt/f/source/repos/LangchainTest/.venv/lib/python3.10/site-packages/llama_cpp_python-0.2.16.dist-info/
Successfully uninstalled llama_cpp_python-0.2.16
Successfully installed diskcache-5.6.3 llama-cpp-python-0.2.16 numpy-1.26.1 typing-extensions-4.8.0
@KevinGage no problem, so it looks like the issue is
-- Unable to find cudart library.
-- Could NOT find CUDAToolkit (missing: CUDA_CUDART) (found version "12.3.52")
CMake Warning at vendor/llama.cpp/CMakeLists.txt:305 (message):
cuBLAS not found
Try searching the closed issues here or on the llama.cpp GitHub; if you can't find a solution, please open a new issue.
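For reference, a workaround reported in similar issues is to point CMake at the toolkit explicitly, for example (assuming CUDA 12.3 installed under /usr/local/cuda-12.3; adjust the path to your install):
CUDACXX=/usr/local/cuda-12.3/bin/nvcc CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCUDAToolkit_ROOT=/usr/local/cuda-12.3" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --no-cache-dir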
Thanks for helping me identify the problem. I found a similar issue here https://github.com/abetlen/llama-cpp-python/issues/627 and modified my pip install command to look like this:
CUDACXX=/usr/local/cuda-12.3/bin/nvcc CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCUDA_PATH=/usr/local/cuda-12.3 -DCUDAToolkit_ROOT=/usr/local/cuda-12.3 -DCUDAToolkit_INCLUDE_DIR=/usr/local/cuda-12.3/include -DCUDAToolkit_LIBRARY_DIR=/usr/local/cuda-12.3/lib64 -DCMAKE_CUDA_ARCHITECTURES=native" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
FINALLY, after 3 days of messing with Windows and WSL, I am able to run with CUDA!
python -c "from llama_cpp import GGML_USE_CUBLAS; print(GGML_USE_CUBLAS)"
True
I suspect the issue is with environment variables in either Windows or WSL, due to multiple Visual Studio installs and multiple CUDA installs on Windows. I could maybe prune back a couple of the flags in the pip command, but it's working and I'm happy. Again, sorry to OP for mucking up this thread; I thought I should leave my solution here for anyone who stumbles on it in the future.
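If it helps anyone debugging the same thing, a quick environment sanity check (paths below are only an example) might look like:
which nvcc                     # should resolve to the toolkit you intend to build against
nvcc --version                 # toolkit version CMake will pick up
ls -d /usr/local/cuda*         # toolkits present under /usr/local
echo "$CUDA_PATH" "$CUDACXX"   # any overrides already set in the environment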
@KevinGage But you succeeded in a way that isn't covered by the official manual, by adding more environment settings:
CUDACXX=/usr/local/cuda-12.3/bin/nvcc \
CMAKE_ARGS="-DLLAMA_CUBLAS=on \
  -DCUDA_PATH=/usr/local/cuda-12.3 \
  -DCUDAToolkit_ROOT=/usr/local/cuda-12.3 \
  -DCUDAToolkit_INCLUDE_DIR=/usr/local/cuda-12.3/include \
  -DCUDAToolkit_LIBRARY_DIR=/usr/local/cuda-12.3/lib64 \
  -DCMAKE_CUDA_ARCHITECTURES=native"
According to the official manual, ONLY LLAMA_CUBLAS=on should be required; the other settings are not mentioned there.
So should the official manual be updated, or do we all need to do a pre-check of the environment? @abetlen
@bonuschild yes, we should add that to the docs with a note in case the default installation fails
@abetlen Thanks for the hard work.
@KevinGage This solution worked for me. Thanks for sharing.
CUDACXX=/usr/local/cuda-12.3/bin/nvcc CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCUDA_PATH=/usr/local/cuda-12.3 -DCUDAToolkit_ROOT=/usr/local/cuda-12.3 -DCUDAToolkit_INCLUDE_DIR=/usr/local/cuda-12.3/include -DCUDAToolkit_LIBRARY_DIR=/usr/local/cuda-12.3/lib64 -DCMAKE_CUDA_ARCHITECTURES=native" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
Env
pip install torch torchvision torchaudio, which installs the nvidia-cuda-xxx packages as well.
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python[server] --force-reinstall --upgrade --no-cache-dir
Problem Reproduce
Execute python -m llama_cpp.server --model yarn-mistral-7b-128k.Q5_K_M.gguf, which fails with an error (output omitted).
nvidia-smi output (omitted)
CUDA verifies
with torch: (output omitted)
with pip: (output omitted)
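The verification steps referenced above were presumably along these lines (a sketch only; outputs not shown):
nvidia-smi
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
pip list | grep -i cuda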