jpagitrdone opened 5 months ago
Have you tried not installing the precompiled package?
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
Also, you are trying to install the cu121 version (that is for CUDA 12.1). If you can only use CUDA 11, there is, AFAIK, no precompiled package.
Your error also looks similar to one I got with a bad GCC version. CUDA only accepts older versions of GCC.
It is also possible that some CUDA headers / libs are missing. See the log file.
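To check both of those quickly (assuming gcc and nvcc are on your PATH):
gcc --version    # host compiler; each CUDA Toolkit release only supports up to a specific GCC major version
nvcc --version   # confirms the CUDA Toolkit (compiler, headers, libs) is installed and reachable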
Hello, thanks for your reply. I did try without the --extra-index-url flag, and that's actually what's causing the _CMAKE_CUDA_WHOLE_FLAG error now. It seems to "work" if I do include the --extra-index-url and its link, but it doesn't seem to enable GPU support for some reason, so I guess it's CPU only.
The CUDA version thing is a bit confusing, though. If I go to the command prompt and run nvidia-smi, I am told I have CUDA 12.2. However, the actual CUDA Toolkit download I am stuck with (for now) is 11.8. I don't know which takes priority over the other for determining which CUDA version I have.
Upon further review, I found a chart online that shows which version of GCC each version of CUDA Toolkit is compatible with. I noticed my GCC version is 13.x, and for CUDA Toolkit 11.8 the maximum it supports is GCC 11, so I'm guessing this could very well be the issue after all. Will keep at it.
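If that chart is right, one way to test it without downgrading my default compiler might be to point nvcc at an older GCC explicitly. A sketch only: the gcc-11 path here is hypothetical, CMAKE_CUDA_HOST_COMPILER is a standard CMake variable, and I haven't verified that this works for a MinGW build:
set CMAKE_ARGS=-DLLAMA_CUDA=on -DCMAKE_CUDA_HOST_COMPILER=C:/path/to/gcc-11/bin/g++.exe
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose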
Do you use --n_gpu_layers to put some or all of the layers on the GPU?
-1 means all the layers... but sometimes it fails if the model is too heavy. By watching your VRAM usage with the nvidia-smi command to see how much memory is filled, you may find the correct value to set.
With my RTX 3070 (8 GB VRAM), I can sometimes send all layers of a 7B model; sometimes I need to reduce to 16 layers... less, or more.
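For example, a minimal sketch with llama-cpp-python (the model path here is hypothetical):
from llama_cpp import Llama

# n_gpu_layers=-1 offloads every layer to the GPU; lower it (e.g. 16) if you run out of VRAM
llm = Llama(model_path="./models/llama-2-7b.Q4_K_M.gguf", n_gpu_layers=-1)
print(llm("Q: What is 2+2? A:", max_tokens=8)["choices"][0]["text"])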
So if I try to install without using that --extra-index-url location, the install fails entirely and I can't use the library at all. If I use that --extra-index-url statement, it does appear to complete the install and shows up in pip list, but it gives me an error about llama.dll not being found, even though it's clearly there. I have updated my initial comment to reflect my findings in more detail.
I have requested that an updated version of the CUDA Toolkit be made available to us in the interim, but that will probably take weeks to deploy. Will continue trying to resolve the llama.dll-not-found issue in the meantime.
I was able to solve half the problem - I figured out why this installation method failed:
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
So it turns out that cu121 refers to the CUDA Toolkit you actually have installed on your machine, which is different from the CUDA version reported by nvidia-smi (nvidia-smi shows the highest CUDA version your driver supports, not the installed toolkit). I searched around and found wheels published by jllllll (site below) that cover this older version. So for those of you struggling to get the precompiled CUDA version working because you have an old version of the CUDA Toolkit installed, this shows you how to work around it. Pending approval to get CUDA Toolkit 12.5 installed on my machine, I suspect this will also solve the first issue of me not being able to compile it on my own.
Older precompiled CUDA releases: https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/releases
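For reference, the install then looks like the command above but pointed at jllllll's index instead. A sketch, assuming the CUDA 11.8 / AVX2 index path that repo documents (verify the exact URL and AVX level against its README):
pip install llama-cpp-python --prefer-binary --force-reinstall --no-cache-dir --verbose --extra-index-url https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu118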
You probably have to set:
export CUDACXX=/usr/local/cuda/bin/nvcc
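On Windows, with the toolkit location from your settings below, the equivalent would be something like (a sketch, unverified):
set CUDACXX=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\nvcc.exe
Note that in the settings below, CUDACXX points at g++.exe rather than nvcc, which would explain CMake reporting "The CUDA compiler identification is unknown".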
I'm attempting to install llama-cpp-python with GPU enabled on my Windows 11 work computer, but am encountering some issues at the very end. The error that I am receiving is "CMake Error: Error required internal CMake variable not set, cmake may not be built correctly. Missing variable is: _CMAKE_CUDA_WHOLE_FLAG". I am able to get the CPU version to install just fine, though. I am using MinGW64, since I am not able to install Visual Studio at work due to restrictions (very frustrating, but I have to work within the confines of what I can do). I do have Nvidia CUDA Toolkit 11.8 installed (that's the latest version we're allowed to install right now). Here are some of my path args for reference purposes; can someone take a peek at this and let me know if anything stands out that would cause this error to happen? I started experimenting with some of these in an attempt to resolve it, so you may see some unusual/duplicative ones.
CC = C:\ProgramData\mingw64\mingw64\bin\gcc.exe
CMAKE_C_COMPILER = C:\ProgramData\mingw64\mingw64\bin\gcc.exe
CMAKE_CXX_COMPILER = C:\ProgramData\mingw64\mingw64\bin\g++.exe
CMAKE_GENERATOR = MinGW Makefiles
CUDA_PATH = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8
CUDA_PATH_V11_8 = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8
CUDAC = C:\ProgramData\mingw64\mingw64\bin\gcc.exe
CUDACXX = C:\ProgramData\mingw64\mingw64\bin\g++.exe
CXX = C:\ProgramData\mingw64\mingw64\bin\g++.exe
CMAKE_ARGS = -DLLAMA_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all-major -DCMAKE_C_COMPILER=C:/ProgramData/mingw64/mingw64/bin/gcc.exe -DCMAKE_CXX_COMPILER=C:/ProgramData/mingw64/mingw64/bin/g++.exe -G "MinGW Makefiles"
In PATH, I also have the following added:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\libnvvp
Installation command that I am trying: pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
This seems to install without any errors during pip installation, but for some reason, when I go to use the llama_cpp library in Python, it gives me the error message: "Could not find module 'C:\Users\USERNAME\Documents\VENVs\llama3\Lib\site-packages\llama_cpp\llama.dll' (or one of its dependencies). Try using the full path with constructor syntax". This is despite the fact that I can see the dll in that exact directory. I have tried the following, with no luck, based on other people having the same issue:
- Using the os.add_dll_directory(os.path.join(os.environ['CUDA_PATH'], 'bin')) workaround before importing llama_cpp
- Also added the direct path to the llama.dll file itself, and its directory, using the same os.add_dll_directory call
- Tried adding these to my system/user PATH variable, along with the direct path to the nvcc.exe location
- Tried modifying llama_cpp_python.py to change the return ctypes.CDLL(str(_lib_path), **cdll_args) statement to just return ctypes.CDLL(str(_lib_path))
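For anyone trying the same workaround, the full snippet I used looks like this (a sketch; it assumes the CUDA_PATH environment variable is set, as in my settings above):
import os
# Make the CUDA runtime DLLs visible to the Windows loader before llama_cpp loads llama.dll
os.add_dll_directory(os.path.join(os.environ["CUDA_PATH"], "bin"))
import llama_cpp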
I also tried the plain install command:
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
Fail Log:
Using pip 22.3 from C:\Users\USERNAME\Documents\VENVs\llama3\Lib\site-packages\pip (python 3.11)
Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.77.tar.gz (50.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50.2/50.2 MB 50.3 MB/s eta 0:00:00
  Running command pip subprocess to install build dependencies
  Collecting scikit-build-core[pyproject]>=0.9.2
    Using cached scikit_build_core-0.9.5-py3-none-any.whl (152 kB)
  Collecting packaging>=21.3
    Using cached packaging-24.0-py3-none-any.whl (53 kB)
  Collecting pathspec>=0.10.1
    Using cached pathspec-0.12.1-py3-none-any.whl (31 kB)
  Installing collected packages: pathspec, packaging, scikit-build-core
  Successfully installed packaging-24.0 pathspec-0.12.1 scikit-build-core-0.9.5
  [notice] A new release of pip available: 22.3 -> 24.0
  [notice] To update, run: python.exe -m pip install --upgrade pip
  Installing build dependencies ... done
  Running command Getting requirements to build wheel
  Getting requirements to build wheel ... done
  Running command pip subprocess to install backend dependencies
  Collecting cmake>=3.21
    Using cached cmake-3.29.3-py3-none-win_amd64.whl (36.2 MB)
  Installing collected packages: cmake
  Successfully installed cmake-3.29.3
  [notice] A new release of pip available: 22.3 -> 24.0
  [notice] To update, run: python.exe -m pip install --upgrade pip
  Installing backend dependencies ... done
  Running command Preparing metadata (pyproject.toml)
  scikit-build-core 0.9.5 using CMake 3.29.3 (metadata_wheel)
  Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0
  Downloading typing_extensions-4.12.1-py3-none-any.whl (37 kB)
Link requires a different Python (3.11.0 not in: '>=3.7,<3.11'): https://files.pythonhosted.org/packages/3a/be/650f9c091ef71cb01d735775d554e068752d3ff63d7943b26316dc401749/numpy-1.21.2.zip (from https://pypi.org/simple/numpy/) (requires-python:>=3.7,<3.11)
Link requires a different Python (3.11.0 not in: '>=3.7,<3.11'): https://files.pythonhosted.org/packages/5f/d6/ad58ded26556eaeaa8c971e08b6466f17c4ac4d786cd3d800e26ce59cc01/numpy-1.21.3.zip (from https://pypi.org/simple/numpy/) (requires-python:>=3.7,<3.11)
Link requires a different Python (3.11.0 not in: '>=3.7,<3.11'): https://files.pythonhosted.org/packages/fb/48/b0708ebd7718a8933f0d3937513ef8ef2f4f04529f1f66ca86d873043921/numpy-1.21.4.zip (from https://pypi.org/simple/numpy/) (requires-python:>=3.7,<3.11)
Link requires a different Python (3.11.0 not in: '>=3.7,<3.11'): https://files.pythonhosted.org/packages/c2/a8/a924a09492bdfee8c2ec3094d0a13f2799800b4fdc9c890738aeeb12c72e/numpy-1.21.5.zip (from https://pypi.org/simple/numpy/) (requires-python:>=3.7,<3.11)
Link requires a different Python (3.11.0 not in: '>=3.7,<3.11'): https://files.pythonhosted.org/packages/45/b7/de7b8e67f2232c26af57c205aaad29fe17754f793404f59c8a730c7a191a/numpy-1.21.6.zip (from https://pypi.org/simple/numpy/) (requires-python:>=3.7,<3.11)
Collecting numpy>=1.20.0
  Downloading numpy-1.26.4-cp311-cp311-win_amd64.whl (15.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.8/15.8 MB 43.7 MB/s eta 0:00:00
Collecting diskcache>=5.6.1
  Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.5/45.5 kB ? eta 0:00:00
Collecting jinja2>=2.11.3
  Downloading jinja2-3.1.4-py3-none-any.whl (133 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.3/133.3 kB 8.2 MB/s eta 0:00:00
Collecting MarkupSafe>=2.0
  Downloading MarkupSafe-2.1.5-cp311-cp311-win_amd64.whl (17 kB)
Building wheels for collected packages: llama-cpp-python
  Running command Building wheel for llama-cpp-python (pyproject.toml)
  scikit-build-core 0.9.5 using CMake 3.29.3 (wheel)
  *** Configuring CMake...
  2024-06-05 13:42:28,854 - scikit_build_core - WARNING - Can't find a Python library, got libdir=None, ldlibrary=None, multiarch=None, masd=None
  loading initial cache file C:\Users\USERNAME\AppData\Local\Temp\tmphce2sa7w\build\CMakeInit.txt
  -- The C compiler identification is GNU 13.2.0
  -- The CXX compiler identification is GNU 13.2.0
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Check for working C compiler: C:/ProgramData/mingw64/mingw64/bin/gcc.exe - skipped
  -- Detecting C compile features
  -- Detecting C compile features - done
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: C:/ProgramData/mingw64/mingw64/bin/g++.exe - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.45.1.windows.1")
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
  -- Found Threads: TRUE
  -- Found OpenMP_C: -fopenmp (found version "4.5")
  -- Found OpenMP_CXX: -fopenmp (found version "4.5")
  -- Found OpenMP: TRUE (found version "4.5")
  -- OpenMP found
  -- Found CUDAToolkit: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.8/include (found version "11.8.89")
  -- CUDA found
  -- The CUDA compiler identification is unknown
  -- Detecting CUDA compiler ABI info
  CMake Error: Error required internal CMake variable not set, cmake may not be built correctly. Missing variable is: _CMAKE_CUDA_WHOLE_FLAG
  CMake Error at C:/Users/USERNAME/AppData/Local/Temp/pip-build-env-zsu2ytep/normal/Lib/site-packages/cmake/data/share/cmake-3.29/Modules/CMakeDetermineCompilerABI.cmake:67 (try_compile):
    Failed to generate test project build system.
  Call Stack (most recent call first):
    C:/Users/USERNAME/AppData/Local/Temp/pip-build-env-zsu2ytep/normal/Lib/site-packages/cmake/data/share/cmake-3.29/Modules/CMakeTestCUDACompiler.cmake:19 (CMAKE_DETERMINE_COMPILER_ABI)
    vendor/llama.cpp/CMakeLists.txt:412 (enable_language)