abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Problem to install llama-cpp-python on Windows 10 with GPU NVidia Support CUBlast, BLAS = 0 #721

Open ForwardForward opened 1 year ago

ForwardForward commented 1 year ago

Hi everyone !

I have spent a lot of time trying to install llama-cpp-python with GPU support.

I need your help. I'll keep monitoring the thread; if I need to try other options or provide more info, I'll post everything quickly.

I use: Windows 10 Home, an Intel processor, and an NVidia RTX 3060 GPU.

I installed on my computer: Cuda_12.2.2_537.13_windows, a completely new Anaconda environment with Python 3.10.12, Visual Studio Community 2022, Visual Studio Build Tools 2022, and Cmake-3.27.4-windows-x86_64.

I got the path to CUDA with the command: echo %CUDA_PATH%

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2

I use cmd.exe

To get the environment variables I use the command: set

I tried to install following these instructions:

Local Language Models with GPU Support https://github.com/KillianLucas/open-interpreter/blob/main/docs/GPU.md

After entering the first line: set FORCE_CMAKE=1 && set CMAKE_ARGS=-DLLAMA_CUBLAS=on

Got the following environment variables:

ALLUSERSPROFILE=C:\ProgramData
APPDATA=C:\Users\igorb\AppData\Roaming
CMAKE_ARGS=-DLLAMA_CUBLAS=on
COMMONPROGRAMFILES=C:\Program Files\Common Files
COMMONPROGRAMFILES(X86)=C:\Program Files (x86)\Common Files
COMMONPROGRAMW6432=C:\Program Files\Common Files
COMPUTERNAME=DESKTOP-K5FCPTT
COMSPEC=C:\Windows\system32\cmd.exe
CONDA_BAT=E:\miniconda3\condabin\conda.bat
CONDA_DEFAULT_ENV=llamanew
CONDA_EXE=C:\Users\igorb\anaconda3\Scripts\conda.exe
CONDA_PROMPT_MODIFIER=(llamanew)
CONDA_PYTHON_EXE=C:\Users\igorb\anaconda3\python.exe
CONDA_SHLVL=1
CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2
CUDA_PATH_V12_2=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2
DRIVERDATA=C:\Windows\System32\Drivers\DriverData
FORCE_CMAKE=1
HOMEDRIVE=C:

And others below.

After running this command:

pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir -vv

Got the following error in the listing:

Building wheels for collected packages: llama-cpp-python
Created temporary directory: C:\Users\igorb\AppData\Local\Temp\pip-wheel-glrwuh6k
Destination directory: C:\Users\igorb\AppData\Local\Temp\pip-wheel-glrwuh6k
Running command Building wheel for llama-cpp-python (pyproject.toml)
scikit-build-core 0.5.0 using CMake 3.27.4 (wheel)
Configuring CMake...
2023-09-15 18:22:31,570 - scikit_build_core - WARNING - Can't find a Python library, got libdir=None, ldlibrary=None, multiarch=None, masd=None
loading initial cache file C:\Users\igorb\AppData\Local\Temp\tmp8bwuhpey\build\CMakeInit.txt
-- Building for: Visual Studio 17 2022
-- Selecting Windows SDK version 10.0.22621.0 to target Windows 10.0.19045.
-- The C compiler identification is MSVC 19.37.32824.0
-- The CXX compiler identification is MSVC 19.37.32824.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.37.32822/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.37.32822/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.41.0.windows.1")
fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git
CMake Warning at vendor/llama.cpp/CMakeLists.txt:125 (message):
  Git repository not found; to enable automatic generation of build info, make sure Git is installed and the project is a Git repository.

-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Found Threads: TRUE
-- Found CUDAToolkit: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.2/include (found version "12.2.140")
-- cuBLAS found
CMake Error at C:/Program Files/CMake/share/cmake-3.27/Modules/CMakeDetermineCompilerId.cmake:503 (message):
  No CUDA toolset found.
Call Stack (most recent call first):
  C:/Program Files/CMake/share/cmake-3.27/Modules/CMakeDetermineCompilerId.cmake:8 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
  C:/Program Files/CMake/share/cmake-3.27/Modules/CMakeDetermineCompilerId.cmake:53 (__determine_compiler_id_test)
  C:/Program Files/CMake/share/cmake-3.27/Modules/CMakeDetermineCUDACompiler.cmake:307 (CMAKE_DETERMINE_COMPILER_ID)
  vendor/llama.cpp/CMakeLists.txt:286 (enable_language)

-- Configuring incomplete, errors occurred!

*** CMake configuration failed
error: subprocess-exited-with-error

× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
full command: 'C:\Users\igorb\anaconda3\envs\llaman\python.exe' 'C:\Users\igorb\anaconda3\envs\llaman\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py' build_wheel 'C:\Users\igorb\AppData\Local\Temp\tmpc963p54s'
cwd: C:\Users\igorb\AppData\Local\Temp\pip-install-1obq29et\llama-cpp-python_475e6a59f42648fab37fac85854af94a
Building wheel for llama-cpp-python (pyproject.toml) ... error
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
Exception information:
Traceback (most recent call last):
  File "C:\Users\igorb\anaconda3\envs\llaman\lib\site-packages\pip\_internal\cli\base_command.py", line 180, in exc_logging_wrapper
    status = run_func(*args)
  File "C:\Users\igorb\anaconda3\envs\llaman\lib\site-packages\pip\_internal\cli\req_command.py", line 248, in wrapper
    return func(self, options, args)
  File "C:\Users\igorb\anaconda3\envs\llaman\lib\site-packages\pip\_internal\commands\install.py", line 429, in run
    raise InstallationError(
pip._internal.exceptions.InstallationError: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

In Windows, you can set environment variables via Control Panel --> System --> About --> Advanced system settings --> Advanced --> Environment variables --> System variables --> New

If you add environment variables there, they automatically appear in the output of the set command once the Anaconda prompt is restarted.

If you add environment variables via the set command, they disappear from the environment after you exit the Anaconda prompt and start it again.
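
A quick way to confirm, from inside the same session, what the build step will actually see (a minimal sketch; the variable names are the ones used in this thread):

import os

# Variables set with `set` only exist in the current cmd session; variables set
# through Control Panel persist. Either way, this shows what a process started
# from this session will inherit.
for name in ("FORCE_CMAKE", "CMAKE_ARGS", "CUDA_PATH"):
    print(name, "=", os.environ.get(name, "<not set>"))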

More info: When I set two system variables:

FORCE_CMAKE=1
LLAMA_CUBLAS=1

The llama-cpp-python installation completes without error, but after running the model with these commands in cmd:

python

from llama_cpp import Llama
model = Llama("E:\LLM\LLaMA2-Chat-7B\llama-2-7b.Q4_0.gguf", verbose=True, n_threads=8, n_gpu_layers=40)

I get the running model's startup info showing the parameter: BLAS = 0

A more complete listing:

llama_new_context_with_model: kv self size = 256.00 MB
llama_new_context_with_model: compute buffer total size = 71.97 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |

Try also: python -c "from llama_cpp import GGML_USE_CUBLAS; print(GGML_USE_CUBLAS)"

And received: False

And another piece of information: if CMake is installed on the computer, you can get help on its commands:

cmake --help

Part of the listing:

cmake [options] <path-to-source>
cmake [options] <path-to-existing-build>
cmake [options] -S <path-to-source> -B <path-to-build>

Specify a source directory to (re-)generate a build system for it in the current working directory. Specify an existing build directory to re-generate its build system.

Options
-S <path-to-source>       = Explicitly specify a source directory.
-B <path-to-build>        = Explicitly specify a build directory.
-C <initial-cache>        = Pre-load a script to populate the cache.
-D <var>[:<type>]=<value> = Create or update a cmake cache entry.

I also tried installing with the following environment variables:

set FORCE_CMAKE=1 && set CMAKE_ARGS=LLAMA_CUBLAS=ON

Also, the llama-cpp-python installation goes through without error, but the result is the same:

BLAS = 0

If I use that instruction:

How To Install Llama-2 Locally On Windows Computer – llama.cpp, Exllama, KoboltCpp https://www.hardware-corner.net/guides/install-llama-2-windows-pc/

from the point: Installing cuBLAS version for NVIDIA GPU

File: cudart-llama-bin-win-cu12.1.0-x64

Contains 3 dlls: cublas64_12.dll cublasLt64_12.dll cudart64_12.dll

And started llama.cpp's main.exe with the command:

main.exe -m E:\LLM\LLaMA2-Chat-7B\llama-2-7b-chat.ggmlv3.q4_0.bin --in-prefix " [INST] " --in-suffix " [/INST]" -i -p "[INST] <<SYS>> You are a helpful, respectful, and honest assistant. <</SYS>> [/INST]" --n-gpu-layers 40 -ins --color

It starts fine with BLAS = 1 (GPU support) and runs at a faster rate.

But I need to use it from Python.

IsaacDynamo commented 1 year ago

@abetlen Can LLAMA_BUILD=OFF be used to install llama-cpp-python without compilation on Windows, grabbing the .dll from here: https://github.com/ggerganov/llama.cpp/releases ?

https://github.com/abetlen/llama-cpp-python/issues/484#issuecomment-1718346467

Would that work? I will give it a try, and report back if I get something working.

IsaacDynamo commented 1 year ago

Nevermind, llama.cpp releases don't contain the .dll

ForwardForward commented 1 year ago

@abetlen Can LLAMA_BUILD=OFF be used to install llama-cpp-python without compilation on Windows, grabbing the .dll from here: https://github.com/ggerganov/llama.cpp/releases ?

Do you recommend setting only the one variable, set LLAMA_BUILD=OFF, and not using set FORCE_CMAKE=1 && set CMAKE_ARGS=-DLLAMA_CUBLAS=on? Correct?

And in my case, what path should I put the DLL in? The path to my virtual environment is C:\Users\igorb\anaconda3\envs\llaman\Lib\site-packages\llama_cpp_python-0.2.6.dist-info.

ForwardForward commented 1 year ago

I tried llama in this project and there is definitely cuBLAS support there, and it works fine: https://github.com/oobabooga/text-generation-webui

Here are some folder names from their environment: llama_cpp, llama_cpp_cuda, llama_cpp_python_cuda-0.1.85+cu117.dist-info, llama_cpp_python-0.1.85.dist-info

wdlq commented 1 year ago

I got stuck on the same issue.

IsaacDynamo commented 1 year ago

Made a PR so Windows builds include the .dll in the release .zip. See https://github.com/ggerganov/llama.cpp/pull/3215

For now you can use the artifacts from my branch. https://github.com/IsaacDynamo/llama.cpp/actions/runs/6206987247

With the .dll from the .zip I was able to run the llama-cpp server with cuBLAS, without compiling it myself.

Installed llama-cpp-python as follows. ~Not sure that set CMAKE_ARGS="-DLLAMA_BUILD=OFF" changed anything, because it built a llama.cpp with a CPU backend anyway.~ Update: with set CMAKE_ARGS=-DLLAMA_BUILD=OFF, i.e. without the quotes, llama-cpp-python skips building the CPU backend .dll.

set CMAKE_ARGS=-DLLAMA_BUILD=OFF
pip install llama-cpp-python[server]

Used the following commands to use the prebuild .dll with llama_cpp server.

set LLAMA_CPP_LIB=C:\llama-release-dll-b1248-11eb9fb-bin-win-cublas-cu12.2.0-x64\llama.dll
set PATH=C:\llama-release-dll-b1248-11eb9fb-bin-win-cublas-cu12.2.0-x64;%PATH%
python.exe -m llama_cpp.server --model llama-2-7b-chat.Q5_K_M.gguf
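
To use the same prebuilt DLL directly from Python instead of the server, something along these lines should work (a sketch only, assuming llama-cpp-python picks up LLAMA_CPP_LIB the same way it does for the server above; paths are the ones from the .zip and need adjusting):

import os

dll_dir = r"C:\llama-release-dll-b1248-11eb9fb-bin-win-cublas-cu12.2.0-x64"
# Point llama-cpp-python at the prebuilt llama.dll before importing it,
# and make sure the CUDA/cuBLAS DLLs sitting next to it can be resolved.
os.environ["LLAMA_CPP_LIB"] = os.path.join(dll_dir, "llama.dll")
os.environ["PATH"] = dll_dir + os.pathsep + os.environ["PATH"]
os.add_dll_directory(dll_dir)

from llama_cpp import Llama  # import only after the variables are set
llm = Llama("llama-2-7b-chat.Q5_K_M.gguf", n_gpu_layers=40, verbose=True)
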
abetlen commented 1 year ago

@ForwardForward @IsaacDynamo I think we might have found a solution in #563 by adding

install(
    FILES $<TARGET_RUNTIME_DLLS:llama>
    DESTINATION llama_cpp
)

to CMake, to force it to copy the cuBLAS DLLs from the build step into the install directory.

jllllll commented 1 year ago
No CUDA toolset found.

This error is due to Windows CMake with MSVC requiring CUDA Visual Studio integration to be installed through the CUDA installer. This doesn't necessarily install to all of the MSVC versions you have, especially if you install a newer one after installing CUDA. You can check these paths to see if they are there:

C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\MSBuild\Microsoft\VC\v1*\BuildCustomizations
CUDA 12.2.props
CUDA 12.2.targets
CUDA 12.2.xml
Nvda.Build.CudaTasks.v12.2.dll

If they aren't there, then the easiest way to get them is to download the CUDA installer, in your case here, and then open the installer in 7zip and extract the files to all the BuildCustomizations folders in that path from this path in the installer's files:

\visual_studio_integration\CUDAVisualStudioIntegration\extras\visual_studio_integration\MSBuildExtensions\

You can also just run the installer again and only select the Visual Studio integration, but it may not install it to the specific version of MSVC that CMake is wanting to use.
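
As a rough way to check this from Python instead of clicking through Explorer (a sketch; the path assumes the default Build Tools 2022 location and CUDA 12.2 as in this thread):

import glob

# Look for the CUDA MSBuild integration files in every installed MSVC toolset version.
pattern = (r"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools"
           r"\MSBuild\Microsoft\VC\v1*\BuildCustomizations\CUDA 12.2.*")
found = glob.glob(pattern)
if found:
    for path in found:
        print("found:", path)
else:
    print("No CUDA 12.2 Visual Studio integration files found for this MSVC.")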

jllllll commented 1 year ago

In the past, this wasn't as much of an issue as scikit-build would handle the CUDA stuff in the background and eliminate the need for the VS integrations. scikit-build-core does not seem to have this functionality.

abetlen commented 1 year ago

@jllllll do you know if instructing users to install the nvidia provided cuda pip wheels (https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#pip-wheels) would help here?

jllllll commented 1 year ago

I don't think it will make a difference as the issue is specifically due to missing the VS integrations provided by the CUDA Toolkit installer. I've tried looking for other sources of those integration files multiple times in the past and could not find anything.

This is more of an issue with how NVIDIA has decided to implement NVCC on Windows. They could have just included CMake configuration files as part of the main NVCC package for Windows, but decided instead to distribute Visual Studio integration through the main exe installer. CMake devs themselves could also implement solutions for this and not rely on those integration files similar to how scikit-build and Pytorch did it. Not sure why they haven't.

jllllll commented 1 year ago

Then again, I'm not sure how scikit-build and Pytorch avoided that issue. It may have actually been setuptools that implemented a solution. I'll look into it and see if it is something that can be implemented in this project.

jllllll commented 1 year ago

@ForwardForward Does using Ninja to build avoid the No CUDA toolset found. issue? It seems to for me. I removed all of my VS integration files and ran the commands below successfully.

python -m pip install ninja scikit-build-core[pyproject]

call "C:/Program Files/Microsoft Visual Studio/2022/BuildTools/VC/Auxiliary/Build/vcvars64.bat"
set FORCE_CMAKE=1 && set CMAKE_ARGS=-GNinja -DLLAMA_CUBLAS=on
python -m pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --no-build-isolation -v

--no-build-isolation seems to be needed here as llama-cpp-python does not list Ninja as a build dependency.

The call command activates VS's build environment and is needed here due to Ninja not automatically finding MSVC. The old scikit-build had a solution for finding vcvars64.bat and activating the environment automatically. Hopefully scikit-build-core gets that ported over soon.

wdlq commented 1 year ago

I followed these instructions: https://github.com/KillianLucas/open-interpreter/blob/main/docs/GPU.md

set FORCE_CMAKE=1 && set CMAKE_ARGS=-DLLAMA_CUBLAS=on
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir -vv

but it causes an error:

error MSB3721: The command ""C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.1\bin\nvcc.exe" -gencode=arch=compute_52,code=\"sm_52,compute_52\" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\amd64" -x cu -I"C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.1\include" -I"C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.1\include" -G --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static -v -g -D_MBCS -Xcompiler "/EHsc /W0 /nologo /Od /FS /Zi /RTC1 /MDd " -Xcompiler "/FdDebug\vc140.pdb" -o C:\Users\sun-home\AppData\Local\Temp\tmpx7i939il\build\CMakeFiles\3.27.4\CompilerIdCUDA\x64\Debug\CMakeCUDACompilerId.cu.obj "C:\Users\sun-home\AppData\Local\Temp\tmpx7i939il\build\CMakeFiles\3.27.4\CompilerIdCUDA\CMakeCUDACompilerId.cu"" exited with code 1. [C:\Users\sun-home\AppData\Local\Temp\tmpx7i939il\build\CMakeFiles\3.27.4\CompilerIdCUDA\CompilerIdCUDA.vcxproj]


I think it may be solved by two methods:

method 1: (screenshot)

method 2: (screenshot)

but I don't know how to change the above two variables in the "x64 Native Tools Command Prompt for VS 2022". Could you please help me?

jllllll commented 1 year ago

@wdlq That error message shows that it is trying to use Visual Studio 2015.

I would be pretty surprised if a CUDA Toolkit version as new as 12.1 supported a compiler that old on Windows. Make sure that you have Visual Studio 2022 installed and make sure that you have selected the Desktop development with C++ option during installation.

wdlq commented 1 year ago

@wdlq That error message shows that it is trying to use Visual Studio 2015.

I would be pretty surprised if a CUDA Toolkit version as new as 12.1 supported a compiler that old on Windows. Make sure that you have Visual Studio 2022 installed and make sure that you have selected the Desktop development with C++ option during installation.

Yes, I am sure I installed VS2022 and had the 'Desktop development with C++' option selected. I compiled CUDA sample code in VS2022 successfully, both with VC 14.3 and VC 17.

jllllll commented 1 year ago

@wdlq Maybe I misunderstood your initial post.

If you installed VS 2022 to a different path than the default, then you are going to have problems using it with Python packages like this one. The default install locations for Visual Studio tend to work best with stuff like this. I have my own VS installation on another drive and have created a junction linking it to the default location to avoid issues.

You can try manually defining the correct VS version like this:

set CMAKE_GENERATOR=Visual Studio 17 2022
set FORCE_CMAKE=1 && set CMAKE_ARGS=-DLLAMA_CUBLAS=on
python -m pip install llama-cpp-python --no-cache-dir -v

You can list the generators that CMake recognizes on your system with cmake --help.

DaveScream commented 1 year ago

Whole evening battle.

  1. The CMake bundled with Visual Studio 2022 or Visual Studio Build Tools 2022 does not work, even if you add it to PATH. You need to use CMake from the official standalone distribution (google "cmake").

  2. After compiling OK, I had another error: "CUDA error: the provided PTX was compiled with an unsupported toolchain". Maybe it is because I used the latest CUDA 12.2. To overcome this you need the latest NVIDIA driver.

  3. I don't know how oobabooga makes llama-cpp-python work with the new format without all this headache. Maybe they use a trick with a precompiled DLL, but I don't understand how to use it or where to put the precompiled DLL.

  4. Speed increased from 10 tokens/s in oobabooga with a 13B model (35 layers on GPU) to 14 tokens/s (same 35 layers on GPU). GPU load increased too, from 13% to 35% (RTX 3060 12 GB).

jllllll commented 1 year ago
  1. I prefer to use the pip installed cmake since it is convenient to install alongside other pip packages.

  2. All CUDA versions have a minimum required driver version that it supports.

  3. text-generation-webui uses pre-compiled wheels for Windows provided by me from this repo: https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels Linux users use the standard installation method from pip for CPU-only builds. Both Windows and Linux use pre-compiled wheels with renamed packages to allow for simultaneous support of both cuBLAS and CPU-only builds in the webui. You can see the specific wheels used in the requirements.txt. By default, CUDA 11.7 wheels are used, but I also have CUDA 12.2 wheels available for those who want them.

The biggest issue I've found on Windows so far is that the latest versions of llama-cpp-python seem to override what nvcc version is used for compiling instead of just using the one in CUDA_PATH like previous versions did. To work around this, I have to use CMAKE_ARGS like this:

set "CMAKE_ARGS=-Tv143,cuda=11.7 -DLLAMA_CUBLAS=on"

Otherwise, it just always uses the newest nvcc on my system.
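
Regarding item 3 above: a minimal sketch of how such side-by-side builds can be consumed from Python (package names taken from the folder listing earlier in the thread; this mirrors the idea, not the webui's exact code):

try:
    # Renamed cuBLAS build installed alongside the regular package.
    from llama_cpp_cuda import Llama
except ImportError:
    # Fall back to the standard CPU-only package.
    from llama_cpp import Llama

llm = Llama("llama-2-7b.Q4_0.gguf", n_gpu_layers=40)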

wdlq commented 1 year ago

@jllllll I uninstalled VS2022 from disk H and reinstalled it on disk C, and now it works well. Thank you!

ForwardForward commented 1 year ago
  1. I prefer to use the pip installed cmake since it is convenient to install alongside other pip packages.
  2. All CUDA versions have a minimum required driver version that it supports.
  3. text-generation-webui uses pre-compiled wheels for Windows provided by me from this repo: https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels Linux users use the standard installation method from pip for CPU-only builds. Both Windows and Linux use pre-compiled wheels with renamed packages to allow for simultaneous support of both cuBLAS and CPU-only builds in the webui. You can see the specific wheels used in the requirements.txt. By default, CUDA 11.7 wheels are used, but I also have CUDA 12.2 wheels available for those who want them.

The biggest issue I've found on Windows so far is that the latest versions of llama-cpp-python seem to override what nvcc version is used for compiling instead of just using the one in CUDA_PATH like previous versions did. To work around this, I have to use CMAKE_ARGS like this:

set "CMAKE_ARGS=-Tv143,cuda=11.7 -DLLAMA_CUBLAS=on"

Otherwise, it just always uses the newest nvcc on my system.

Thank you for your answer.

I described the complete configuration of the computer and installed software in the starting post.

According to your advice I uninstalled Cmake from Windows and installed Cmake via pip install cmake in the current virtual environment.

I have an Intel Core i3-8100 processor

As far as I checked it supports AVX2, but not AVX-512.

Next I tried to install llama-cpp-python with the following parameters:

python -m pip install llama-cpp-python --prefer-binary --no-cache-dir --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu122/llama-cpp-python/llama_cpp_python-0.2.6+cu122-cp310-cp310-win_amd64.whl

First I tried installing without setting the variable: set "CMAKE_ARGS=-Tv143,cuda=11.7 -DLLAMA_CUBLAS=on"

Everything installed without errors, but then I tested the model with Python:

python
from llama_cpp import Llama
model = Llama("E:\LLM\LLaMA2-Chat-7B\llama-2-7b.Q4_0.gguf", verbose=True, n_threads=8, n_gpu_layers=40)

After startup it shows BLAS=0 flag (no GPU support)
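
A quick way to confirm which build actually got imported, reusing the GGML_USE_CUBLAS check from earlier in this thread (False here corresponds to the BLAS = 0 line):

from llama_cpp import GGML_USE_CUBLAS

# False means the plain CPU wheel was picked up rather than the cuBLAS one.
print(GGML_USE_CUBLAS)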

After that I tried to do the installation with the flags. Since I have CUDA version 12.2, I corrected CUDA to 12.2 in the flags: set "CMAKE_ARGS=-Tv143,cuda=12.2 -DLLAMA_CUBLAS=on".

After uninstalling it with pip uninstall llama-cpp-python,

I re-ran the command:

python -m pip install llama-cpp-python --prefer-binary --no-cache-dir --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu122/llama-cpp-python/llama_cpp_python-0.2.6+cu122-cp310-cp310-win_amd64.whl

And got the following error during installation:

Building wheels for collected packages: llama-cpp-python
Building wheel for llama-cpp-python (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [112 lines of output]
scikit-build-core 0.5.0 using CMake 3.27.5 (wheel)
Configuring CMake...
2023-09-18 17:11:42,309 - scikit_build_core - WARNING - Can't find a Python library, got libdir=None, ldlibrary=None, multiarch=None, masd=None
loading initial cache file C:\Users\igorb\AppData\Local\Temp\tmp7nicpnoe\build\CMakeInit.txt
-- Building for: Visual Studio 17 2022
-- Selecting Windows SDK version 10.0.22621.0 to target Windows 10.0.19045.
-- The C compiler identification is MSVC 19.37.32824.0
-- The CXX compiler identification is MSVC 19.37.32824.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.37.32822/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.37.32822/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.41.0.windows.1")
fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git
CMake Warning at vendor/llama.cpp/CMakeLists.txt:125 (message):
  Git repository not found; to enable automatic generation of build info, make sure Git is installed and the project is a Git repository.

Can you tell me what I am doing wrong and how to fix it?

jllllll commented 1 year ago

Don't include the full URL to the wheel in the index URL. pip is designed to search for the wheel itself in package indices:

python -m pip install llama-cpp-python --prefer-binary --no-cache-dir --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu122
ForwardForward commented 1 year ago

Don't include the full URL to the wheel in the index URL. pip is designed to search for the wheel itself in package indices:

python -m pip install llama-cpp-python --prefer-binary --no-cache-dir --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu122

Thank you so much! I finally got BLAS=1.

abetlen commented 1 year ago

@jllllll I think I found the root of the issue with MSVC-based CMake; it looks like scikit-build-core has a bug where it can't parse the version string.

If anyone wants to submit a PR there, please do; unfortunately I don't have access to a Windows machine, so I can't test this.

jllllll commented 1 year ago

@abetlen That would certainly help. Another potential solution that can happen more quickly could be to include cmake in the build-system requirements:

[build-system]
requires = [
    "scikit-build-core[pyproject]>=0.5.0",
    "cmake>=3.21.0"
]
build-backend = "scikit_build_core.build"

A simple test on my system showed that this pip installed version of cmake was immediately used to build llama-cpp-python with that addition to pyproject.toml. This can also allow more direct control over what version of CMake is used for building and can even make installation easier by removing the need for installing CMake beforehand.

jllllll commented 1 year ago

@abetlen Made a PR for the issue you linked: https://github.com/scikit-build/scikit-build-core/pull/508

Code is simple enough that it doesn't really need in-depth testing on a relevant CMake version as it is simply parsing strings. Using the version string from that issue with a Python interpreter and recreating the subprocess code for retrieving and parsing the string is enough to verify that it works.

In case you were wondering, this is the string output from CMake that was being parsed and causing the error:

"cmake version 3.26.4-msvc4\n\nCMake suite maintained and supported by Kitware (kitware.com/cmake).\n"
abetlen commented 1 year ago

@jllllll thanks for resolving that PR, I've bumped the required scikit-build-core version to 0.5.1 which includes your fix. Should I also add the explicit cmake build dependency in the pyproject.toml?

jllllll commented 1 year ago

It isn't necessary now. It might be useful in the future? Not sure. I almost exclusively use the pip distribution of cmake anyway.

earonesty commented 1 year ago

This happens if you install CUDA before you install Visual Studio.

ramzeta commented 1 year ago

Where can I find the solution to this in Discord? (screenshot, ending with: False)

tk-master commented 11 months ago

Check this guide and let me know if it helped https://github.com/abetlen/llama-cpp-python/discussions/871

Talhaz commented 10 months ago

After spending 3 hours on the installation, let me tell you what mistakes I made and what you should do.

Step 1: Download Visual Studio 2022 and install "Desktop development with C++".

Step 2: Download the CUDA Toolkit and install it with the Express installation; this will overwrite your drivers and install correctly. Go to cmd and type nvcc --version to check whether CUDA is installed.

Step 3: Install cuDNN. Follow this process: https://medium.com/analytics-vidhya/installing-cuda-and-cudnn-on-windows-d44b8e9876b5 In this blog you have to follow the whole path-variables guide.

Step 4: What I did was open Visual Studio, go to Extensions, and install the NVIDIA Nsight Monitor extension. I don't know if it is related to CUDA or anything, but I did it to complete the installation of everything.

Step 5: I used pip for the installation, so I used these commands:

set "CMAKE_ARGS=-Tv143,cuda=12.3 -DLLAMA_CUBLAS=on"
python -m pip install llama-cpp-python --force-reinstall --no-cache-dir -v

Thanks to this man @jllllll for helping a lot

znelson32 commented 9 months ago

After spending 3 hours on the installation, let me tell you what mistakes I made and what you should do.

Step 1: Download Visual Studio 2022 and install "Desktop development with C++".

Step 2: Download the CUDA Toolkit and install it with the Express installation; this will overwrite your drivers and install correctly. Go to cmd and type nvcc --version to check whether CUDA is installed.

Step 3: Install cuDNN. Follow this process: https://medium.com/analytics-vidhya/installing-cuda-and-cudnn-on-windows-d44b8e9876b5 In this blog you have to follow the whole path-variables guide.

Step 4: What I did was open Visual Studio, go to Extensions, and install the NVIDIA Nsight Monitor extension. I don't know if it is related to CUDA or anything, but I did it to complete the installation of everything.

Step 5: I used pip for the installation, so I used these commands:

set "CMAKE_ARGS=-Tv143,cuda=12.3 -DLLAMA_CUBLAS=on"
python -m pip install llama-cpp-python --force-reinstall --no-cache-dir -v

Thanks to this man @jllllll for helping a lot

I also got stuck on running the following command:

(privateGPT) D:\AI\PrivateGPT\privateGPT>$env:CMAKE_ARGS='-DLLAMA_CUBLAS=on';
The filename, directory name, or volume label syntax is incorrect.

Running the following fixed the issue:

python -m pip install llama-cpp-python --force-reinstall --no-cache-dir -v
poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python

Then, when starting the server, the GPU offload is applied: 33/33 layers.

Mooth34 commented 9 months ago

Don't include the full URL to the wheel in the index URL. pip is designed to search for the wheel itself in package indices:

python -m pip install llama-cpp-python --prefer-binary --no-cache-dir --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu122

Thank you so much ! I finally got BLAS=1.

FINALLY:

python -m pip install llama-cpp-python --prefer-binary --no-cache-dir --force-reinstall --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu121

AlbertL7 commented 9 months ago

I was running into this error trying to build PrivateGPT on Windows using VS Code, for llama-cpp-python:

 poetry install --with ui,local
Installing dependencies from lock file

Package operations: 0 installs, 1 update, 0 removals

  • Downgrading llama-cpp-python (0.2.26+cu122 -> 0.2.23): Failed

  ChefBuildError

  Backend subprocess exited when trying to invoke build_wheel

  *** scikit-build-core 0.8.0 using CMake 3.28.1 (wheel)
  *** Configuring CMake...
  2024-01-25 08:41:10,191 - scikit_build_core - WARNING - Can't find a Python library, got libdir=None, ldlibrary=None, multiarch=None, masd=None
  loading initial cache file C:\Users\HOUSE-~1\AppData\Local\Temp\tmpuyzt0kb2\build\CMakeInit.txt
  -- Building for: NMake Makefiles
  CMake Error at CMakeLists.txt:3 (project):
    Running

     'nmake' '-?'

    failed with:

     no such file or directory

  CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
  CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
  -- Configuring incomplete, errors occurred!

  *** CMake configuration failed

  at ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\poetry\installation\chef.py:164 in _prepare
      160│
      161│                 error = ChefBuildError("\n\n".join(message_parts))
      162│
      163│             if error is not None:
    → 164│                 raise error from None
      165│
      166│             return path
      167│
      168│     def _prepare_sdist(self, archive: Path, destination: Path | None = None) -> Path:

Note: This error originates from the build backend, and is likely not a problem with poetry but with llama-cpp-python (0.2.23) not supporting PEP 517 builds. You can verify this by running 'pip wheel --no-cache-dir --use-pep517 "llama-cpp-python (==0.2.23)"'.

I solved it by downloading Build Tools for Visual Studio from this link: https://visualstudio.microsoft.com/downloads/?cid=learn-onpage-download-cta

After downloading and installing I ran "poetry install --with ui,local" and it worked.