kot197 closed this issue 1 month ago
Still stuck on installation.
I uninstalled everything and am reinstalling.
I have NVIDIA CUDA 12.5.1 and Visual Studio 2022.
Now this is the error I got:
Building wheels for collected packages: llama-cpp-python
Building wheel for llama-cpp-python (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [17 lines of output]
*** scikit-build-core 0.9.8 using CMake 3.30.1 (wheel)
*** Configuring CMake...
2024-07-28 23:19:17,161 - scikit_build_core - WARNING - Can't find a Python library, got libdir=None, ldlibrary=None, multiarch=None, masd=None
loading initial cache file C:\Users\J\AppData\Local\Temp\tmpcuu0ov3w\build\CMakeInit.txt
-- Building for: Visual Studio 15 2017 Win64
CMake Error at CMakeLists.txt:3 (project):
Generator
Visual Studio 15 2017 Win64
could not find any instance of Visual Studio.
-- Configuring incomplete, errors occurred!
*** CMake configuration failed
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
I ran the following commands
(base) C:\Users\J>set CMAKE_ARGS=-DLLAMA_CUBLAS=on
(base) C:\Users\J>set FORCE_CMAKE=1
(base) C:\Users\J>pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
Why is it saying:
-- Building for: Visual Studio 15 2017 Win64
CMake Error at CMakeLists.txt:3 (project):
Generator
Visual Studio 15 2017 Win64
could not find any instance of Visual Studio.
Could this be the problem?
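Most likely, yes: CMake is falling back to the "Visual Studio 15 2017 Win64" generator, which isn't installed, instead of detecting VS 2022. One hedged workaround is to pin the generator explicitly; CMake 3.15+ honors the CMAKE_GENERATOR environment variable. A sketch, assuming VS 2022 Community is the installed edition:
:: Sketch of a workaround: force the VS 2022 generator explicitly
set CMAKE_GENERATOR=Visual Studio 17 2022
set CMAKE_ARGS=-DLLAMA_CUBLAS=on
set FORCE_CMAKE=1
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
Note also that recent llama.cpp releases renamed the CUDA switch, so on newer llama-cpp-python versions -DGGML_CUDA=on is the documented replacement for -DLLAMA_CUBLAS=on; the later build log shows GGML_USE_CUDA defines, so CUDA was evidently picked up either way.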
After a lot of things... I'm stuck at 'Building wheels for collected packages: llama-cpp-python' and I can't troubleshoot any further because there are no error messages.
I think I'm giving up... Is there another package as an alternative to this? I wish there were a more comprehensive guide on this; sad. I spent my entire day installing this package.
EDIT: It just keeps printing these non-stop when I add --verbose:
(base) C:\Users\J\AppData\Local\Temp\tmpv0s5ys_x\build\vendor\llama.cpp\ggml\src>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\bin\HostX64\x64" -x cu -I"C:\Users\J\AppData\Local\Temp\pip-install-soxh_yjh\llama-cpp-python_e04e8cfebc0d46d38d6c4b3c28bb6bb4\vendor\llama.cpp\ggml\src\..\include" -I"C:\Users\J\AppData\Local\Temp\pip-install-soxh_yjh\llama-cpp-python_e04e8cfebc0d46d38d6c4b3c28bb6bb4\vendor\llama.cpp\ggml\src\." -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" --keep-dir ggml\x64\Release -use_fast_math -maxrregcount=0 --machine 64 --compile -cudart static --generate-code=arch=compute_52,code=[compute_52,sm_52] --generate-code=arch=compute_61,code=[compute_61,sm_61] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] -Xcompiler="/EHsc -Ob2 /arch:AVX2" -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -Xcompiler "/EHsc /W1 /nologo /O2 /FS /MD " -Xcompiler "/Fdggml.dir\Release\vc143.pdb" -o ggml.dir\Release\mmq-instance-q2_k.obj "C:\Users\J\AppData\Local\Temp\pip-install-soxh_yjh\llama-cpp-python_e04e8cfebc0d46d38d6c4b3c28bb6bb4\vendor\llama.cpp\ggml\src\ggml-cuda\template-instances\mmq-instance-q2_k.cu"
mmq-instance-q2_k.cu
tmpxft_000010e8_00000000-7_mmq-instance-q2_k.compute_75.cudafe1.cpp
Done Building Project "C:\Users\J\AppData\Local\Temp\tmpv0s5ys_x\build\vendor\llama.cpp\ggml\src\ggml.vcxproj" (CudaBuildCore target(s)).
Project "C:\Users\J\AppData\Local\Temp\tmpv0s5ys_x\build\vendor\llama.cpp\ggml\src\ggml.vcxproj" (5) is building "C:\Users\J\AppData\Local\Temp\tmpv0s5ys_x\build\vendor\llama.cpp\ggml\src\ggml.vcxproj" (5:46) on node 1 (CudaBuildCore target(s)).
CudaBuildCore:
Compiling CUDA source file ..\..\..\..\..\..\pip-install-soxh_yjh\llama-cpp-python_e04e8cfebc0d46d38d6c4b3c28bb6bb4\vendor\llama.cpp\ggml\src\ggml-cuda\template-instances\mmq-instance-q3_k.cu...
cmd.exe /C "C:\Users\J\AppData\Local\Temp\tmp53333b2d379d496490439ae428dbbe15.cmd"
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\bin\HostX64\x64" -x cu -I"C:\Users\J\AppData\Local\Temp\pip-install-soxh_yjh\llama-cpp-python_e04e8cfebc0d46d38d6c4b3c28bb6bb4\vendor\llama.cpp\ggml\src\..\include" -I"C:\Users\J\AppData\Local\Temp\pip-install-soxh_yjh\llama-cpp-python_e04e8cfebc0d46d38d6c4b3c28bb6bb4\vendor\llama.cpp\ggml\src\." -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" --keep-dir ggml\x64\Release -use_fast_math -maxrregcount=0 --machine 64 --compile -cudart static --generate-code=arch=compute_52,code=[compute_52,sm_52] --generate-code=arch=compute_61,code=[compute_61,sm_61] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] -Xcompiler="/EHsc -Ob2 /arch:AVX2" -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -Xcompiler "/EHsc /W1 /nologo /O2 /FS /MD " -Xcompiler "/Fdggml.dir\Release\vc143.pdb" -o ggml.dir\Release\mmq-instance-q3_k.obj "C:\Users\J\AppData\Local\Temp\pip-install-soxh_yjh\llama-cpp-python_e04e8cfebc0d46d38d6c4b3c28bb6bb4\vendor\llama.cpp\ggml\src\ggml-cuda\template-instances\mmq-instance-q3_k.cu"
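Those repeating lines are not an error loop: nvcc is compiling llama.cpp's many CUDA template-instance files one object at a time, and a full build for the four default architectures shown in the log (compute 52/61/70/75) can take a very long time. As a possible speed-up (a sketch, assuming a single target GPU; 86 below is just an example compute capability, not a value from this thread), the build can be restricted to one architecture via the standard CMAKE_CUDA_ARCHITECTURES cache variable:
:: Sketch: build CUDA kernels for one GPU architecture only
:: (replace 86 with your card's actual compute capability)
set CMAKE_ARGS=-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=86
set FORCE_CMAKE=1
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose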
Hi @kot197, since you are on CUDA 12.x, use the following command:
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
This will install a pre-built wheel on your PC.
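If the wheel installs, a quick smoke test shows whether CUDA offload actually works; with verbose=True, llama-cpp-python prints how many layers were offloaded to the GPU. A sketch (model.gguf is a placeholder path, n_gpu_layers=-1 offloads all layers):
:: Hypothetical smoke test; model.gguf is a placeholder path
python -c "from llama_cpp import Llama; Llama(model_path='model.gguf', n_gpu_layers=-1, verbose=True)"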
Hi @gformcreation,
The installation still seems stuck in an infinite loop; it keeps printing this:
(base) C:\Users\J\AppData\Local\Temp\tmp7ppzklvp\build\vendor\llama.cpp\ggml\src>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\bin\HostX64\x64" -x cu -I"C:\Users\J\AppData\Local\Temp\pip-install-53hjbtws\llama-cpp-python_0984bb5db8f142cd8aa50adfdd94ba58\vendor\llama.cpp\ggml\src\..\include" -I"C:\Users\J\AppData\Local\Temp\pip-install-53hjbtws\llama-cpp-python_0984bb5db8f142cd8aa50adfdd94ba58\vendor\llama.cpp\ggml\src\." -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include" --keep-dir ggml\x64\Release -use_fast_math -maxrregcount=0 --machine 64 --compile -cudart static --generate-code=arch=compute_52,code=[compute_52,sm_52] --generate-code=arch=compute_61,code=[compute_61,sm_61] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] -Xcompiler="/EHsc -Ob2 /arch:AVX2" -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -Xcompiler "/EHsc /W1 /nologo /O2 /FS /MD " -Xcompiler "/Fdggml.dir\Release\vc143.pdb" -o ggml.dir\Release\fattn-wmma-f16-instance-kqfloat-cpb32.obj "C:\Users\J\AppData\Local\Temp\pip-install-53hjbtws\llama-cpp-python_0984bb5db8f142cd8aa50adfdd94ba58\vendor\llama.cpp\ggml\src\ggml-cuda\template-instances\fattn-wmma-f16-instance-kqfloat-cpb32.cu"
This happens after I installed CUDA 12.4.
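One thing worth noting: if nvcc is running at all, pip has fallen back to building from source, which usually means no pre-built wheel matched this Python version and platform. Forcing a binary-only install (a diagnostic sketch, not a guaranteed fix) turns that silent fallback into an explicit error naming the mismatch:
:: Sketch: fail loudly if no pre-built wheel is available
pip install llama-cpp-python --only-binary :all: --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121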
Can someone please tell me what's going on here?
I installed Visual Studio with the C++ workload, the CUDA toolkit from NVIDIA, and CMake from the Visual Studio installer.
I get the following error:
I think this part could be the problem.
Please be gentle with me as I'm a newcomer trying to figure this out.
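A few generic sanity checks that may help narrow this down (a sketch, assuming the commands are run from an "x64 Native Tools Command Prompt for VS 2022" so the MSVC host compiler is on PATH):
:: Confirm each tool resolves before retrying the build
where cl
nvcc --version
cmake --version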