abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

ERROR: Failed building wheel for llama-cpp-python #1629

Closed kot197 closed 1 month ago

kot197 commented 1 month ago

Can someone please tell me what's going on here?

I installed Visual Studio with the C++ workload, the CUDA Toolkit from NVIDIA, and CMake from Visual Studio.

I get the following error:

(base) C:\Users\J>set CMAKE_ARGS="-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS"

(base) C:\Users\J>set CMAKE_ARGS="-DGGML_CUDA=on"

(base) C:\Users\J>pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.83.tar.gz (49.4 MB)
     ---------------------------------------- 49.4/49.4 MB 54.7 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0 (from llama-cpp-python)
  Downloading typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting numpy>=1.20.0 (from llama-cpp-python)
  Downloading numpy-2.0.1-cp312-cp312-win_amd64.whl.metadata (60 kB)
     ---------------------------------------- 60.9/60.9 kB ? eta 0:00:00
Collecting diskcache>=5.6.1 (from llama-cpp-python)
  Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Collecting jinja2>=2.11.3 (from llama-cpp-python)
  Downloading jinja2-3.1.4-py3-none-any.whl.metadata (2.6 kB)
Collecting MarkupSafe>=2.0 (from jinja2>=2.11.3->llama-cpp-python)
  Downloading MarkupSafe-2.1.5-cp312-cp312-win_amd64.whl.metadata (3.1 kB)
Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
   ---------------------------------------- 45.5/45.5 kB ? eta 0:00:00
Downloading jinja2-3.1.4-py3-none-any.whl (133 kB)
   ---------------------------------------- 133.3/133.3 kB ? eta 0:00:00
Downloading numpy-2.0.1-cp312-cp312-win_amd64.whl (16.3 MB)
   ---------------------------------------- 16.3/16.3 MB 46.9 MB/s eta 0:00:00
Downloading typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Downloading MarkupSafe-2.1.5-cp312-cp312-win_amd64.whl (17 kB)
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [109 lines of output]
      *** scikit-build-core 0.9.8 using CMake 3.30.1 (wheel)
      *** Configuring CMake...
      2024-07-28 19:09:24,577 - scikit_build_core - WARNING - Can't find a Python library, got libdir=None, ldlibrary=None, multiarch=None, masd=None
      loading initial cache file C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeInit.txt
      -- Building for: Visual Studio 17 2022
      -- Selecting Windows SDK version 10.0.22621.0 to target Windows 10.0.19044.
      -- The C compiler identification is MSVC 19.40.33813.0
      -- The CXX compiler identification is MSVC 19.40.33813.0
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.40.33807/bin/Hostx64/x64/cl.exe - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.40.33807/bin/Hostx64/x64/cl.exe - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.43.0.windows.1")
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
      -- Looking for pthread_create in pthreads
      -- Looking for pthread_create in pthreads - not found
      -- Looking for pthread_create in pthread
      -- Looking for pthread_create in pthread - not found
      -- Found Threads: TRUE
      -- Found OpenMP_C: -openmp (found version "2.0")
      -- Found OpenMP_CXX: -openmp (found version "2.0")
      -- Found OpenMP: TRUE (found version "2.0")
      -- OpenMP found
      -- Using llamafile
      -- Found CUDAToolkit: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.2/include (found version "12.2.91")
      -- CUDA found
      -- Using CUDA architectures: 52;61;70;75
      -- The CUDA compiler identification is NVIDIA 12.2.91
      -- Detecting CUDA compiler ABI info
      -- Detecting CUDA compiler ABI info - failed
      -- Check for working CUDA compiler: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.2/bin/nvcc.exe
      -- Check for working CUDA compiler: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.2/bin/nvcc.exe - broken
      CMake Error at C:/Users/J/AppData/Local/Temp/pip-build-env-pgvyur3w/normal/Lib/site-packages/cmake/data/share/cmake-3.30/Modules/CMakeTestCUDACompiler.cmake:59 (message):
        The CUDA compiler

          "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.2/bin/nvcc.exe"

        is not able to compile a simple test program.

        It fails with the following output:

          Change Dir: 'C:/Users/J/AppData/Local/Temp/tmpduqojb02/build/CMakeFiles/CMakeScratch/TryCompile-1vdtef'

          Run Build Command(s): "C:/Program Files/Microsoft Visual Studio/2022/Community/MSBuild/Current/Bin/amd64/MSBuild.exe" cmTC_001f7.vcxproj /p:Configuration=Debug /p:Platform=x64 /p:VisualStudioVersion=17.0 /v:n
          MSBuild version 17.10.4+10fbfbf2e for .NET Framework
          Build started 7/28/2024 19:09:35.

          Project "C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef\cmTC_001f7.vcxproj" on node 1 (default targets).
          PrepareForBuild:
            Creating directory "cmTC_001f7.dir\Debug\".
          C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppBuild.targets(541,5): warning MSB8029: The Intermediate directory or Output directory cannot reside under the Temporary directory as it could lead to issues with incremental build. [C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef\cmTC_001f7.vcxproj]
            Structured output is enabled. The formatting of compiler diagnostics will reflect the error hierarchy. See https://aka.ms/cpp/structured-output for more details.
            Creating directory "C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef\Debug\".
            Creating directory "cmTC_001f7.dir\Debug\cmTC_001f7.tlog\".
          InitializeBuildStatus:
            Creating "cmTC_001f7.dir\Debug\cmTC_001f7.tlog\unsuccessfulbuild" because "AlwaysCreate" was specified.
            Touching "cmTC_001f7.dir\Debug\cmTC_001f7.tlog\unsuccessfulbuild".
          AddCudaCompileDeps:
            C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\bin\HostX64\x64\cl.exe /E /nologo /showIncludes /TP /D__CUDACC__ /D__CUDACC_VER_MAJOR__=12 /D__CUDACC_VER_MINOR__=2 /D_WINDOWS /DCMAKE_INTDIR="Debug" /D_MBCS /DCMAKE_INTDIR="Debug" /I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin" /I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\include" /I. /FIcuda_runtime.h /c C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef\main.cu
          Project "C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef\cmTC_001f7.vcxproj" (1) is building "C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef\cmTC_001f7.vcxproj" (1:2) on node 1 (CudaBuildCore target(s)).
          CudaBuildCore:
            Compiling CUDA source file main.cu...
            cmd.exe /C "C:\Users\J\AppData\Local\Temp\tmp0225af6b1a3d4dd69ed296c0b0e89efa.cmd"
            "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\bin\HostX64\x64" -x cu    -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\include"     --keep-dir cmTC_001f7\x64\Debug  -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_52,code=[compute_52,sm_52] --generate-code=arch=compute_61,code=[compute_61,sm_61] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] -Xcompiler="/EHsc -Zi -Ob0" -g  -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W1 /nologo /Od /FS /Zi /RTC1 /MDd " -Xcompiler "/FdcmTC_001f7.dir\Debug\vc143.pdb" -o cmTC_001f7.dir\Debug\main.obj "C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef\main.cu"

            (base) C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\bin\HostX64\x64" -x cu    -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\include"     --keep-dir cmTC_001f7\x64\Debug  -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_52,code=[compute_52,sm_52] --generate-code=arch=compute_61,code=[compute_61,sm_61] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] -Xcompiler="/EHsc -Zi -Ob0" -g  -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W1 /nologo /Od /FS /Zi /RTC1 /MDd " -Xcompiler "/FdcmTC_001f7.dir\Debug\vc143.pdb" -o cmTC_001f7.dir\Debug\main.obj "C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef\main.cu"
          C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\include\crt/host_config.h(157): fatal error C1189: #error:  -- unsupported Microsoft Visual Studio version! Only the versions between 2017 and 2022 (inclusive) are supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk. [C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef\cmTC_001f7.vcxproj]
            main.cu
          C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA 12.2.targets(799,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\bin\HostX64\x64" -x cu    -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\include"     --keep-dir cmTC_001f7\x64\Debug  -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_52,code=[compute_52,sm_52] --generate-code=arch=compute_61,code=[compute_61,sm_61] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] -Xcompiler="/EHsc -Zi -Ob0" -g  -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W1 /nologo /Od /FS /Zi /RTC1 /MDd " -Xcompiler "/FdcmTC_001f7.dir\Debug\vc143.pdb" -o cmTC_001f7.dir\Debug\main.obj "C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef\main.cu"" exited with code 2. [C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef\cmTC_001f7.vcxproj]
          Done Building Project "C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef\cmTC_001f7.vcxproj" (CudaBuildCore target(s)) -- FAILED.
          Done Building Project "C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef\cmTC_001f7.vcxproj" (default targets) -- FAILED.

          Build FAILED.

          "C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef\cmTC_001f7.vcxproj" (default target) (1) ->
          (PrepareForBuild target) ->
            C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppBuild.targets(541,5): warning MSB8029: The Intermediate directory or Output directory cannot reside under the Temporary directory as it could lead to issues with incremental build. [C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef\cmTC_001f7.vcxproj]

          "C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef\cmTC_001f7.vcxproj" (default target) (1) ->
          "C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef\cmTC_001f7.vcxproj" (CudaBuildCore target) (1:2) ->
          (CudaBuildCore target) ->
            C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\include\crt/host_config.h(157): fatal error C1189: #error:  -- unsupported Microsoft Visual Studio version! Only the versions between 2017 and 2022 (inclusive) are supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk. [C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef\cmTC_001f7.vcxproj]
            C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA 12.2.targets(799,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\bin\HostX64\x64" -x cu    -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\include"     --keep-dir cmTC_001f7\x64\Debug  -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_52,code=[compute_52,sm_52] --generate-code=arch=compute_61,code=[compute_61,sm_61] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] -Xcompiler="/EHsc -Zi -Ob0" -g  -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W1 /nologo /Od /FS /Zi /RTC1 /MDd " -Xcompiler "/FdcmTC_001f7.dir\Debug\vc143.pdb" -o cmTC_001f7.dir\Debug\main.obj "C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef\main.cu"" exited with code 2. [C:\Users\J\AppData\Local\Temp\tmpduqojb02\build\CMakeFiles\CMakeScratch\TryCompile-1vdtef\cmTC_001f7.vcxproj]

              1 Warning(s)
              2 Error(s)

          Time Elapsed 00:00:00.58

        CMake will not be able to correctly generate this project.
      Call Stack (most recent call first):
        vendor/llama.cpp/ggml/src/CMakeLists.txt:271 (enable_language)

      -- Configuring incomplete, errors occurred!

      *** CMake configuration failed
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

I think this part could be the problem:

The CUDA compiler

          "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.2/bin/nvcc.exe"

        is not able to compile a simple test program.

        It fails with the following output:

Please be gentle with me as I'm a newcomer trying to figure this out
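[Editor's note] Two details in the transcript above are worth flagging. First, in Windows `cmd`, `set VAR="value"` stores the quotes as part of the value, and the second `set CMAKE_ARGS` silently overwrites the first, so the BLAS flags never reach the build. Second, the actual failure is nvcc's `host_config.h` check rejecting the MSVC 19.40 host compiler as too new for CUDA 12.2. A minimal sketch of a quote-free setup that works around the version check (upgrading the CUDA Toolkit is the cleaner fix; the override flag comes from the error message itself):

```shell
:: Windows cmd sketch: set all flags unquoted, in a single variable.
:: -allow-unsupported-compiler bypasses nvcc's MSVC version check
:: ("use at your own risk", per the error output above).
set CMAKE_ARGS=-DGGML_CUDA=on -DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler
set FORCE_CMAKE=1
pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
```

Note that this is environment configuration for one shell session; closing the window discards it.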

kot197 commented 1 month ago

Still stuck on installation

I uninstalled everything and am reinstalling.

I have NVIDIA CUDA 12.5.1 and Visual Studio 2022.

Now this is the error I got:

Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [17 lines of output]
      *** scikit-build-core 0.9.8 using CMake 3.30.1 (wheel)
      *** Configuring CMake...
      2024-07-28 23:19:17,161 - scikit_build_core - WARNING - Can't find a Python library, got libdir=None, ldlibrary=None, multiarch=None, masd=None
      loading initial cache file C:\Users\J\AppData\Local\Temp\tmpcuu0ov3w\build\CMakeInit.txt
      -- Building for: Visual Studio 15 2017 Win64
      CMake Error at CMakeLists.txt:3 (project):
        Generator

          Visual Studio 15 2017 Win64

        could not find any instance of Visual Studio.

      -- Configuring incomplete, errors occurred!

      *** CMake configuration failed
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

I ran the following commands:

(base) C:\Users\J>set CMAKE_ARGS=-DLLAMA_CUBLAS=on

(base) C:\Users\J>set FORCE_CMAKE=1

(base) C:\Users\J>pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir

Why is it saying:

-- Building for: Visual Studio 15 2017 Win64
      CMake Error at CMakeLists.txt:3 (project):
        Generator

          Visual Studio 15 2017 Win64

        could not find any instance of Visual Studio.

Could this be the problem?

(screenshot attached)
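[Editor's note] The "Visual Studio 15 2017 Win64" line suggests CMake is picking up a stale generator (for example from a `CMAKE_GENERATOR` environment variable or an old default) rather than the installed VS 2022. A hedged sketch of pinning the generator explicitly; note also that `-DLLAMA_CUBLAS` is the older flag name, replaced by `-DGGML_CUDA` in recent llama.cpp versions:

```shell
:: Sketch: force the VS 2022 generator and use the current CUDA flag name.
set CMAKE_GENERATOR=Visual Studio 17 2022
set CMAKE_ARGS=-DGGML_CUDA=on
set FORCE_CMAKE=1
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
```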

kot197 commented 1 month ago

After a lot of things... I'm stuck at 'Building wheels for collected packages: llama-cpp-python' and I can't troubleshoot any further because there are no error messages.

I think I'm giving up... Is there another package as an alternative to this? I wish there were a more comprehensive guide on this, sad. I spent my entire day installing this package.

EDIT: It just keeps printing these non-stop when I switch to --verbose:

(base) C:\Users\J\AppData\Local\Temp\tmpv0s5ys_x\build\vendor\llama.cpp\ggml\src>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\bin\HostX64\x64" -x cu   -I"C:\Users\J\AppData\Local\Temp\pip-install-soxh_yjh\llama-cpp-python_e04e8cfebc0d46d38d6c4b3c28bb6bb4\vendor\llama.cpp\ggml\src\..\include" -I"C:\Users\J\AppData\Local\Temp\pip-install-soxh_yjh\llama-cpp-python_e04e8cfebc0d46d38d6c4b3c28bb6bb4\vendor\llama.cpp\ggml\src\." -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include"     --keep-dir ggml\x64\Release -use_fast_math -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_52,code=[compute_52,sm_52] --generate-code=arch=compute_61,code=[compute_61,sm_61] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] -Xcompiler="/EHsc -Ob2 /arch:AVX2"   -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -Xcompiler "/EHsc /W1 /nologo /O2 /FS   /MD " -Xcompiler "/Fdggml.dir\Release\vc143.pdb" -o ggml.dir\Release\mmq-instance-q2_k.obj 
"C:\Users\J\AppData\Local\Temp\pip-install-soxh_yjh\llama-cpp-python_e04e8cfebc0d46d38d6c4b3c28bb6bb4\vendor\llama.cpp\ggml\src\ggml-cuda\template-instances\mmq-instance-q2_k.cu"
    mmq-instance-q2_k.cu
    tmpxft_000010e8_00000000-7_mmq-instance-q2_k.compute_75.cudafe1.cpp
  Done Building Project "C:\Users\J\AppData\Local\Temp\tmpv0s5ys_x\build\vendor\llama.cpp\ggml\src\ggml.vcxproj" (CudaBuildCore target(s)).
  Project "C:\Users\J\AppData\Local\Temp\tmpv0s5ys_x\build\vendor\llama.cpp\ggml\src\ggml.vcxproj" (5) is building "C:\Users\J\AppData\Local\Temp\tmpv0s5ys_x\build\vendor\llama.cpp\ggml\src\ggml.vcxproj" (5:46) on node 1 (CudaBuildCore target(s)).
  CudaBuildCore:
    Compiling CUDA source file ..\..\..\..\..\..\pip-install-soxh_yjh\llama-cpp-python_e04e8cfebc0d46d38d6c4b3c28bb6bb4\vendor\llama.cpp\ggml\src\ggml-cuda\template-instances\mmq-instance-q3_k.cu...
    cmd.exe /C "C:\Users\J\AppData\Local\Temp\tmp53333b2d379d496490439ae428dbbe15.cmd"
    "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\bin\HostX64\x64" -x cu   -I"C:\Users\J\AppData\Local\Temp\pip-install-soxh_yjh\llama-cpp-python_e04e8cfebc0d46d38d6c4b3c28bb6bb4\vendor\llama.cpp\ggml\src\..\include" -I"C:\Users\J\AppData\Local\Temp\pip-install-soxh_yjh\llama-cpp-python_e04e8cfebc0d46d38d6c4b3c28bb6bb4\vendor\llama.cpp\ggml\src\." -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include"     --keep-dir ggml\x64\Release -use_fast_math -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_52,code=[compute_52,sm_52] --generate-code=arch=compute_61,code=[compute_61,sm_61] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] -Xcompiler="/EHsc -Ob2 /arch:AVX2"   -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -Xcompiler "/EHsc /W1 /nologo /O2 /FS   /MD " -Xcompiler "/Fdggml.dir\Release\vc143.pdb" -o ggml.dir\Release\mmq-instance-q3_k.obj 
"C:\Users\J\AppData\Local\Temp\pip-install-soxh_yjh\llama-cpp-python_e04e8cfebc0d46d38d6c4b3c28bb6bb4\vendor\llama.cpp\ggml\src\ggml-cuda\template-instances\mmq-instance-q3_k.cu"

    (base) C:\Users\J\AppData\Local\Temp\tmpv0s5ys_x\build\vendor\llama.cpp\ggml\src>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\bin\HostX64\x64" -x cu   -I"C:\Users\J\AppData\Local\Temp\pip-install-soxh_yjh\llama-cpp-python_e04e8cfebc0d46d38d6c4b3c28bb6bb4\vendor\llama.cpp\ggml\src\..\include" -I"C:\Users\J\AppData\Local\Temp\pip-install-soxh_yjh\llama-cpp-python_e04e8cfebc0d46d38d6c4b3c28bb6bb4\vendor\llama.cpp\ggml\src\." -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include"     --keep-dir ggml\x64\Release -use_fast_math -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_52,code=[compute_52,sm_52] --generate-code=arch=compute_61,code=[compute_61,sm_61] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] -Xcompiler="/EHsc -Ob2 /arch:AVX2"   -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -Xcompiler "/EHsc /W1 /nologo /O2 /FS   /MD " -Xcompiler "/Fdggml.dir\Release\vc143.pdb" -o ggml.dir\Release\mmq-instance-q3_k.obj 
"C:\Users\J\AppData\Local\Temp\pip-install-soxh_yjh\llama-cpp-python_e04e8cfebc0d46d38d6c4b3c28bb6bb4\vendor\llama.cpp\ggml\src\ggml-cuda\template-instances\mmq-instance-q3_k.cu"
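[Editor's note] The output above suggests the build is progressing, not looping: llama.cpp compiles dozens of CUDA template instances (`mmq-instance-*.cu`, `fattn-*.cu`), which can take a very long time on one node. A sketch that may shorten the build by targeting a single compute capability (75 here is a placeholder assumption; match it to your actual GPU):

```shell
:: Sketch: build only one CUDA architecture to cut compile time.
:: 75 (Turing) is a placeholder - check your GPU's compute capability.
set CMAKE_ARGS=-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=75
set FORCE_CMAKE=1
pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir --verbose
```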
gformcreation commented 1 month ago

Hi @kot197, as I can see you are using CUDA 12.2, so use the following command:

pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121

This will install a pre-built wheel on your PC.
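[Editor's note] If `CMAKE_ARGS` or `FORCE_CMAKE` are still set from earlier attempts, the install may keep compiling from source instead of taking the pre-built wheel, so it may help to clear them first. A sketch, assuming a cu121 wheel exists for your Python version:

```shell
:: Sketch: clear source-build overrides, then pull the pre-built CUDA wheel.
set CMAKE_ARGS=
set FORCE_CMAKE=
pip install llama-cpp-python --prefer-binary --no-cache-dir ^
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
```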

kot197 commented 1 month ago

Hi @gformcreation ,

The installation is still stuck in what looks like an infinite loop; it keeps printing this:

(base) C:\Users\J\AppData\Local\Temp\tmp7ppzklvp\build\vendor\llama.cpp\ggml\src>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\bin\HostX64\x64" -x cu   -I"C:\Users\J\AppData\Local\Temp\pip-install-53hjbtws\llama-cpp-python_0984bb5db8f142cd8aa50adfdd94ba58\vendor\llama.cpp\ggml\src\..\include" -I"C:\Users\J\AppData\Local\Temp\pip-install-53hjbtws\llama-cpp-python_0984bb5db8f142cd8aa50adfdd94ba58\vendor\llama.cpp\ggml\src\." -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include"     --keep-dir ggml\x64\Release -use_fast_math -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_52,code=[compute_52,sm_52] --generate-code=arch=compute_61,code=[compute_61,sm_61] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] -Xcompiler="/EHsc -Ob2 /arch:AVX2"   -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -Xcompiler "/EHsc /W1 /nologo /O2 /FS   /MD " -Xcompiler "/Fdggml.dir\Release\vc143.pdb" -o ggml.dir\Release\fattn-wmma-f16-instance-kqfloat-cpb32.obj 
"C:\Users\J\AppData\Local\Temp\pip-install-53hjbtws\llama-cpp-python_0984bb5db8f142cd8aa50adfdd94ba58\vendor\llama.cpp\ggml\src\ggml-cuda\template-instances\fattn-wmma-f16-instance-kqfloat-cpb32.cu"

This happens after I installed CUDA 12.4