abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Can't install llama-cpp-python with HIPBLAS/ROCm on Windows #1489

Open mbamg opened 4 months ago

mbamg commented 4 months ago

I have an RX 6900 XT GPU, and after installing ROCm 5.7 I followed the instructions to install llama-cpp-python with HIPBLAS=on, but got the error "Building wheel for llama-cpp-python (pyproject.toml) did not run successfully".

Full error log: llama-cpp-python-hipblas-error.txt

As with the previously closed but unaddressed #1009, my debugging efforts have led me to believe that the wrong C and C++ compilers are being chosen for the CMake build:

  1. MSVC is selected instead of clang
  -- Building for: Visual Studio 17 2022
  -- Selecting Windows SDK version 10.0.22621.0 to target Windows 10.0.19045.
  -- The C compiler identification is MSVC 19.39.33523.0
  -- The CXX compiler identification is MSVC 19.39.33523.0
  -- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe
  -- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - works
  -- Detecting C compile features
  -- Detecting C compile features - done
  -- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe
  -- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - works
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  2. A clang-only option ('-x') is then ignored during compilation

    ClCompile:
    C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64\CL.exe /c /I"C:\Users\mbamg\AppData\Local\Temp\pip-install-ps72vnaz\llama-cpp-python_1bb2b9676f39468bba7efbe70e3a1f33\vendor\llama.cpp\." /nologo /W1 /WX- /diagnostics:column /O2 /Ob2 /D _MBCS /D WIN32 /D _WINDOWS /D NDEBUG /D GGML_SCHED_MAX_COPIES=4 /D GGML_USE_LLAMAFILE /D GGML_USE_HIPBLAS /D GGML_USE_CUDA /D GGML_CUDA_DMMV_X=32 /D GGML_CUDA_MMV_Y=1 /D K_QUANTS_PER_ITERATION=2 /D _CRT_SECURE_NO_WARNINGS /D _XOPEN_SOURCE=600 /D __HIP_PLATFORM_HCC__=1 /D __HIP_PLATFORM_AMD__=1 /D "CMAKE_INTDIR=\"Release\"" /Gm- /EHsc /MD /GS /arch:AVX2 /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /std:c11 /Fo"ggml.dir\Release\\" /Fd"ggml.dir\Release\ggml.pdb" /external:W0 /Gd /TC /errorReport:queue  /external:I "C:/Program Files/AMD/ROCm/5.7/include" -x hip "C:\Users\mbamg\AppData\Local\Temp\pip-install-ps72vnaz\llama-cpp-python_1bb2b9676f39468bba7efbe70e3a1f33\vendor\llama.cpp\ggml.c" "C:\Users\mbamg\AppData\Local\Temp\pip-install-ps72vnaz\llama-cpp-python_1bb2b9676f39468bba7efbe70e3a1f33\vendor\llama.cpp\ggml-alloc.c" "C:\Users\mbamg\AppData\Local\Temp\pip-install-ps72vnaz\llama-cpp-python_1bb2b9676f39468bba7efbe70e3a1f33\vendor\llama.cpp\ggml-backend.c" "C:\Users\mbamg\AppData\Local\Temp\pip-install-ps72vnaz\llama-cpp-python_1bb2b9676f39468bba7efbe70e3a1f33\vendor\llama.cpp\ggml-quants.c"
    cl : command line  warning D9002: ignoring unknown option '-x' [C:\Users\mbamg\AppData\Local\Temp\tmpd9u_s_jt\build\vendor\llama.cpp\ggml.vcxproj]
  3. The subsequent argument ('hip') is interpreted as a non-existent source file

    /hip(1,1): error C1083: Cannot open source file: 'hip': No such file or directory [C:\Users\mbamg\AppData\Local\Temp\tmpd9u_s_jt\build\vendor\llama.cpp\ggml.vcxproj]
    (compiling source file '/hip')
  4. The build fails

    "C:\Users\mbamg\AppData\Local\Temp\tmpd9u_s_jt\build\ALL_BUILD.vcxproj" (default target) (1) ->
    "C:\Users\mbamg\AppData\Local\Temp\tmpd9u_s_jt\build\vendor\llama.cpp\ggml.vcxproj" (default target) (4) ->
    (ClCompile target) ->
    /hip(1,1): error C1083: Cannot open source file: 'hip': No such file or directory [C:\Users\mbamg\AppData\Local\Temp\tmpd9u_s_jt\build\vendor\llama.cpp\ggml.vcxproj]
    
      1 Warning(s)
      1 Error(s)
    
    Time Elapsed 00:00:02.54
    
    *** CMake build failed

As with the original reporter, I've also tried setting CMake environment variables to force Clang compilation, with no change in result:

[Environment]::SetEnvironmentVariable('CMAKE_ARGS', "-DLLAMA_HIPBLAS=on -DCMAKE_CXX_COMPILER='C:/Program Files/AMD/ROCm/5.7/bin/clang++.exe' -DCMAKE_C_ABI_COMPILED=FALSE -DCMAKE_CXX_ABI_COMPILED=FALSE -DCMAKE_CXX_STANDARD=17 -DCMAKE_CXX_STANDARD_REQUIRED=ON -DCMAKE_CXX_EXTENSIONS=OFF")

[Environment]::SetEnvironmentVariable('CMAKE_ARGS', "-DLLAMA_HIPBLAS=on -DCXX='C:/Program Files/AMD/ROCm/5.7/bin/clang++.exe' -DCMAKE_C_ABI_COMPILED=FALSE -DCMAKE_CXX_ABI_COMPILED=FALSE -DCMAKE_CXX_STANDARD=17 -DCMAKE_CXX_STANDARD_REQUIRED=ON -DCMAKE_CXX_EXTENSIONS=OFF")

PS: Reading through #40, it's rather concerning that MSVC might be required for Windows compilation. Is this still the case?

PlankoAdam commented 4 months ago

I had the same problem, but eventually was able to install it using these flags:

pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python -C cmake.args="-DAMDGPU_TARGETS=gfx1032 -DLLAMA_HIPBLAS=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release"

Note that I already had Visual Studio installed. Also note that I used the flag -DAMDGPU_TARGETS=gfx1032 because I have an RX 6650 XT.
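
For the RX 6900 XT from the original report, the analogous command would presumably just swap the architecture target (gfx1030 for Navi 21 is an assumption here, not something tested in this thread; check the correct target for your card):

pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python -C cmake.args="-DAMDGPU_TARGETS=gfx1030 -DLLAMA_HIPBLAS=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release"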

I have also installed the HIP SDK for Windows, and the Python package seems to have installed correctly. HOWEVER, when I run the model through LangChain, the program appears to use only my CPU and main memory: GPU usage doesn't change (even though n_gpu_layers is set to 35), and performance is the same as when installing llama-cpp-python without any flags (slow).

I would be glad if someone could help me figure this out!

GoodVessel92551 commented 4 months ago

I have the same problem with an RX 7900 XT. I have Visual Studio installed and can get it to work on the CPU, but not the GPU.

Engininja2 commented 4 months ago

Compiling llama.cpp for HIPBLAS on Windows needs a generator passed to CMake too. I tried -G Ninja at first, but it kept producing (excruciatingly slow) Debug builds no matter what I tried; -G "Ninja Multi-Config" works.

Unfortunately, the following workaround in CMakeLists.txt then results in another error because of how AMD built the HIP SDK: https://github.com/abetlen/llama-cpp-python/blob/10b7c50cd2055db575405b8ab3bd9c07979d557a/CMakeLists.txt#L43-L50

CMake tries to install amdhip64.dll into the wheel but can't find it because it's in c:\windows.

After commenting those lines out, it builds & runs. This is what I used in the end from a VS x64 Native Tools command prompt:

set CMAKE_ARGS=-DLLAMA_HIPBLAS=on -DAMDGPU_TARGETS=gfx1010 -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -G "Ninja Multi-Config"
pip install --force-reinstall ./llama-cpp-python

I also have C:\Program Files\AMD\ROCm\5.7\bin set in my PATH.
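
For anyone setting CMAKE_ARGS from PowerShell as in the original report, a rough equivalent of the above would presumably be the following (the ROCm 5.7 path and the gfx1010 target are carried over from this comment; adjust both for your setup):

# Run from a Developer PowerShell / VS x64 environment so the MSVC tools and Windows SDK are available
$env:CMAKE_ARGS = '-DLLAMA_HIPBLAS=on -DAMDGPU_TARGETS=gfx1010 -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -G "Ninja Multi-Config"'
$env:PATH = "C:\Program Files\AMD\ROCm\5.7\bin;$env:PATH"
pip install --force-reinstall ./llama-cpp-python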

GoodVessel92551 commented 4 months ago

Engininja2's steps worked to install it, and when I load the model it gets offloaded into GPU memory:

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 2 ROCm devices:
  Device 0: AMD Radeon RX 7900 XT, compute capability 11.0, VMM: no
  Device 1: AMD Radeon(TM) Graphics, compute capability 10.3, VMM: no
llm_load_tensors: ggml ctx size =    0.25 MiB
llm_load_tensors: offloading 18 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 19/19 layers to GPU

However, when I try to get a response from the model, I get this error:

ggml_cuda_compute_forward: RMS_NORM failed
CUDA error: invalid device function
  current device: 1, in function ggml_cuda_compute_forward at C:/Code/llama-cpp/llama-cpp-python/vendor/llama.cpp/ggml-cuda.cu:2360
  err
GGML_ASSERT: C:/Code/llama-cpp/llama-cpp-python/vendor/llama.cpp/ggml-cuda.cu:100: !"CUDA error"

Engininja2 commented 4 months ago

ggml_cuda_init: found 2 ROCm devices:
  Device 0: AMD Radeon RX 7900 XT, compute capability 11.0, VMM: no
  Device 1: AMD Radeon(TM) Graphics, compute capability 10.3, VMM: no

I'm assuming that you used -DAMDGPU_TARGETS=gfx1100 for your 7900 XT, and that the second GPU is an iGPU that isn't gfx1030.

Try setting the environment variable HIP_VISIBLE_DEVICES=0 before you run Python (and keep it set while running) so that device #1 is hidden from llama.cpp and rocBLAS.
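
For example, from the cmd prompt used for the build (run_model.py is just a placeholder script name; in PowerShell the equivalent would be $env:HIP_VISIBLE_DEVICES = "0"):

rem Hide device 1 (the iGPU) so HIP only sees the 7900 XT
set HIP_VISIBLE_DEVICES=0
python run_model.py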

GoodVessel92551 commented 4 months ago

Try setting the environment variable HIP_VISIBLE_DEVICES=0 before & when running python so that device#1 is hidden from llama.cpp & rocblas.

This worked, thank you!

Dajinu commented 4 months ago

@Engininja2 You are my hero! Thank you so much!!! I've spent hours and hours trying to figure out how to build this thing with no success. I was literally going crazy. You just saved my life! Kudos to you! I just wonder why devs can never explain properly how to build their own piece of sh*t? Really infuriating!