Open mbamg opened 5 months ago
I had the same problem, but eventually was able to install it using these flags:
pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python -C cmake.args="-DAMDGPU_TARGETS=gfx1032 -DLLAMA_HIPBLAS=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release"
Note that I already had Visual Studio installed. Also note that I used the flag -DAMDGPU_TARGETS=gfx1032 because I have an RX 6650 XT.
I have also installed the HIP SDK for Windows, and the Python package seems to have installed correctly. However, when I run the model through LangChain, the program appears to use only my CPU and main memory: GPU usage doesn't change (even though n_gpu_layers is set to 35), and performance is the same as when llama-cpp-python is installed without any flags (slow).
I would be glad if someone could help me figure this out!
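For what it's worth, a quick way to check whether an installed wheel was actually built with HIP support is to load the model directly with llama-cpp-python and watch the verbose log: a HIP-enabled build prints the "found N ROCm devices" and "offloaded X/Y layers to GPU" lines quoted later in this thread, while a CPU-only build prints neither. A minimal sketch, with a placeholder model path:

```python
from llama_cpp import Llama

# verbose=True makes the loader print its backend information; look for
# "found N ROCm devices" and "offloaded .../... layers to GPU" in the output.
llm = Llama(
    model_path="models/model.gguf",  # placeholder path
    n_gpu_layers=35,
    verbose=True,
)

print(llm("Q: What is 2+2? A:", max_tokens=8)["choices"][0]["text"])
```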
I have the same problem with an RX 7900 XT. I have Visual Studio installed and am able to get it to work on the CPU but not the GPU.
Compiling llama.cpp for HIPBLAS on Windows needs a generator passed to CMake too. I tried -G Ninja at first, but it kept building (excruciatingly slow) Debug binaries no matter what I tried; -G "Ninja Multi-Config" works.
Unfortunately, the following workaround in llama-cpp-python's CMakeLists then results in another error because of how AMD built the HIP SDK: https://github.com/abetlen/llama-cpp-python/blob/10b7c50cd2055db575405b8ab3bd9c07979d557a/CMakeLists.txt#L43-L50
CMake tries to install amdhip64.dll into the wheel but can't find it because it's in C:\Windows.
After commenting those lines out it builds & runs. This is what I used in the end from a VS x64 Native Tools command prompt:
set CMAKE_ARGS=-DLLAMA_HIPBLAS=on -DAMDGPU_TARGETS=gfx1010 -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -G "Ninja Multi-Config"
pip install --force-reinstall ./llama-cpp-python
I also have C:\Program Files\AMD\ROCm\5.7\bin set in my PATH.
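As a small sanity check before building, you can confirm from Python that the HIP toolchain is reachable in the environment pip will run in. This is only a sketch and assumes the default ROCm 5.7 install location mentioned above:

```python
import os
import shutil

# Assumes the default HIP SDK 5.7 install path mentioned above.
rocm_bin = r"C:\Program Files\AMD\ROCm\5.7\bin"
path_entries = [p.strip().lower() for p in os.environ.get("PATH", "").split(os.pathsep)]

print("ROCm bin on PATH:", rocm_bin.lower() in path_entries)
print("clang found at:", shutil.which("clang"))
print("clang++ found at:", shutil.which("clang++"))
```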
This worked to install it, and when I load the model it gets offloaded into GPU memory:
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Radeon RX 7900 XT, compute capability 11.0, VMM: no
Device 1: AMD Radeon(TM) Graphics, compute capability 10.3, VMM: no
llm_load_tensors: ggml ctx size = 0.25 MiB
llm_load_tensors: offloading 18 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 19/19 layers to GPU
However, when I try to get a response from the model I get this error:
ggml_cuda_compute_forward: RMS_NORM failed
CUDA error: invalid device function
current device: 1, in function ggml_cuda_compute_forward at C:/Code/llama-cpp/llama-cpp-python/vendor/llama.cpp/ggml-cuda.cu:2360
err
GGML_ASSERT: C:/Code/llama-cpp/llama-cpp-python/vendor/llama.cpp/ggml-cuda.cu:100: !"CUDA error"
I'm assuming that you used -DAMDGPU_TARGETS=gfx1100 for your 7900 XT, and that the second GPU is an iGPU that isn't gfx1030.
Try setting the environment variable HIP_VISIBLE_DEVICES=0 before & when running Python so that device #1 is hidden from llama.cpp & rocblas.
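If you'd rather not set the variable system-wide, it should also work to set it from Python itself, as long as that happens before llama_cpp (and with it the HIP runtime) is imported. A minimal sketch, with a placeholder model path:

```python
import os

# Must be set before llama_cpp loads the HIP runtime, otherwise device 1
# (the iGPU) may already be visible when the model initializes.
os.environ["HIP_VISIBLE_DEVICES"] = "0"

from llama_cpp import Llama

llm = Llama(model_path="models/model.gguf", n_gpu_layers=-1, verbose=True)  # placeholder path
```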
This worked, Thank You!
@Engininja2 You are my hero! Thank you so much!!! I've spent hours and hours trying to figure out how to build this thing with no success. I was literally going crazy. You just saved my life! Kudos to you! I just wonder why devs can never properly explain how to build their own piece of sh*t. Really infuriating!
I have an RX 6900 XT GPU, and after installing ROCm 5.7 I followed the instructions to install llama-cpp-python with HIPBLAS=on, but got the error "Building wheel for llama-cpp-python (pyproject.toml) did not run successfully".
Full error log: llama-cpp-python-hipblas-error.txt
As with the previously closed but unaddressed #1009, my debugging efforts have led me to believe that the wrong C and C++ compilers are being chosen for the CMake build:
- A clang-only option ('-x') is then ignored during compilation
- The subsequent argument ('hip') is interpreted as a non-existent source file
- The build fails
As with the original reporter, I've also tried setting CMake environment variables to force Clang compilation, with no change in result:
[Environment]::SetEnvironmentVariable('CMAKE_ARGS', "-DLLAMA_HIPBLAS=on -DCMAKE_CXX_COMPILER='C:/Program Files/AMD/ROCm/5.7/bin/clang++.exe' -DCMAKE_C_ABI_COMPILED=FALSE -DCMAKE_CXX_ABI_COMPILED=FALSE -DCMAKE_CXX_STANDARD=17 -DCMAKE_CXX_STANDARD_REQUIRED=ON -DCMAKE_CXX_EXTENSIONS=OFF")
[Environment]::SetEnvironmentVariable('CMAKE_ARGS', "-DLLAMA_HIPBLAS=on -DCXX='C:/Program Files/AMD/ROCm/5.7/bin/clang++.exe' -DCMAKE_C_ABI_COMPILED=FALSE -DCMAKE_CXX_ABI_COMPILED=FALSE -DCMAKE_CXX_STANDARD=17 -DCMAKE_CXX_STANDARD_REQUIRED=ON -DCMAKE_CXX_EXTENSIONS=OFF")
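One difference from the commands that worked earlier in this thread is that they override both the C and the C++ compiler (-DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++), while the attempts above only override the C++ one. A hedged sketch that mirrors those flags and drives pip from Python, so the variable is guaranteed to reach the build process; the gfx1030 target for the 6900 XT and the ROCm 5.7 bin directory being on PATH are assumptions:

```python
# Sketch only: mirrors the CMAKE_ARGS that worked earlier in this thread.
# Assumes the ROCm 5.7 bin directory (with clang/clang++) is on PATH and
# that gfx1030 is the right target for an RX 6900 XT.
import os
import subprocess
import sys

env = dict(os.environ)
env["CMAKE_ARGS"] = (
    "-DLLAMA_HIPBLAS=on "
    "-DAMDGPU_TARGETS=gfx1030 "
    "-DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ "
    '-G "Ninja Multi-Config"'
)

subprocess.check_call(
    [sys.executable, "-m", "pip", "install",
     "--upgrade", "--force-reinstall", "--no-cache-dir", "llama-cpp-python"],
    env=env,
)
```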
PS: Reading through #40, it's rather concerning that MSVC might be needed for Windows compilation. Is this still the case?