abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

CMAKE_ARGS="-DLLAMA_HIPBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.78 #695

Open · taikai-zz opened this issue 1 year ago

taikai-zz commented 1 year ago

CMAKE_ARGS="-DLLAMA_HIPBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.78 compiles normally, but versions after 0.1.78 can no longer be compiled for AMD GPUs.

abetlen commented 1 year ago

@taikai-zz could you share a --verbose installation log?
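
For reference, one way to capture that is to re-run the install with full output and tee it to a file (a minimal sketch; adjust the version pin to whatever you are installing):

# Re-run the install verbosely and save the whole build log so it can be attached here.
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.78 --verbose --no-cache-dir 2>&1 | tee install.log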

abetlen commented 1 year ago

This may actually be a duplicate of #646 I provided a fix there but I don't think it's been merged into llama.cpp yet, I'll create a PR.

taikai-zz commented 1 year ago

I have now successfully run CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python==0.1.85, but
python -c "from llama_cpp import GGML_USE_CUBLAS; print(GGML_USE_CUBLAS)" returns False. Why is that?
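
Since GGML_USE_CUBLAS is fixed at build time, another way to check whether layers actually go to the GPU is to load a model verbosely and look for the backend line in the log. A minimal sketch (the model path is hypothetical):

# Look for a line like "llm_load_tensors: using ROCm for GPU acceleration" in the verbose output.
python -c '
from llama_cpp import Llama
llm = Llama(model_path="./models/model.bin", n_gpu_layers=10, verbose=True)
' 2>&1 | grep -iE "rocm|blas"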

taikai-zz commented 1 year ago

(env) root@gpu:~/.local/share/Open Interpreter/models# CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python==0.1.85
Requirement already satisfied: llama-cpp-python==0.1.85 in /dockers/text-generation-webui/env/lib/python3.10/site-packages (0.1.85)
Requirement already satisfied: typing-extensions>=4.5.0 in /dockers/text-generation-webui/env/lib/python3.10/site-packages (from llama-cpp-python==0.1.85) (4.7.1)
Requirement already satisfied: numpy>=1.20.0 in /dockers/text-generation-webui/env/lib/python3.10/site-packages (from llama-cpp-python==0.1.85) (1.25.2)
Requirement already satisfied: diskcache>=5.6.1 in /dockers/text-generation-webui/env/lib/python3.10/site-packages (from llama-cpp-python==0.1.85) (5.6.3)

(env) root@gpu:~/.local/share/Open Interpreter/models# python -c "from llama_cpp import GGML_USE_CUBLAS; print(GGML_USE_CUBLAS)"
False

(env) root@gpu:~/.local/share/Open Interpreter/models# CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python==0.2.0
Collecting llama-cpp-python==0.2.0
  Using cached llama_cpp_python-0.2.0.tar.gz (1.5 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: typing-extensions>=4.5.0 in /dockers/text-generation-webui/env/lib/python3.10/site-packages (from llama-cpp-python==0.2.0) (4.7.1)
Requirement already satisfied: numpy>=1.20.0 in /dockers/text-generation-webui/env/lib/python3.10/site-packages (from llama-cpp-python==0.2.0) (1.25.2)
Requirement already satisfied: diskcache>=5.6.1 in /dockers/text-generation-webui/env/lib/python3.10/site-packages (from llama-cpp-python==0.2.0) (5.6.3)
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... error
  error: subprocess-exited-with-error

× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [69 lines of output]
  scikit-build-core 0.5.0 using CMake 3.27.4 (wheel)
  Configuring CMake...
  loading initial cache file /tmp/tmp33ysy1x2/build/CMakeInit.txt
  -- The C compiler identification is GNU 11.4.0
  -- The CXX compiler identification is GNU 11.4.0
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Check for working C compiler: /usr/bin/cc - skipped
  -- Detecting C compile features
  -- Detecting C compile features - done
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: /usr/bin/c++ - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Found Git: /usr/bin/git (found version "2.25.1")
  fatal: not a git repository: /tmp/pip-install-xs0nliaw/llama-cpp-python_b4b7025dac8f452f8ba3d3a1cb4b798d/vendor/llama.cpp/../../.git/modules/vendor/llama.cpp
  fatal: not a git repository: /tmp/pip-install-xs0nliaw/llama-cpp-python_b4b7025dac8f452f8ba3d3a1cb4b798d/vendor/llama.cpp/../../.git/modules/vendor/llama.cpp
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
  -- Check if compiler accepts -pthread
  -- Check if compiler accepts -pthread - yes
  -- Found Threads: TRUE
  CMake Warning at vendor/llama.cpp/CMakeLists.txt:371 (message):
    Only LLVM is supported for HIP, hint: CC=/opt/rocm/llvm/bin/clang

  CMake Warning at vendor/llama.cpp/CMakeLists.txt:374 (message):
    Only LLVM is supported for HIP, hint: CXX=/opt/rocm/llvm/bin/clang++

  CMake Deprecation Warning at /opt/rocm/lib/cmake/hip/hip-config.cmake:20 (cmake_minimum_required):
    Compatibility with CMake < 3.5 will be removed from a future version of
    CMake.

    Update the VERSION argument <min> value or use a ...<max> suffix to tell
    CMake that the project does not need compatibility with older versions.
  Call Stack (most recent call first):
    vendor/llama.cpp/CMakeLists.txt:377 (find_package)

  -- hip::amdhip64 is SHARED_LIBRARY
  CMake Deprecation Warning at /opt/rocm/lib/cmake/hip/hip-config.cmake:20 (cmake_minimum_required):
    Compatibility with CMake < 3.5 will be removed from a future version of
    CMake.

    Update the VERSION argument <min> value or use a ...<max> suffix to tell
    CMake that the project does not need compatibility with older versions.
  Call Stack (most recent call first):
    /tmp/pip-build-env-f1_uucwa/normal/lib/python3.10/site-packages/cmake/data/share/cmake-3.27/Modules/CMakeFindDependencyMacro.cmake:76 (find_package)
    /opt/rocm/lib/cmake/hipblas/hipblas-config.cmake:90 (find_dependency)
    vendor/llama.cpp/CMakeLists.txt:378 (find_package)

  -- hip::amdhip64 is SHARED_LIBRARY
  -- HIP and hipBLAS found
  -- CMAKE_SYSTEM_PROCESSOR: x86_64
  -- x86 detected
  -- Configuring done (0.6s)
  -- Generating done (0.0s)
  -- Build files have been written to: /tmp/tmp33ysy1x2/build
  *** Building project with Ninja...
  Change Dir: '/tmp/tmp33ysy1x2/build'

  Run Build Command(s): /tmp/pip-build-env-f1_uucwa/normal/lib/python3.10/site-packages/ninja/data/bin/ninja -v
  ninja: error: '/tmp/pip-install-xs0nliaw/llama-cpp-python_b4b7025dac8f452f8ba3d3a1cb4b798d/.git/modules/vendor/llama.cpp/index', needed by '/tmp/pip-install-xs0nliaw/llama-cpp-python_b4b7025dac8f452f8ba3d3a1cb4b798d/vendor/llama.cpp/build-info.h', missing and no known rule to make it

  *** CMake build failed
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

teleprint-me commented 1 year ago

Okay, I figured this may be more appropriate here and wanted to share my findings.

I thought, as I stated in issue #646, that it might not be related to llama-cpp-python at all and that it was rooted in llama.cpp. I was wrong; it's user error.

The build process for AMD is a bit more involved than that, and after troubleshooting all morning, I finally made a breakthrough in successfully building the target for AMD GPUs.

13:41:53 | ~/Documents/code/remote/llama.cpp
 git:(master | θ) λ make clean          
I llama.cpp build info: 
I UNAME_S:  Linux
I UNAME_P:  unknown
I UNAME_M:  x86_64
I CFLAGS:   -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS  -O3 -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Wno-unused-function -pthread -march=native -mtune=native 
I CXXFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS  -O3 -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wno-format-truncation -Wno-array-bounds -pthread -march=native -mtune=native 
I LDFLAGS:   
I CC:       cc (GCC) 13.2.1 20230801
I CXX:      g++ (GCC) 13.2.1 20230801
# Omitting for brevity
13:41:55 | ~/Documents/code/remote/llama.cpp
 git:(master | θ) λ make LLAMA_HIPBLAS=1
I llama.cpp build info: 
I UNAME_S:  Linux
I UNAME_P:  unknown
I UNAME_M:  x86_64
I CFLAGS:   -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_HIPBLAS -DGGML_USE_CUBLAS  -O3 -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Wno-unused-function -pthread -march=native -mtune=native 
I CXXFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_HIPBLAS -DGGML_USE_CUBLAS  -O3 -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wno-format-truncation -Wno-array-bounds -pthread -march=native -mtune=native 
I LDFLAGS:  -L/opt/rocm/lib -Wl,-rpath=/opt/rocm/lib -lhipblas -lamdhip64 -lrocblas 
I CC:       cc (GCC) 13.2.1 20230801
I CXX:      g++ (GCC) 13.2.1 20230801
# Omitting for brevity
13:43:23 | ~/Documents/code/remote/llama.cpp
 git:(master | θ) λ mkdir build         
13:44:13 | ~/Documents/code/remote/llama.cpp
 git:(master | θ) λ cd build 
13:44:15 | ~/Documents/code/remote/llama.cpp/build
 git:(master | θ) λ export HIP_VISIBLE_DEVICES=0  # If you have more than one GPU
13:44:26 | ~/Documents/code/remote/llama.cpp/build
 git:(master | θ) λ export HSA_OVERRIDE_GFX_VERSION=10.3.0  # If your GPU is not officially supported
13:44:36 | ~/Documents/code/remote/llama.cpp/build
 git:(master | θ) λ CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ cmake .. -DLLAMA_HIPBLAS=ON -DLLAMA_CUDA_DMMV_X=64 -DLLAMA_CUDA_MMV_Y=2
-- The C compiler identification is Clang 16.0.0
-- The CXX compiler identification is Clang 16.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/rocm/llvm/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rocm/llvm/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.42.0") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
CMake Deprecation Warning at /opt/rocm/lib/cmake/hip/hip-config.cmake:20 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.
Call Stack (most recent call first):
  CMakeLists.txt:381 (find_package)

-- hip::amdhip64 is SHARED_LIBRARY
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Success
CMake Deprecation Warning at /opt/rocm/lib/cmake/hip/hip-config.cmake:20 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.
Call Stack (most recent call first):
  /home/austin/.local/lib/python3.11/site-packages/cmake/data/share/cmake-3.27/Modules/CMakeFindDependencyMacro.cmake:76 (find_package)
  /opt/rocm/lib/cmake/hipblas/hipblas-config.cmake:90 (find_dependency)
  CMakeLists.txt:382 (find_package)

-- hip::amdhip64 is SHARED_LIBRARY
-- HIP and hipBLAS found
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done (0.5s)
-- Generating done (0.0s)
-- Build files have been written to: /home/austin/Documents/code/remote/llama.cpp/build
13:44:46 | ~/Documents/code/remote/llama.cpp/build
 git:(master | θ) λ cmake --build .
[  1%] Built target BUILD_INFO
[  2%] Building CXX object CMakeFiles/ggml-rocm.dir/ggml-cuda.cu.o
# omitting for brevity... this is usually where errors begin. 
# if all goes well, it'll hang for a bit and then chug on...
[  4%] Building C object CMakeFiles/ggml.dir/ggml.c.o
# just wait it out...
[ 98%] Building CXX object pocs/vdot/CMakeFiles/q8dot.dir/q8dot.cpp.o
[100%] Linking CXX executable ../../bin/q8dot
[100%] Built target q8dot

This is just the start though. The way the instructions are set up for llama-cpp-python is not really clear and doesn't give you a lay of the land, so to speak.

It took a bit of tinkering and some troubleshooting, but I was able to get it working once I worked out how to pass the environment variables along.

13:34:38 | ~/Documents/code/remote/pygptprompt
(.venv) git:(main | Δ) λ LLAMA_HIPBLAS=on HIP_VISIBLE_DEVICES=0 HSA_OVERRIDE_GFX_VERSION=10.3.0 LLAMA_CUDA_DMMV_X=64 LLAMA_CUDA_MMV_Y=2 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.6.tar.gz (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 13.6 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0 (from llama-cpp-python)
  Obtaining dependency information for typing-extensions>=4.5.0 from https://files.pythonhosted.org/packages/ec/6b/63cc3df74987c36fe26157ee12e09e8f9db4de771e0f3404263117e75b95/typing_extensions-4.7.1-py3-none-any.whl.metadata
  Downloading typing_extensions-4.7.1-py3-none-any.whl.metadata (3.1 kB)
Collecting numpy>=1.20.0 (from llama-cpp-python)
  Obtaining dependency information for numpy>=1.20.0 from https://files.pythonhosted.org/packages/32/6a/65dbc57a89078af9ff8bfcd4c0761a50172d90192eaeb1b6f56e5fbf1c3d/numpy-1.25.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
  Downloading numpy-1.25.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.6 kB)
Collecting diskcache>=5.6.1 (from llama-cpp-python)
  Obtaining dependency information for diskcache>=5.6.1 from https://files.pythonhosted.org/packages/3f/27/4570e78fc0bf5ea0ca45eb1de3818a23787af9b390c0b0a0033a1b8236f9/diskcache-5.6.3-py3-none-any.whl.metadata
  Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.5/45.5 kB 636.3 MB/s eta 0:00:00
Downloading numpy-1.25.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.2/18.2 MB 38.9 MB/s eta 0:00:00
Downloading typing_extensions-4.7.1-py3-none-any.whl (33 kB)
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... done
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.6-cp311-cp311-manylinux_2_38_x86_64.whl size=992662 sha256=858a30b29b7511a65bc888c3325a703edfa4696d083392f1c469c28f93ad24a2
  Stored in directory: /tmp/pip-ephem-wheel-cache-bwqivaux/wheels/18/f3/e6/e6d374c76db44b5b0451c3a76b3049f29e881819bc43f53d4d
Successfully built llama-cpp-python
Installing collected packages: typing-extensions, numpy, diskcache, llama-cpp-python
  Attempting uninstall: typing-extensions
    Found existing installation: typing_extensions 4.7.1
    Uninstalling typing_extensions-4.7.1:
      Successfully uninstalled typing_extensions-4.7.1
  Attempting uninstall: numpy
    Found existing installation: numpy 1.25.2
    Uninstalling numpy-1.25.2:
      Successfully uninstalled numpy-1.25.2
  Attempting uninstall: diskcache
    Found existing installation: diskcache 5.6.3
    Uninstalling diskcache-5.6.3:
      Successfully uninstalled diskcache-5.6.3
  Attempting uninstall: llama-cpp-python
    Found existing installation: llama_cpp_python 0.2.6
    Uninstalling llama_cpp_python-0.2.6:
      Successfully uninstalled llama_cpp_python-0.2.6
Successfully installed diskcache-5.6.3 llama-cpp-python-0.2.6 numpy-1.25.2 typing-extensions-4.7.1

Step-by-step instructions, with the noise omitted for clarity:

Building llama.cpp for AMD GPU

  1. Set the required environment variables (adjust values if necessary):

    export HIP_VISIBLE_DEVICES=0  # If you have more than one GPU
    export HSA_OVERRIDE_GFX_VERSION=10.3.0  # If your GPU is not officially supported
  2. Configure the build using CMake:

    CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ cmake .. -DLLAMA_HIPBLAS=ON -DLLAMA_CUDA_DMMV_X=64 -DLLAMA_CUDA_MMV_Y=2
  3. Build llama.cpp:

    cmake --build .
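
After the build finishes, a quick smoke test with a few layers offloaded can confirm the GPU path works (the model path below is just a placeholder):

    ./bin/main -m /path/to/model.gguf -ngl 20 -p "Hello" -n 16  # watch for "using ROCm for GPU acceleration" in the log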

Building llama-cpp-python for AMD GPU

  1. Make sure you have the llama.cpp repository cloned and built as mentioned above.

  2. Create a virtual environment (optional but recommended):

    python -m venv .venv
    source .venv/bin/activate
  3. Install llama-cpp-python with the required environment variables (adjust values if necessary):

    LLAMA_HIPBLAS=on HIP_VISIBLE_DEVICES=0 HSA_OVERRIDE_GFX_VERSION=10.3.0 LLAMA_CUDA_DMMV_X=64 LLAMA_CUDA_MMV_Y=2 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
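
Before step 3, it can save time to confirm that the ROCm toolchain and the GPU target are actually visible; a quick check, assuming ROCm is installed under /opt/rocm:

    ls /opt/rocm/llvm/bin/clang /opt/rocm/llvm/bin/clang++  # compilers present?
    /opt/rocm/llvm/bin/amdgpu-arch                          # which gfx targets are detected
    rocminfo | grep gfx                                     # alternative way to list targets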

Hopefully this helps!

Let me know if this works!

muaiyadh commented 1 year ago

Just wanted to provide something potentially useful:

Running the command in the README to install with hipBLAS, $ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python, failed to build wheels (c++: error: language hip not recognized).

After searching around, I apparently needed to set $ export CXX=hipcc and then run $ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python, after which I was able to install the bindings without any issues.

I couldn't find this setting mentioned anywhere; perhaps putting a note in the README might be useful?

taikai-zz commented 1 year ago

@muaiyadh
Thank you very much; I followed your method and it worked.

taikai-zz commented 1 year ago

@teleprint-me Thank you very much for your help. Running export CXX=hipcc and then CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python solved my problem.

teleprint-me commented 1 year ago

Technically, CMAKE_ARGS isn't really needed; neither are the quotes. What matters is that the environment variables are passed along with their values.

Exporting CXX=hipcc just tells cmake which compiler to use for C++, i.e. hipcc.

So, if the ROCm compiler is globally available via the PATH environment variable, that's why it works.

I just use absolute paths because it makes it clearer what's happening.

CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++

CC for C and CXX for C++.

Using the proper compiler that's installed and configured for your system plays a key role in whether the build succeeds or not.

When you use export CXX=hipcc, it just injects the key/value pair into the environment.

echo $CXX
hipcc  # this is echoed to stdout

A good way to check if hipcc is available is to just use which.

which hipcc

I don't have hipcc set, so it just tells me it's not found, which is why I referenced the absolute paths.

Passing the variables before the command just passes the key/value pairs to the CLI app as it executes.

So, technically,

CXX=hipcc LLAMA_HIPBLAS=on pip install llama-cpp-python

should be valid as well.

If I wanted it set, then all I'd need to do is

which hipcc
hipcc not found
PATH=/opt/rocm/bin:${PATH}      
which hipcc
/opt/rocm/bin/hipcc

The reason CMAKE_ARGS="-DLLAMA_HIPBLAS=on" works is that cmake is passed -DLLAMA_HIPBLAS=on as a CLI argument, which is what llama.cpp uses to build the AMD targets.
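
To make the two mechanisms concrete, both of the following forms end up doing the same thing for the build (a sketch; the paths assume ROCm lives under /opt/rocm):

# Per-invocation: the variables only exist for this single pip command.
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python

# Exported: CXX stays set for every later command in this shell session.
export CXX=/opt/rocm/bin/hipcc
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python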

hugo-brites commented 1 year ago

@teleprint-me what OS are you using?

I'm using Arch and able to compile, but I get a CUDA error when trying to run a model where I offload some layers to the GPU.

llm_load_tensors: ggml ctx size =    0.12 MB
llm_load_tensors: using ROCm for GPU acceleration
llm_load_tensors: mem required  = 8942.91 MB (+  400.00 MB per state)
llm_load_tensors: offloading 5 repeating layers to GPU
llm_load_tensors: offloaded 5/43 layers to GPU
llm_load_tensors: VRAM used: 1241 MB
....................................................................................................
llama_new_context_with_model: kv self size  =  400.00 MB
llama_new_context_with_model: compute buffer total size =   75.47 MB
llama_new_context_with_model: VRAM scratch buffer: 74.00 MB
CUDA error 98 at /tmp/pip-install-bojptw49/llama-cpp-python_4627b943be6f48b8939df8bb4aad9957/vendor/llama.cpp/ggml-cuda.cu:6233: invalid device function
current device: 0

Strangely, compiling llama.cpp directly and running the model there works perfectly.

Note: I compiled llama.cpp using make, not cmake. Using your command to compile with cmake gives the same error.

teleprint-me commented 1 year ago

@hugo-brites

I'm using the EndeavourOS distribution (it's faster/easier to install). It should work the same way on Arch Linux, since Arch is its parent distribution.

I followed the guide after reviewing the AMD ROCm, Arch Linux, and llama.cpp documentation.

It'd be more helpful to understand the steps you're taking though.

I can't really tell what's happening just from the errors unless I see both the CLI input and output, similar to what I provided earlier in this thread.

I hand-picked the packages, though.

# Function to install OpenCL
install_opencl() {
    # NOTE: omitted miopengemm for kernel generation because it requires AUR
    if ! sudo pacman -S openblas openblas64 opencl-headers libclc opencl-clhpp ocl-icd lib32-ocl-icd clinfo clpeak nvtop --noconfirm; then
        echo "Failed to install OpenCL"
        exit 1
    fi
}

# Function to install AMD Vulkan driver support
install_amd_vulkan() {
    if ! sudo pacman -S mesa lib32-mesa vulkan-radeon lib32-vulkan-radeon vulkan-icd-loader lib32-vulkan-icd-loader vkd3d lib32-vkd3d vulkan-headers vulkan-validation-layers vulkan-tools --noconfirm; then
        echo "Failed to install AMD Vulkan driver support"
        exit 1
    fi
    echo "AMD Vulkan driver support installed successfully"
}

# Function to install AMD ROCm
install_amd_rocm() {
    if ! sudo pacman -S rocm-core rocm-llvm rocm-clang-ocl rocm-cmake rocm-smi-lib rocm-hip-libraries rocm-hip-runtime rocm-hip-sdk rocm-language-runtime rocm-opencl-runtime rocm-opencl-sdk rocm-device-libs rocm-ml-libraries rocm-ml-sdk rocminfo hipblas rocblas rocsparse rccl python-pytorch-rocm python-pytorch-opt-rocm --noconfirm; then
        echo "Failed to install AMD ROCm"
        exit 1
    fi
}

install_amd() {
    confirm_proceed "AMD GPU drivers, OpenCL, Vulkan, and ROCm" || return

    install_opencl
    install_amd_vulkan
    install_amd_rocm
    install_python_mlai_rocm
}

Arch Linux GPU Install Script

If it works normally when you build llama.cpp from source, I would extrapolate the steps from there and repackage the environment variables that made your build succeed.

If you carefully inspect, review, and follow my original comment, you'll see that's exactly what I did.

I had to figure out which environment variables would work for my build, and then I was able to extrapolate that into a coherent and valid command line.

You should post it, because it might help someone else in the future and there's not enough AMD info out there. It's sparse, scattered, and incoherent. Maybe this thread can become useful in the future as a result.

The architecture of your AMD GPU matters.

03:02:33 | ~/Documents/code/remote/pygptprompt
(.venv) git:(main | Δ) λ /opt/rocm/llvm/bin/amdgpu-arch
gfx803
gfx1036

For example, my RX580 doesn't work out of the box, so I had to set the custom flag to override it.

HSA_OVERRIDE_GFX_VERSION=10.3.0
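
For cards that are not on the officially supported list, the usual pattern is to check what architecture the card actually reports and then spoof the nearest supported one; a rough sketch:

rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u   # what the hardware actually reports
export HSA_OVERRIDE_GFX_VERSION=10.3.0        # e.g. RDNA2 cards such as gfx1031 typically spoof gfx1030
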
hugo-brites commented 1 year ago

@teleprint-me

For me the hardest part was trying to figure out which packages are required. I even installed Ubuntu 22.04, but I'm doing the same steps you are.

I had a look at your script, saw a couple of packages missing, and added them, with the exception of python-pytorch-rocm and python-pytorch-opt-rocm because they conflict with each other.

As far as I'm aware, python-pytorch-rocm is for CPUs that don't support AVX2, which is not my case as I have a Ryzen 3600.

Building llama.cpp, my output looks exactly like yours:

=❯ mkdir build ; cd build
=❯ CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ cmake .. -DLLAMA_HIPBLAS=ON
-- The C compiler identification is Clang 16.0.0
-- The CXX compiler identification is Clang 16.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/rocm/llvm/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rocm/llvm/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.42.0") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
CMake Deprecation Warning at /opt/rocm/lib/cmake/hip/hip-config.cmake:20 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.
Call Stack (most recent call first):
  CMakeLists.txt:384 (find_package)

-- hip::amdhip64 is SHARED_LIBRARY
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Success
CMake Deprecation Warning at /opt/rocm/lib/cmake/hip/hip-config.cmake:20 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.
Call Stack (most recent call first):
  /usr/share/cmake/Modules/CMakeFindDependencyMacro.cmake:76 (find_package)
  /opt/rocm/lib/cmake/hipblas/hipblas-config.cmake:90 (find_dependency)
  CMakeLists.txt:385 (find_package)

-- hip::amdhip64 is SHARED_LIBRARY
-- HIP and hipBLAS found
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done (0.7s)
-- Generating done (0.1s)
-- Build files have been written to: /home/hugo/develop/study/python/notebook/llama.cpp/build

=❯ cmake --build .
[  1%] Built target BUILD_INFO
[  2%] Building CXX object CMakeFiles/ggml-rocm.dir/ggml-cuda.cu.o
[  2%] Built target ggml-rocm
[  4%] Building C object CMakeFiles/ggml.dir/ggml.c.o
/home/hugo/develop/study/python/notebook/llama.cpp/ggml.c:2391:5: warning: implicit conversion increases floating-point precision: 'float' to 'ggml_float' (aka 'double') [-Wdouble-promotion]
    GGML_F16_VEC_REDUCE(sumf, sum);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/hugo/develop/study/python/notebook/llama.cpp/ggml.c:2023:37: note: expanded from macro 'GGML_F16_VEC_REDUCE'
#define GGML_F16_VEC_REDUCE         GGML_F32Cx8_REDUCE
                                    ^
/home/hugo/develop/study/python/notebook/llama.cpp/ggml.c:2013:33: note: expanded from macro 'GGML_F32Cx8_REDUCE'
#define GGML_F32Cx8_REDUCE      GGML_F32x8_REDUCE
                                ^
/home/hugo/develop/study/python/notebook/llama.cpp/ggml.c:1959:11: note: expanded from macro 'GGML_F32x8_REDUCE'
    res = _mm_cvtss_f32(_mm_hadd_ps(t1, t1));                     \
        ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/hugo/develop/study/python/notebook/llama.cpp/ggml.c:3657:9: warning: implicit conversion increases floating-point precision: 'float' to 'ggml_float' (aka 'double') [-Wdouble-promotion]
        GGML_F16_VEC_REDUCE(sumf[k], sum[k]);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/hugo/develop/study/python/notebook/llama.cpp/ggml.c:2023:37: note: expanded from macro 'GGML_F16_VEC_REDUCE'
#define GGML_F16_VEC_REDUCE         GGML_F32Cx8_REDUCE
                                    ^
/home/hugo/develop/study/python/notebook/llama.cpp/ggml.c:2013:33: note: expanded from macro 'GGML_F32Cx8_REDUCE'
#define GGML_F32Cx8_REDUCE      GGML_F32x8_REDUCE
                                ^
/home/hugo/develop/study/python/notebook/llama.cpp/ggml.c:1959:11: note: expanded from macro 'GGML_F32x8_REDUCE'
    res = _mm_cvtss_f32(_mm_hadd_ps(t1, t1));                     \
        ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 warnings generated.
[  5%] Building C object CMakeFiles/ggml.dir/ggml-alloc.c.o
... removing but no other errors
[ 94%] Built target server
[ 95%] Building CXX object pocs/vdot/CMakeFiles/vdot.dir/vdot.cpp.o
[ 97%] Linking CXX executable ../../bin/vdot
[ 97%] Built target vdot
[ 98%] Building CXX object pocs/vdot/CMakeFiles/q8dot.dir/q8dot.cpp.o
[100%] Linking CXX executable ../../bin/q8dot
[100%] Built target q8dot
=❯ ./bin/main -ngl 25 -m ../../models/speechless-llama2-hermes-orca-platypus-wizardlm-13b.Q6_K.gguf -p "Create me a list of the moons for each planet of the solar system:\n" -n 400 -e
Log start
... removing but no errors
llm_load_print_meta: format         = GGUF V2 (latest)
llm_load_print_meta: arch           = llama
llm_load_print_meta: vocab type     = SPM
llm_load_print_meta: n_vocab        = 32000
llm_load_print_meta: n_merges       = 0
llm_load_print_meta: n_ctx_train    = 4096
llm_load_print_meta: n_ctx          = 512
llm_load_print_meta: n_embd         = 5120
llm_load_print_meta: n_head         = 40
llm_load_print_meta: n_head_kv      = 40
llm_load_print_meta: n_layer        = 40
llm_load_print_meta: n_rot          = 128
llm_load_print_meta: n_gqa          = 1
llm_load_print_meta: f_norm_eps     = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff           = 13824
llm_load_print_meta: freq_base      = 10000.0
llm_load_print_meta: freq_scale     = 1
llm_load_print_meta: model type     = 13B
llm_load_print_meta: model ftype    = mostly Q6_K
llm_load_print_meta: model params   = 13.02 B
llm_load_print_meta: model size     = 9.95 GiB (6.56 BPW) 
llm_load_print_meta: general.name   = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.12 MB
llm_load_tensors: using ROCm for GPU acceleration
llm_load_tensors: mem required  = 3979.24 MB (+  400.00 MB per state)
llm_load_tensors: offloading 25 repeating layers to GPU
llm_load_tensors: offloaded 25/43 layers to GPU
llm_load_tensors: VRAM used: 6205 MB
....................................................................................................
llama_new_context_with_model: kv self size  =  400.00 MB
llama_new_context_with_model: compute buffer total size =   75.47 MB
llama_new_context_with_model: VRAM scratch buffer: 74.00 MB

CUDA error 98 at /home/hugo/develop/study/python/notebook/llama.cpp/ggml-cuda.cu:6246: invalid device function
current device: 0

At this moment I think this is something related to the LLVM libraries, because compiling with make gives a completely different outcome.

In the root folder of llama.cpp:

=> rm -rf dist
=> make clean
=> make LLAMA_HIPBLAS=1
... some warnings, but it compiles

But when I run

./main -ngl 25 -m ../models/speechless-llama2-hermes-orca-platypus-wizardlm-13b.Q6_K.gguf -p "Create me a list of the moons for each planet of the solar system:\n" -n 400 -e
Log start
... removing but no errors
llm_load_print_meta: format         = GGUF V2 (latest)
llm_load_print_meta: arch           = llama
llm_load_print_meta: vocab type     = SPM
llm_load_print_meta: n_vocab        = 32000
llm_load_print_meta: n_merges       = 0
llm_load_print_meta: n_ctx_train    = 4096
llm_load_print_meta: n_ctx          = 512
llm_load_print_meta: n_embd         = 5120
llm_load_print_meta: n_head         = 40
llm_load_print_meta: n_head_kv      = 40
llm_load_print_meta: n_layer        = 40
llm_load_print_meta: n_rot          = 128
llm_load_print_meta: n_gqa          = 1
llm_load_print_meta: f_norm_eps     = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff           = 13824
llm_load_print_meta: freq_base      = 10000.0
llm_load_print_meta: freq_scale     = 1
llm_load_print_meta: model type     = 13B
llm_load_print_meta: model ftype    = mostly Q6_K
llm_load_print_meta: model params   = 13.02 B
llm_load_print_meta: model size     = 9.95 GiB (6.56 BPW) 
llm_load_print_meta: general.name   = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.12 MB
llm_load_tensors: using ROCm for GPU acceleration
llm_load_tensors: mem required  = 3979.24 MB (+  400.00 MB per state)
llm_load_tensors: offloading 25 repeating layers to GPU
llm_load_tensors: offloaded 25/43 layers to GPU
llm_load_tensors: VRAM used: 6205 MB
....................................................................................................
llama_new_context_with_model: kv self size  =  400.00 MB
llama_new_context_with_model: compute buffer total size =   75.47 MB
llama_new_context_with_model: VRAM scratch buffer: 74.00 MB

system_info: n_threads = 6 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 400, n_keep = 0

 Create me a list of the moons for each planet of the solar system:

1. Mercury - There are no known moons around Mercury.
2. Venus - No known moons, only one small irregular satellite called Sputnik 1 (also known as Venera 1) was artificially placed into orbit by the Soviet Union in 1957, but it no longer orbits Venus.
3. Earth - 1 moon: The Moon
4. Mars - 2 moons: Phobos and Deimos
5. Jupiter - 79 known moons (including four large ones: Io, Europa, Ganymede, and Callisto)
6. Saturn - 82 known moons (including the largest moon, Titan)
7. Uranus - 27 known moons (including Titania, Oberon, Umbriel, Ariel, Miranda, and Umbriel)
8. Neptune - 14 known moons (including Triton, Proteus, Nereid, and Psyché)
9. Pluto - No known natural moons, but its largest moon is Charon

Please note that the number of moons can change as new discoveries are made. This information is accurate as of my last update in 2021. [end of text]

llama_print_timings:        load time =  2406.33 ms
llama_print_timings:      sample time =   175.91 ms /   273 runs   (    0.64 ms per token,  1551.95 tokens per second)
llama_print_timings: prompt eval time =   851.03 ms /    18 tokens (   47.28 ms per token,    21.15 tokens per second)
llama_print_timings:        eval time = 32915.72 ms /   272 runs   (  121.01 ms per token,     8.26 tokens per second)
llama_print_timings:       total time = 34065.02 ms
Log end

As for the current packages

=> yay -Q | grep rocm
python-pytorch-opt-rocm 2.0.1-9
python-torchvision-rocm 0.15.2-1
rocm-clang-ocl 5.6.1-1
rocm-cmake 5.6.1-1
rocm-core 5.6.1-1
rocm-device-libs 5.6.1-1
rocm-hip-libraries 5.6.1-1
rocm-hip-runtime 5.6.1-1
rocm-hip-sdk 5.6.1-1
rocm-language-runtime 5.6.1-1
rocm-llvm 5.6.1-1
rocm-ml-libraries 5.6.1-1
rocm-ml-sdk 5.6.1-1
rocm-opencl-runtime 5.6.1-1
rocm-opencl-sdk 5.6.1-1
rocm-smi-lib 5.6.1-1
rocminfo 5.6.1-1
=> yay -Q | grep hip
hip-runtime-amd 5.6.1-1
hipblas 5.6.1-1
hipcub 5.6.1-1
hipfft 5.6.1-1
hipsolver 5.6.1-1
hipsparse 5.6.1-1
magma-hip 2.7.1-9
miopen-hip 5.6.1-1
rocm-hip-libraries 5.6.1-1
rocm-hip-runtime 5.6.1-1
rocm-hip-sdk 5.6.1-1

I will keep trying and if I can fix it, I will post it here.

oliverhu commented 1 year ago

I ran into this:

FAILED: vendor/llama.cpp/CMakeFiles/ggml-rocm.dir/ggml-cuda.cu.o
      /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_USE_HIPBLAS -DGGML_USE_K_QUANTS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -isystem /opt/rocm/include -isystem /opt/rocm-5.6.0/include -O3 -DNDEBUG -std=gnu++11 -fPIC -x hip --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 -MD -MT vendor/llama.cpp/CMakeFiles/ggml-rocm.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml-rocm.dir/ggml-cuda.cu.o.d -o vendor/llama.cpp/CMakeFiles/ggml-rocm.dir/ggml-cuda.cu.o -c /tmp/pip-install-42jywmya/llama-cpp-python_8c90f19cea74411a841d5b229dfc2d75/vendor/llama.cpp/ggml-cuda.cu
      c++: error: unrecognized command-line option ‘--offload-arch=gfx900’
      c++: error: unrecognized command-line option ‘--offload-arch=gfx906’
      c++: error: unrecognized command-line option ‘--offload-arch=gfx908’
      c++: error: unrecognized command-line option ‘--offload-arch=gfx90a’
      c++: error: unrecognized command-line option ‘--offload-arch=gfx1030’

Adding CXX=hipcc in front of CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python solved the problem. Shall we update the documentation?

abigrock commented 1 year ago

Just wanted to provide something potentially useful:

Running the command in the README to install with hipBLAS, $ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python, failed to build wheels (c++: error: language hip not recognized).

After searching around, I apparently needed to set $ export CXX=hipcc and then run $ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python, after which I was able to install the bindings without any issues.

I couldn't find this setting mentioned anywhere; perhaps putting a note in the README might be useful?

This worked for me on Arch, but I had to specify the full path to hipcc:

CMAKE_ARGS="-DLLAMA_HIPBLAS=on" FORCE_CMAKE=1 CXX=/opt/rocm/bin/hipcc pip install llama-cpp-python --force-reinstall --upgrade --no-cache

hugo-brites commented 1 year ago

Hi @teleprint-me ,

Finally got it working. I ended up finding a thread on llama.cpp, "ROCm error: ggml-cuda.cu:6246: invalid device function", that pointed me to what was missing in my setup.

My GPU is a 7900 XTX, which means it is a gfx1100 card, and by default that is not included in the defaults of hip-config.cmake. So we need to change the command line to include support for it.

CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DAMDGPU_TARGETS=gfx1100" FORCE_CMAKE=1 CXX=/opt/rocm/bin/hipcc pip install llama-cpp-python --force-reinstall --upgrade --no-cache

Thanks for everyone's help

mauricioscotton commented 1 year ago

I ran into a similar issue...

My solution was:

CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DCMAKE_PREFIX_PATH=/opt/rocm" FORCE_CMAKE=1 pip install llama-cpp-python

Although I believe that -DCMAKE_PREFIX_PATH can be omitted.

gcapozzo commented 11 months ago

Hi guys, I want to add my experience: Ryzen 5700X, AMD RX 6700 (RDNA2, gfx1031, not officially supported), Ubuntu 22.04.3, Python 3.10.12.

First, I was able to run llama.cpp with ROCm 6: make -j16 LLAMA_HIPBLAS=1 HSA_OVERRIDE_GFX_VERSION=10.3.0

After that I tried to use the llama-cpp-python wrapper with these options: CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DHSA_OVERRIDE_GFX_VERSION=10.3.0 -DAMDGPU_TARGETS=gfx1030" pip install --verbose llama-cpp-python. This failed to build the wheel (HSA_OVERRIDE_GFX_VERSION is not recognized as a CMake arg; is this a bug?).

CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DCMAKE_PREFIX_PATH=/opt/rocm" FORCE_CMAKE=1 pip install llama-cpp-python installs correctly, but I got @hugo-brites's error: CUDA error 98 at /home/hugo/develop/study/python/notebook/llama.cpp/ggml-cuda.cu:6246: invalid device function, current device: 0.

It works for me with: CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS=gfx1030" FORCE_CMAKE=1 pip install llama-cpp-python

I hope it is useful to someone.

hrz6976 commented 11 months ago

For RDNA3 users come across this issue, this works for me:

CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DCMAKE_BUILD_TYPE=Release -DLLAMA_HIPBLAS=ON -DLLAMA_CUDA_DMMV_X=64 -DLLAMA_CUDA_MMV_Y=4 -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS=gfx1100" pip install llama-cpp-python

imatrisciano commented 11 months ago

For RDNA3 users come across this issue, this works for me:

CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DCMAKE_BUILD_TYPE=Release -DLLAMA_HIPBLAS=ON -DLLAMA_CUDA_DMMV_X=64 -DLLAMA_CUDA_MMV_Y=4 -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS=**gfx1100**" pip install llama-cpp-python

This worked for me on RDNA2. You can find out the name of the GPU target by running rocminfo | grep gfx.
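
For example, a hypothetical card might report something like the following, and that gfx value is what gets passed as -DAMDGPU_TARGETS (or -DGPU_TARGETS in some setups):

rocminfo | grep gfx
  Name:                    gfx1030    # illustrative output; use whatever your card reports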

ShadwDrgn commented 8 months ago

None of these work for me. The furthest I get is with @hugo-brites's last suggestion, but it still fails to compile, with errors saying unreachable-code-break isn't correct and to use -Wunreachable-code instead, and I don't know how to change that through a pip install command.

Update: I finally got llama-cpp-python to install with: CC='/opt/rocm/llvm/bin/clang' CXX='/opt/rocm/llvm/bin/clang++' CFLAGS='-fPIC' CXXFLAGS='-fPIC' CMAKE_PREFIX_PATH='/opt/rocm' ROCM_PATH="/opt/rocm" HIP_PATH="/opt/rocm" CMAKE_ARGS="-GNinja -DLLAMA_HIPBLAS=ON -DLLAMA_AVX2=on -DGPU_TARGETS=$GFX_VER" pip install --no-cache-dir llama-cpp-python. But now, when I run my program (which ran fine before), I get: Memory access fault by GPU node-2 (Agent handle: 0x5b706a0f2430) on address 0x7363692cf000. Reason: Page not present or supervisor privilege. zsh: IOT instruction (core dumped) HSA_OVERRIDE_GFX_VERSION=11.0.0 HIP_VISIBLE_DEVICES=1 python bot.py

Setting n_gpu_layers=6 instead of -1 gives:

ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 2 ROCm devices:
  Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0, VMM: no
  Device 1: AMD Radeon Graphics, compute capability 11.0, VMM: no
CUDA error: out of memory
  current device: 1, in function ggml_init_cublas at /tmp/pip-install-tuy0pzxb/llama-cpp-python_608d1cdca52343c7aa3b2b70be5ab63f/vendor/llama.cpp/ggml-cuda.cu:7867
  hipStreamCreateWithFlags(&g_cudaStreams[id][is], 0x01)
GGML_ASSERT: /tmp/pip-install-tuy0pzxb/llama-cpp-python_608d1cdca52343c7aa3b2b70be5ab63f/vendor/llama.cpp/ggml-cuda.cu:271: !"CUDA error"
ptrace: Operation not permitted.
No stack.
The program is not being run.
zsh: IOT instruction (core dumped)  HSA_OVERRIDE_GFX_VERSION=11.0.0 python bot.py

Even setting n_gpu_layers to 1 and n_ctx and n_batch to 128 still gives this error.

Final update: I'm stupid. HIP_VISIBLE_DEVICES=1 should have been 0, not 1. My iGPU is somehow device 1 and my 7900 XTX is device 0; how did I not see that? All working now. :D
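
For anyone else who hits this: the ggml_init_cublas log above already shows which index is which (Device 0: AMD Radeon RX 7900 XTX, Device 1: the integrated GPU), so the selection just has to match it; a minimal sketch:

export HIP_VISIBLE_DEVICES=0                    # index of the discrete card from the log above
HSA_OVERRIDE_GFX_VERSION=11.0.0 python bot.py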