taikai-zz opened this issue 1 year ago
@taikai-zz could you share a --verbose installation log?
This may actually be a duplicate of #646. I provided a fix there, but I don't think it's been merged into llama.cpp yet; I'll create a PR.
I have now successfully executed
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python==0.1.85
but
python -c "from llama_cpp import GGML_USE_CUBLAS; print(GGML_USE_CUBLAS)"
returns False. Why?
(env) root@gpu:~/.local/share/Open Interpreter/models# CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python==0.1.85
Requirement already satisfied: llama-cpp-python==0.1.85 in /dockers/text-generation-webui/env/lib/python3.10/site-packages (0.1.85)
Requirement already satisfied: typing-extensions>=4.5.0 in /dockers/text-generation-webui/env/lib/python3.10/site-packages (from llama-cpp-python==0.1.85) (4.7.1)
Requirement already satisfied: numpy>=1.20.0 in /dockers/text-generation-webui/env/lib/python3.10/site-packages (from llama-cpp-python==0.1.85) (1.25.2)
Requirement already satisfied: diskcache>=5.6.1 in /dockers/text-generation-webui/env/lib/python3.10/site-packages (from llama-cpp-python==0.1.85) (5.6.3)
(env) root@gpu:~/.local/share/Open Interpreter/models# python -c "from llama_cpp import GGML_USE_CUBLAS; print(GGML_USE_CUBLAS)"
False
(env) root@gpu:~/.local/share/Open Interpreter/models# CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python==0.2.0
Collecting llama-cpp-python==0.2.0
  Using cached llama_cpp_python-0.2.0.tar.gz (1.5 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: typing-extensions>=4.5.0 in /dockers/text-generation-webui/env/lib/python3.10/site-packages (from llama-cpp-python==0.2.0) (4.7.1)
Requirement already satisfied: numpy>=1.20.0 in /dockers/text-generation-webui/env/lib/python3.10/site-packages (from llama-cpp-python==0.2.0) (1.25.2)
Requirement already satisfied: diskcache>=5.6.1 in /dockers/text-generation-webui/env/lib/python3.10/site-packages (from llama-cpp-python==0.2.0) (5.6.3)
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... error
  error: subprocess-exited-with-error
× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [69 lines of output]
scikit-build-core 0.5.0 using CMake 3.27.4 (wheel)
Configuring CMake...
loading initial cache file /tmp/tmp33ysy1x2/build/CMakeInit.txt
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.25.1")
fatal: not a git repository: /tmp/pip-install-xs0nliaw/llama-cpp-python_b4b7025dac8f452f8ba3d3a1cb4b798d/vendor/llama.cpp/../../.git/modules/vendor/llama.cpp
fatal: not a git repository: /tmp/pip-install-xs0nliaw/llama-cpp-python_b4b7025dac8f452f8ba3d3a1cb4b798d/vendor/llama.cpp/../../.git/modules/vendor/llama.cpp
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
CMake Warning at vendor/llama.cpp/CMakeLists.txt:371 (message):
Only LLVM is supported for HIP, hint: CC=/opt/rocm/llvm/bin/clang
CMake Warning at vendor/llama.cpp/CMakeLists.txt:374 (message):
Only LLVM is supported for HIP, hint: CXX=/opt/rocm/llvm/bin/clang++
CMake Deprecation Warning at /opt/rocm/lib/cmake/hip/hip-config.cmake:20 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
Call Stack (most recent call first):
vendor/llama.cpp/CMakeLists.txt:377 (find_package)
-- hip::amdhip64 is SHARED_LIBRARY
CMake Deprecation Warning at /opt/rocm/lib/cmake/hip/hip-config.cmake:20 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
Call Stack (most recent call first):
/tmp/pip-build-env-f1_uucwa/normal/lib/python3.10/site-packages/cmake/data/share/cmake-3.27/Modules/CMakeFindDependencyMacro.cmake:76 (find_package)
/opt/rocm/lib/cmake/hipblas/hipblas-config.cmake:90 (find_dependency)
vendor/llama.cpp/CMakeLists.txt:378 (find_package)
-- hip::amdhip64 is SHARED_LIBRARY
-- HIP and hipBLAS found
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done (0.6s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/tmp33ysy1x2/build
*** Building project with Ninja...
Change Dir: '/tmp/tmp33ysy1x2/build'
Run Build Command(s): /tmp/pip-build-env-f1_uucwa/normal/lib/python3.10/site-packages/ninja/data/bin/ninja -v
ninja: error: '/tmp/pip-install-xs0nliaw/llama-cpp-python_b4b7025dac8f452f8ba3d3a1cb4b798d/.git/modules/vendor/llama.cpp/index', needed by '/tmp/pip-install-xs0nliaw/llama-cpp-python_b4b7025dac8f452f8ba3d3a1cb4b798d/vendor/llama.cpp/build-info.h', missing and no known rule to make it
*** CMake build failed
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
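Worth noting for the 0.1.85 attempt above: pip reported "Requirement already satisfied" and reused the existing wheel, so CMAKE_ARGS never reached a fresh build, which would explain GGML_USE_CUBLAS staying False. A minimal sketch of forcing a clean rebuild and re-checking (same commands as above, just uncached and force-reinstalled):
# Force pip to rebuild the wheel from source so CMAKE_ARGS actually takes effect
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python==0.1.85 --force-reinstall --upgrade --no-cache-dir
# Re-check the compile-time flag afterwards
python -c "from llama_cpp import GGML_USE_CUBLAS; print(GGML_USE_CUBLAS)"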
Okay, I figured this may be more appropriate here and wanted to share my findings.
I thought, as I stated in issue #646, that it might not be related to llama-cpp-python at all and that it was rooted in llama.cpp. I was wrong; it's user error.
The build process for AMD is a bit more involved than that, and after troubleshooting all morning, I finally made a breakthrough in successfully building the target for an AMD GPU.
13:41:53 | ~/Documents/code/remote/llama.cpp
git:(master | θ) λ make clean
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: unknown
I UNAME_M: x86_64
I CFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -O3 -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Wno-unused-function -pthread -march=native -mtune=native
I CXXFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -O3 -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wno-format-truncation -Wno-array-bounds -pthread -march=native -mtune=native
I LDFLAGS:
I CC: cc (GCC) 13.2.1 20230801
I CXX: g++ (GCC) 13.2.1 20230801
# Omitting for brevity
13:41:55 | ~/Documents/code/remote/llama.cpp
git:(master | θ) λ make LLAMA_HIPBLAS=1
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: unknown
I UNAME_M: x86_64
I CFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_HIPBLAS -DGGML_USE_CUBLAS -O3 -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Wno-unused-function -pthread -march=native -mtune=native
I CXXFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_HIPBLAS -DGGML_USE_CUBLAS -O3 -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wno-format-truncation -Wno-array-bounds -pthread -march=native -mtune=native
I LDFLAGS: -L/opt/rocm/lib -Wl,-rpath=/opt/rocm/lib -lhipblas -lamdhip64 -lrocblas
I CC: cc (GCC) 13.2.1 20230801
I CXX: g++ (GCC) 13.2.1 20230801
# Omitting for brevity
13:43:23 | ~/Documents/code/remote/llama.cpp
git:(master | θ) λ mkdir build
13:44:13 | ~/Documents/code/remote/llama.cpp
git:(master | θ) λ cd build
13:44:15 | ~/Documents/code/remote/llama.cpp/build
git:(master | θ) λ export HIP_VISIBLE_DEVICES=0 # If you have more than one GPU
13:44:26 | ~/Documents/code/remote/llama.cpp/build
git:(master | θ) λ export HSA_OVERRIDE_GFX_VERSION=10.3.0 # If your GPU is not officially supported
13:44:36 | ~/Documents/code/remote/llama.cpp/build
git:(master | θ) λ CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ cmake .. -DLLAMA_HIPBLAS=ON -DLLAMA_CUDA_DMMV_X=64 -DLLAMA_CUDA_MMV_Y=2
-- The C compiler identification is Clang 16.0.0
-- The CXX compiler identification is Clang 16.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/rocm/llvm/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rocm/llvm/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.42.0")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
CMake Deprecation Warning at /opt/rocm/lib/cmake/hip/hip-config.cmake:20 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
Call Stack (most recent call first):
CMakeLists.txt:381 (find_package)
-- hip::amdhip64 is SHARED_LIBRARY
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Success
CMake Deprecation Warning at /opt/rocm/lib/cmake/hip/hip-config.cmake:20 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
Call Stack (most recent call first):
/home/austin/.local/lib/python3.11/site-packages/cmake/data/share/cmake-3.27/Modules/CMakeFindDependencyMacro.cmake:76 (find_package)
/opt/rocm/lib/cmake/hipblas/hipblas-config.cmake:90 (find_dependency)
CMakeLists.txt:382 (find_package)
-- hip::amdhip64 is SHARED_LIBRARY
-- HIP and hipBLAS found
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done (0.5s)
-- Generating done (0.0s)
-- Build files have been written to: /home/austin/Documents/code/remote/llama.cpp/build
13:44:46 | ~/Documents/code/remote/llama.cpp/build
git:(master | θ) λ cmake --build .
[ 1%] Built target BUILD_INFO
[ 2%] Building CXX object CMakeFiles/ggml-rocm.dir/ggml-cuda.cu.o
# omitting for brevity... this is usually where errors begin.
# if all goes well, it'll hang for a bit and then chug on...
[ 4%] Building C object CMakeFiles/ggml.dir/ggml.c.o
# just wait it out...
[ 98%] Building CXX object pocs/vdot/CMakeFiles/q8dot.dir/q8dot.cpp.o
[100%] Linking CXX executable ../../bin/q8dot
[100%] Built target q8dot
This is just the start, though. The way the instructions are set up for llama-cpp-python is not really clear and doesn't give you a lay of the land, so to speak.
It took a bit of tinkering and some troubleshooting, but I was able to get it working afterwards once I worked out how to pass the environment variables along.
13:34:38 | ~/Documents/code/remote/pygptprompt
(.venv) git:(main | Δ) λ LLAMA_HIPBLAS=on HIP_VISIBLE_DEVICES=0 HSA_OVERRIDE_GFX_VERSION=10.3.0 LLAMA_CUDA_DMMV_X=64 LLAMA_CUDA_MMV_Y=2 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
Collecting llama-cpp-python
Downloading llama_cpp_python-0.2.6.tar.gz (1.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 13.6 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0 (from llama-cpp-python)
Obtaining dependency information for typing-extensions>=4.5.0 from https://files.pythonhosted.org/packages/ec/6b/63cc3df74987c36fe26157ee12e09e8f9db4de771e0f3404263117e75b95/typing_extensions-4.7.1-py3-none-any.whl.metadata
Downloading typing_extensions-4.7.1-py3-none-any.whl.metadata (3.1 kB)
Collecting numpy>=1.20.0 (from llama-cpp-python)
Obtaining dependency information for numpy>=1.20.0 from https://files.pythonhosted.org/packages/32/6a/65dbc57a89078af9ff8bfcd4c0761a50172d90192eaeb1b6f56e5fbf1c3d/numpy-1.25.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
Downloading numpy-1.25.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.6 kB)
Collecting diskcache>=5.6.1 (from llama-cpp-python)
Obtaining dependency information for diskcache>=5.6.1 from https://files.pythonhosted.org/packages/3f/27/4570e78fc0bf5ea0ca45eb1de3818a23787af9b390c0b0a0033a1b8236f9/diskcache-5.6.3-py3-none-any.whl.metadata
Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.5/45.5 kB 636.3 MB/s eta 0:00:00
Downloading numpy-1.25.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.2/18.2 MB 38.9 MB/s eta 0:00:00
Downloading typing_extensions-4.7.1-py3-none-any.whl (33 kB)
Building wheels for collected packages: llama-cpp-python
Building wheel for llama-cpp-python (pyproject.toml) ... done
Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.6-cp311-cp311-manylinux_2_38_x86_64.whl size=992662 sha256=858a30b29b7511a65bc888c3325a703edfa4696d083392f1c469c28f93ad24a2
Stored in directory: /tmp/pip-ephem-wheel-cache-bwqivaux/wheels/18/f3/e6/e6d374c76db44b5b0451c3a76b3049f29e881819bc43f53d4d
Successfully built llama-cpp-python
Installing collected packages: typing-extensions, numpy, diskcache, llama-cpp-python
Attempting uninstall: typing-extensions
Found existing installation: typing_extensions 4.7.1
Uninstalling typing_extensions-4.7.1:
Successfully uninstalled typing_extensions-4.7.1
Attempting uninstall: numpy
Found existing installation: numpy 1.25.2
Uninstalling numpy-1.25.2:
Successfully uninstalled numpy-1.25.2
Attempting uninstall: diskcache
Found existing installation: diskcache 5.6.3
Uninstalling diskcache-5.6.3:
Successfully uninstalled diskcache-5.6.3
Attempting uninstall: llama-cpp-python
Found existing installation: llama_cpp_python 0.2.6
Uninstalling llama_cpp_python-0.2.6:
Successfully uninstalled llama_cpp_python-0.2.6
Successfully installed diskcache-5.6.3 llama-cpp-python-0.2.6 numpy-1.25.2 typing-extensions-4.7.1
Here are step-by-step instructions, with the noise omitted, to make things clearer:
Building llama.cpp for AMD GPU
Set the required environment variables (adjust values if necessary):
export HIP_VISIBLE_DEVICES=0 # If you have more than one GPU
export HSA_OVERRIDE_GFX_VERSION=10.3.0 # If your GPU is not officially supported
Create a build directory inside the llama.cpp repository and configure with CMake:
mkdir build && cd build
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ cmake .. -DLLAMA_HIPBLAS=ON -DLLAMA_CUDA_DMMV_X=64 -DLLAMA_CUDA_MMV_Y=2
Build llama.cpp:
cmake --build .
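Optionally, smoke-test the build before moving on; a sketch, where the model path, prompt, and layer count are placeholders (the binaries land in build/bin):
# Offload some layers and confirm ROCm is actually used
./bin/main -ngl 25 -m /path/to/model.gguf -p "Hello" -n 32
# The log should contain: llm_load_tensors: using ROCm for GPU acceleration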
Building llama-cpp-python for AMD GPU
Make sure you have the llama.cpp repository cloned and built as mentioned above.
Create a virtual environment (optional but recommended):
python -m venv .venv
source .venv/bin/activate
Install llama-cpp-python with the required environment variables (adjust values if necessary):
LLAMA_HIPBLAS=on HIP_VISIBLE_DEVICES=0 HSA_OVERRIDE_GFX_VERSION=10.3.0 LLAMA_CUDA_DMMV_X=64 LLAMA_CUDA_MMV_Y=2 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
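To confirm the installed wheel is actually ROCm-enabled, one quick check is to load any GGUF model with a few layers offloaded and look for the ROCm line in the log; a sketch, with the model path as a placeholder:
python -c "
from llama_cpp import Llama
# Loading with n_gpu_layers > 0 prints the backend being used to stderr
llm = Llama(model_path='/path/to/model.gguf', n_gpu_layers=5)
" 2>&1 | grep -i 'using ROCm'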
Hopefully this helps!
Let me know if this works!
Just wanted to provide something potentially useful:
Running the command in the README to install with hipblas
$ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
couldn't build wheels (c++: error: language hip not recognized).
After searching around, I apparently needed to set
$ export CXX=hipcc
then
$ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
and was able to successfully install the bindings without any issues.
I couldn't find this setting mentioned anywhere; perhaps putting a note in the README might be useful?
@muaiyadh Thank you very much; I followed your method and it worked.
@teleprint-me Thank you very much for your help.
$ export CXX=hipcc
$ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
solved my problem.
Technically, CMAKE_ARGS isn't really needed; neither are the quotes. What matters is that the environment variables are passed along with their values.
Exporting CXX=hipcc just tells cmake what compiler to use for C++, i.e. hipcc.
So, if the ROCm compiler is globally available via the PATH environment variable, that's why it'll work. I just use absolute paths because it makes it clearer what's happening:
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++
CC is for C and CXX is for C++.
Using the proper compiler that's installed and configured for your system will play a key role in whether it succeeds or not.
When you use export CXX=hipcc, it just injects the key/value pair into the environment.
echo $CXX
hipcc # this is echoed to stdout
A good way to check whether hipcc is available is to just use which:
which hipcc
I don't have hipcc on my PATH, so it just tells me it's not found, hence why I referenced the absolute paths.
Passing the variables before the command just passes the key/value pairs to the CLI app as it executes. So, technically,
CXX=hipcc LLAMA_HIPBLAS=on pip install llama-cpp-python
should be valid as well.
If I wanted it set, then all I'd need to do is
which hipcc
hipcc not found
PATH=/opt/rocm/bin:${PATH}
which hipcc
/opt/rocm/bin/hipcc
The reason CMAKE_ARGS="-DLLAMA_HIPBLAS=on" works is that cmake is passed -DLLAMA_HIPBLAS=on as a CLI argument, which is what llama.cpp uses to build the AMD targets.
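Putting the two mechanisms together, an invocation along these lines tends to be what ends up working (a sketch, assuming a standard /opt/rocm layout; the force/no-cache flags just make sure a stale wheel isn't reused):
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ \
  CMAKE_ARGS="-DLLAMA_HIPBLAS=on" \
  pip install llama-cpp-python --force-reinstall --no-cache-dir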
@teleprint-me what OS are you using?
I'm using Arch and able to compile, but I receive a CUDA error when trying to run a model with some layers offloaded to the GPU.
llm_load_tensors: ggml ctx size = 0.12 MB
llm_load_tensors: using ROCm for GPU acceleration
llm_load_tensors: mem required = 8942.91 MB (+ 400.00 MB per state)
llm_load_tensors: offloading 5 repeating layers to GPU
llm_load_tensors: offloaded 5/43 layers to GPU
llm_load_tensors: VRAM used: 1241 MB
....................................................................................................
llama_new_context_with_model: kv self size = 400.00 MB
llama_new_context_with_model: compute buffer total size = 75.47 MB
llama_new_context_with_model: VRAM scratch buffer: 74.00 MB
CUDA error 98 at /tmp/pip-install-bojptw49/llama-cpp-python_4627b943be6f48b8939df8bb4aad9957/vendor/llama.cpp/ggml-cuda.cu:6233: invalid device function
current device: 0
Strangely, compiling llama.cpp directly and running the model there works perfectly.
Note: I compiled llama.cpp using make, not cmake. Using your command to compile with cmake gives the same error.
@hugo-brites
I'm using the EndeavourOS distribution (it's faster/easier to install). It should work the same way on Arch Linux, since Arch is its parent distribution.
I followed the guide after reviewing the AMD ROCm, Arch Linux, and llama.cpp documentation.
It'd be more helpful to understand the steps you're taking, though.
I can't really tell what's happening just from errors unless I see both the CLI input and output, similar to what I provided earlier in this thread.
I hand-picked the packages, though.
# Function to install OpenCL
install_opencl() {
# NOTE: omitted miopengemm for kernel generation because it requires AUR
if ! sudo pacman -S openblas openblas64 opencl-headers libclc opencl-clhpp ocl-icd lib32-ocl-icd clinfo clpeak nvtop --noconfirm; then
echo "Failed to install OpenCL"
exit 1
fi
}
# Function to install AMD Vulkan driver support
install_amd_vulkan() {
if ! sudo pacman -S mesa lib32-mesa vulkan-radeon lib32-vulkan-radeon vulkan-icd-loader lib32-vulkan-icd-loader vkd3d lib32-vkd3d vulkan-headers vulkan-validation-layers vulkan-tools --noconfirm; then
echo "Failed to install AMD Vulkan driver support"
exit 1
fi
echo "AMD Vulkan driver support installed successfully"
}
# Function to install AMD ROCm
install_amd_rocm() {
if ! sudo pacman -S rocm-core rocm-llvm rocm-clang-ocl rocm-cmake rocm-smi-lib rocm-hip-libraries rocm-hip-runtime rocm-hip-sdk rocm-language-runtime rocm-opencl-runtime rocm-opencl-sdk rocm-device-libs rocm-ml-libraries rocm-ml-sdk rocminfo hipblas rocblas rocsparse rccl python-pytorch-rocm python-pytorch-opt-rocm --noconfirm; then
echo "Failed to install AMD ROCm"
exit 1
fi
}
install_amd() {
confirm_proceed "AMD GPU drivers, OpenCL, Vulkan, and ROCm" || return
install_opencl
install_amd_vulkan
install_amd_rocm
install_python_mlai_rocm
}
If it works normally when you build llama.cpp from source, I would extrapolate the steps from there and repackage the environment variables that made your build succeed.
If you carefully inspect, review, and follow my original comment, you'll see that's exactly what I did.
I had to figure out what environment variables would work for my build and then I was able to extrapolate that into a coherent and valid command line.
You should post it because it might help someone else in the future and there's not enough AMD info out there. It's sparse, scattered, and incoherent. Maybe this thread can become useful in the future as a result.
The architecture of your AMD GPU matters.
03:02:33 | ~/Documents/code/remote/pygptprompt
(.venv) git:(main | Δ) λ /opt/rocm/llvm/bin/amdgpu-arch
gfx803
gfx1036
For example, my RX580 doesn't work out of the box, so I had to set the custom flag to override it.
HSA_OVERRIDE_GFX_VERSION=10.3.0
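In other words, the usual pattern is: check what target the card reports, and if it isn't officially supported, override it with the closest supported version. A sketch (10.3.0 is the RDNA2 value used earlier in this thread; pick what matches your card):
/opt/rocm/llvm/bin/amdgpu-arch            # or: rocminfo | grep gfx
export HSA_OVERRIDE_GFX_VERSION=10.3.0    # override for cards that aren't officially supported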
@teleprint-me
For me the hardest part was trying to figure out which packages are required. I even installed Ubuntu 22.04, but I'm doing the same steps you are.
I had a look at your script, saw a couple of packages missing, and added them, with the exception of python-pytorch-rocm and python-pytorch-opt-rocm, because they conflict with each other.
As far as I'm aware, python-pytorch-rocm is for CPUs that don't support AVX2, which is not my case as I have a Ryzen 3600.
Building llama.cpp, my output looks exactly like yours:
=❯ mkdir build ; cd build
=❯ CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ cmake .. -DLLAMA_HIPBLAS=ON
-- The C compiler identification is Clang 16.0.0
-- The CXX compiler identification is Clang 16.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/rocm/llvm/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rocm/llvm/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.42.0")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
CMake Deprecation Warning at /opt/rocm/lib/cmake/hip/hip-config.cmake:20 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
Call Stack (most recent call first):
CMakeLists.txt:384 (find_package)
-- hip::amdhip64 is SHARED_LIBRARY
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Success
CMake Deprecation Warning at /opt/rocm/lib/cmake/hip/hip-config.cmake:20 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
Call Stack (most recent call first):
/usr/share/cmake/Modules/CMakeFindDependencyMacro.cmake:76 (find_package)
/opt/rocm/lib/cmake/hipblas/hipblas-config.cmake:90 (find_dependency)
CMakeLists.txt:385 (find_package)
-- hip::amdhip64 is SHARED_LIBRARY
-- HIP and hipBLAS found
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done (0.7s)
-- Generating done (0.1s)
-- Build files have been written to: /home/hugo/develop/study/python/notebook/llama.cpp/build
=❯ cmake --build .
[ 1%] Built target BUILD_INFO
[ 2%] Building CXX object CMakeFiles/ggml-rocm.dir/ggml-cuda.cu.o
[ 2%] Built target ggml-rocm
[ 4%] Building C object CMakeFiles/ggml.dir/ggml.c.o
/home/hugo/develop/study/python/notebook/llama.cpp/ggml.c:2391:5: warning: implicit conversion increases floating-point precision: 'float' to 'ggml_float' (aka 'double') [-Wdouble-promotion]
GGML_F16_VEC_REDUCE(sumf, sum);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/hugo/develop/study/python/notebook/llama.cpp/ggml.c:2023:37: note: expanded from macro 'GGML_F16_VEC_REDUCE'
#define GGML_F16_VEC_REDUCE GGML_F32Cx8_REDUCE
^
/home/hugo/develop/study/python/notebook/llama.cpp/ggml.c:2013:33: note: expanded from macro 'GGML_F32Cx8_REDUCE'
#define GGML_F32Cx8_REDUCE GGML_F32x8_REDUCE
^
/home/hugo/develop/study/python/notebook/llama.cpp/ggml.c:1959:11: note: expanded from macro 'GGML_F32x8_REDUCE'
res = _mm_cvtss_f32(_mm_hadd_ps(t1, t1)); \
~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/hugo/develop/study/python/notebook/llama.cpp/ggml.c:3657:9: warning: implicit conversion increases floating-point precision: 'float' to 'ggml_float' (aka 'double') [-Wdouble-promotion]
GGML_F16_VEC_REDUCE(sumf[k], sum[k]);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/hugo/develop/study/python/notebook/llama.cpp/ggml.c:2023:37: note: expanded from macro 'GGML_F16_VEC_REDUCE'
#define GGML_F16_VEC_REDUCE GGML_F32Cx8_REDUCE
^
/home/hugo/develop/study/python/notebook/llama.cpp/ggml.c:2013:33: note: expanded from macro 'GGML_F32Cx8_REDUCE'
#define GGML_F32Cx8_REDUCE GGML_F32x8_REDUCE
^
/home/hugo/develop/study/python/notebook/llama.cpp/ggml.c:1959:11: note: expanded from macro 'GGML_F32x8_REDUCE'
res = _mm_cvtss_f32(_mm_hadd_ps(t1, t1)); \
~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 warnings generated.
[ 5%] Building C object CMakeFiles/ggml.dir/ggml-alloc.c.o
... removing but no other errors
[ 94%] Built target server
[ 95%] Building CXX object pocs/vdot/CMakeFiles/vdot.dir/vdot.cpp.o
[ 97%] Linking CXX executable ../../bin/vdot
[ 97%] Built target vdot
[ 98%] Building CXX object pocs/vdot/CMakeFiles/q8dot.dir/q8dot.cpp.o
[100%] Linking CXX executable ../../bin/q8dot
[100%] Built target q8dot
=❯ ./bin/main -ngl 25 -m ../../models/speechless-llama2-hermes-orca-platypus-wizardlm-13b.Q6_K.gguf -p "Create me a list of the moons for each planet of the solar system:\n" -n 400 -e
Log start
... removing but no errors
llm_load_print_meta: format = GGUF V2 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_ctx = 512
llm_load_print_meta: n_embd = 5120
llm_load_print_meta: n_head = 40
llm_load_print_meta: n_head_kv = 40
llm_load_print_meta: n_layer = 40
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff = 13824
llm_load_print_meta: freq_base = 10000.0
llm_load_print_meta: freq_scale = 1
llm_load_print_meta: model type = 13B
llm_load_print_meta: model ftype = mostly Q6_K
llm_load_print_meta: model params = 13.02 B
llm_load_print_meta: model size = 9.95 GiB (6.56 BPW)
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.12 MB
llm_load_tensors: using ROCm for GPU acceleration
llm_load_tensors: mem required = 3979.24 MB (+ 400.00 MB per state)
llm_load_tensors: offloading 25 repeating layers to GPU
llm_load_tensors: offloaded 25/43 layers to GPU
llm_load_tensors: VRAM used: 6205 MB
....................................................................................................
llama_new_context_with_model: kv self size = 400.00 MB
llama_new_context_with_model: compute buffer total size = 75.47 MB
llama_new_context_with_model: VRAM scratch buffer: 74.00 MB
CUDA error 98 at /home/hugo/develop/study/python/notebook/llama.cpp/ggml-cuda.cu:6246: invalid device function
current device: 0
At this point I think this is something related to the LLVM libraries, because compiling with make I get a completely different outcome.
In the root folder of llama.cpp:
=> rm -rf dist
=> make clean
=> make LLAMA_HIPBLAS=1
... some warnings, but it compiles
But when I run
./main -ngl 25 -m ../models/speechless-llama2-hermes-orca-platypus-wizardlm-13b.Q6_K.gguf -p "Create me a list of the moons for each planet of the solar system:\n" -n 400 -e
Log start
... removing but no errors
llm_load_print_meta: format = GGUF V2 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_ctx = 512
llm_load_print_meta: n_embd = 5120
llm_load_print_meta: n_head = 40
llm_load_print_meta: n_head_kv = 40
llm_load_print_meta: n_layer = 40
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff = 13824
llm_load_print_meta: freq_base = 10000.0
llm_load_print_meta: freq_scale = 1
llm_load_print_meta: model type = 13B
llm_load_print_meta: model ftype = mostly Q6_K
llm_load_print_meta: model params = 13.02 B
llm_load_print_meta: model size = 9.95 GiB (6.56 BPW)
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.12 MB
llm_load_tensors: using ROCm for GPU acceleration
llm_load_tensors: mem required = 3979.24 MB (+ 400.00 MB per state)
llm_load_tensors: offloading 25 repeating layers to GPU
llm_load_tensors: offloaded 25/43 layers to GPU
llm_load_tensors: VRAM used: 6205 MB
....................................................................................................
llama_new_context_with_model: kv self size = 400.00 MB
llama_new_context_with_model: compute buffer total size = 75.47 MB
llama_new_context_with_model: VRAM scratch buffer: 74.00 MB
system_info: n_threads = 6 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 400, n_keep = 0
Create me a list of the moons for each planet of the solar system:
1. Mercury - There are no known moons around Mercury.
2. Venus - No known moons, only one small irregular satellite called Sputnik 1 (also known as Venera 1) was artificially placed into orbit by the Soviet Union in 1957, but it no longer orbits Venus.
3. Earth - 1 moon: The Moon
4. Mars - 2 moons: Phobos and Deimos
5. Jupiter - 79 known moons (including four large ones: Io, Europa, Ganymede, and Callisto)
6. Saturn - 82 known moons (including the largest moon, Titan)
7. Uranus - 27 known moons (including Titania, Oberon, Umbriel, Ariel, Miranda, and Umbriel)
8. Neptune - 14 known moons (including Triton, Proteus, Nereid, and Psyché)
9. Pluto - No known natural moons, but its largest moon is Charon
Please note that the number of moons can change as new discoveries are made. This information is accurate as of my last update in 2021. [end of text]
llama_print_timings: load time = 2406.33 ms
llama_print_timings: sample time = 175.91 ms / 273 runs ( 0.64 ms per token, 1551.95 tokens per second)
llama_print_timings: prompt eval time = 851.03 ms / 18 tokens ( 47.28 ms per token, 21.15 tokens per second)
llama_print_timings: eval time = 32915.72 ms / 272 runs ( 121.01 ms per token, 8.26 tokens per second)
llama_print_timings: total time = 34065.02 ms
Log end
As for the currently installed packages:
=> yay -Q | grep rocm
python-pytorch-opt-rocm 2.0.1-9
python-torchvision-rocm 0.15.2-1
rocm-clang-ocl 5.6.1-1
rocm-cmake 5.6.1-1
rocm-core 5.6.1-1
rocm-device-libs 5.6.1-1
rocm-hip-libraries 5.6.1-1
rocm-hip-runtime 5.6.1-1
rocm-hip-sdk 5.6.1-1
rocm-language-runtime 5.6.1-1
rocm-llvm 5.6.1-1
rocm-ml-libraries 5.6.1-1
rocm-ml-sdk 5.6.1-1
rocm-opencl-runtime 5.6.1-1
rocm-opencl-sdk 5.6.1-1
rocm-smi-lib 5.6.1-1
rocminfo 5.6.1-1
=> yay -Q | grep hip
hip-runtime-amd 5.6.1-1
hipblas 5.6.1-1
hipcub 5.6.1-1
hipfft 5.6.1-1
hipsolver 5.6.1-1
hipsparse 5.6.1-1
magma-hip 2.7.1-9
miopen-hip 5.6.1-1
rocm-hip-libraries 5.6.1-1
rocm-hip-runtime 5.6.1-1
rocm-hip-sdk 5.6.1-1
I will keep trying and if I can fix it, I will post it here.
I ran into this:
FAILED: vendor/llama.cpp/CMakeFiles/ggml-rocm.dir/ggml-cuda.cu.o
/usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_USE_HIPBLAS -DGGML_USE_K_QUANTS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -isystem /opt/rocm/include -isystem /opt/rocm-5.6.0/include -O3 -DNDEBUG -std=gnu++11 -fPIC -x hip --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 -MD -MT vendor/llama.cpp/CMakeFiles/ggml-rocm.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml-rocm.dir/ggml-cuda.cu.o.d -o vendor/llama.cpp/CMakeFiles/ggml-rocm.dir/ggml-cuda.cu.o -c /tmp/pip-install-42jywmya/llama-cpp-python_8c90f19cea74411a841d5b229dfc2d75/vendor/llama.cpp/ggml-cuda.cu
c++: error: unrecognized command-line option ‘--offload-arch=gfx900’
c++: error: unrecognized command-line option ‘--offload-arch=gfx906’
c++: error: unrecognized command-line option ‘--offload-arch=gfx908’
c++: error: unrecognized command-line option ‘--offload-arch=gfx90a’
c++: error: unrecognized command-line option ‘--offload-arch=gfx1030’
Adding CXX=hipcc in front of CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python solved the problem. Shall we update the documentation?
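Spelled out, the combination being described is simply:
CXX=hipcc CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python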
Just wanted to provide something potentially useful:
Running the command in the README to install with hipblas
$ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
couldn't build wheels (c++: error: language hip not recognized).
After searching around, I apparently needed to set
$ export CXX=hipcc
then
$ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
and was able to successfully install the bindings without any issues. I couldn't find this setting mentioned anywhere; perhaps putting a note in the README might be useful?
This worked for me on Arch but I had to specify the full path to hipcc:
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" FORCE_CMAKE=1 CXX=/opt/rocm/bin/hipcc pip install llama-cpp-python --force-reinstall --upgrade --no-cache
Hi @teleprint-me,
Finally got it working. I ended up finding a thread on llama.cpp, "ROCm error: ggml-cuda.cu:6246: invalid device function", that pointed me to what was missing in my setup.
My GPU is a 7900 XTX, which means it is a gfx1100 card, and by default that target is not included in the defaults of hip-config.cmake.
So we need to change the command line to include support for it.
CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DAMDGPU_TARGETS=gfx1100" FORCE_CMAKE=1 CXX=/opt/rocm/bin/hipcc pip install llama-cpp-python --force-reinstall --upgrade --no-cache
Thanks for everyone's help
Ran into a similar issue...
My solution was:
CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DCMAKE_PREFIX_PATH=/opt/rocm" FORCE_CMAKE=1 pip install llama-cpp-python
Although I believe that -DCMAKE_PREFIX_PATH can be omitted.
Hi guys, I want to add my experience: Ryzen 5700X, AMD RX 6700 (RDNA2, gfx1031, not officially supported), Ubuntu 22.04.3, Python 3.10.12.
First, I was able to run llama.cpp with ROCm 6:
make -j16 LLAMA_HIPBLAS=1 HSA_OVERRIDE_GFX_VERSION=10.3.0
After that, I tried to use the llama-cpp-python wrapper with these options:
CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DHSA_OVERRIDE_GFX_VERSION=10.3.0 -DAMDGPU_TARGETS=gfx1030" pip install --verbose llama-cpp-python
Error building the wheel (HSA_OVERRIDE_GFX_VERSION is not recognized as a CMake arg; is this a bug?).
CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DCMAKE_PREFIX_PATH=/opt/rocm" FORCE_CMAKE=1 pip install llama-cpp-python
Installs correctly, but I got @hugo-brites' error: CUDA error 98 at /home/hugo/develop/study/python/notebook/llama.cpp/ggml-cuda.cu:6246: invalid device function, current device: 0.
Works for me with:
CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS=gfx1030" FORCE_CMAKE=1 pip install llama-cpp-python
I hope it is useful to someone.
For RDNA3 users who come across this issue, this works for me:
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DCMAKE_BUILD_TYPE=Release -DLLAMA_HIPBLAS=ON -DLLAMA_CUDA_DMMV_X=64 -DLLAMA_CUDA_MMV_Y=4 -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS=gfx1100" pip install llama-cpp-python
For RDNA3 users who come across this issue, this works for me:
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DCMAKE_BUILD_TYPE=Release -DLLAMA_HIPBLAS=ON -DLLAMA_CUDA_DMMV_X=64 -DLLAMA_CUDA_MMV_Y=4 -DCMAKE_PREFIX_PATH=/opt/rocm -DAMDGPU_TARGETS=gfx1100" pip install llama-cpp-python
This worked for me using RDNA2
You can find out the name of the GPU target by running rocminfo | grep gfx
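For example, discovering the target and passing it explicitly might look like this (a sketch; gfx1030 is just an illustrative RDNA2 value):
rocminfo | grep gfx        # reports the target, e.g. gfx1030
CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DAMDGPU_TARGETS=gfx1030" FORCE_CMAKE=1 CXX=/opt/rocm/bin/hipcc pip install llama-cpp-python --force-reinstall --no-cache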
None of these work for me. The furthest I get is with @hugo-brites' last suggestion, but it still fails to compile, with errors saying unreachable-code-break isn't correct and to use -Wunreachable-code instead, and I don't know how to change that with a pip install command.
Update: Finally got llama-cpp-python to install with:
CC='/opt/rocm/llvm/bin/clang' CXX='/opt/rocm/llvm/bin/clang++' CFLAGS='-fPIC' CXXFLAGS='-fPIC' CMAKE_PREFIX_PATH='/opt/rocm' ROCM_PATH="/opt/rocm" HIP_PATH="/opt/rocm" CMAKE_ARGS="-GNinja -DLLAMA_HIPBLAS=ON -DLLAMA_AVX2=on -DGPU_TARGETS=$GFX_VER" pip install --no-cache-dir llama-cpp-python
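Note that $GFX_VER in the command above is presumably an environment variable set beforehand from the reported GPU target, e.g.:
export GFX_VER=gfx1100    # as reported by rocminfo | grep gfx for a 7900 XTX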
but now when I run my program, which was running successfully before, I get:
Memory access fault by GPU node-2 (Agent handle: 0x5b706a0f2430) on address 0x7363692cf000. Reason: Page not present or supervisor privilege.
zsh: IOT instruction (core dumped) HSA_OVERRIDE_GFX_VERSION=11.0.0 HIP_VISIBLE_DEVICES=1 python bot.py
Setting n_gpu_layers=6 instead of -1 gives:
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 2 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0, VMM: no
Device 1: AMD Radeon Graphics, compute capability 11.0, VMM: no
CUDA error: out of memory
current device: 1, in function ggml_init_cublas at /tmp/pip-install-tuy0pzxb/llama-cpp-python_608d1cdca52343c7aa3b2b70be5ab63f/vendor/llama.cpp/ggml-cuda.cu:7867
hipStreamCreateWithFlags(&g_cudaStreams[id][is], 0x01)
GGML_ASSERT: /tmp/pip-install-tuy0pzxb/llama-cpp-python_608d1cdca52343c7aa3b2b70be5ab63f/vendor/llama.cpp/ggml-cuda.cu:271: !"CUDA error"
ptrace: Operation not permitted.
No stack.
The program is not being run.
zsh: IOT instruction (core dumped) HSA_OVERRIDE_GFX_VERSION=11.0.0 python bot.py
Even setting n_gpu_layers to 1 and n_ctx and n_batch to 128 still gives this error.
Final update: I'm stupid. HIP_VISIBLE_DEVICES=1 should have been 0, not 1. My iGPU is device 1 somehow, and my 7900 XTX is device 0; how did I not see that? All working now. :D
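If anyone else hits the same device-ordering surprise, a quick way to see which index belongs to which GPU before setting HIP_VISIBLE_DEVICES (a sketch; the right index varies by system):
rocminfo | grep -i 'marketed name'    # lists agents (CPU and GPUs) in enumeration order
/opt/rocm/llvm/bin/amdgpu-arch        # prints one gfx target per GPU
export HIP_VISIBLE_DEVICES=0          # use the index of the discrete card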
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.78 Normal Compilation Unable to compile after AMDGPU 0.1.78 version