rudiservo opened this issue 2 weeks ago
I had a similar error on Arch Linux (ROCm 6.0.2) with an RX 6700 XT, and what helped for me was compiling with AMDGPU_TARGETS=gfx1030. Looking at the Makefile, when AMDGPU_TARGETS is not set, it auto-detects the arch as gfx1031. However, since gfx1031 is not officially supported, I have to set HSA_OVERRIDE_GFX_VERSION=10.3.0, and I guess it doesn't like that llama.cpp was compiled for a "different" GPU arch.
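The workaround described above can be sketched roughly like this. Note the exact make flags are assumptions on my part, not taken from this thread, and the build command is echoed rather than executed:

```shell
# RX 6700 XT reports gfx1031, which ROCm does not officially support;
# have the runtime treat it as gfx1030:
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Build llama.cpp for the same arch the override reports.
# Flag names are assumptions; command is echoed, not run, for illustration.
BUILD_CMD="make GGML_HIPBLAS=1 AMDGPU_TARGETS=gfx1030"
echo "$BUILD_CMD"
```

The point is that the compiled target (gfx1030) must match what HSA_OVERRIDE_GFX_VERSION makes the runtime report, otherwise the mismatch described above occurs.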
@Arvamer Oh... in the Dockerfile in .devops, the variable that is set is GPU_TARGETS, not AMDGPU_TARGETS. I'm going to try changing it and will report my findings.
Found the issues in the Dockerfile for ROCm: GPU_TARGETS has to be AMDGPU_TARGETS, and ARG ROCM_DOCKER_ARCH is missing its quotes (" "). So it becomes:

```dockerfile
ARG ROCM_DOCKER_ARCH="\
    gfx803 \
    gfx900 \
    gfx906 \
    gfx908 \
    gfx90a \
    gfx1010 \
    gfx1030 \
    gfx1100 \
    gfx1101 \
    gfx1102"
```
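With that fix, the arch list can also be overridden per build via `--build-arg`, which overrides the ARG default. A rough sketch, where the image tag and Dockerfile path are placeholders I made up, and the command is echoed rather than executed:

```shell
# Hypothetical invocation: build the ROCm image for a single arch only.
# Tag and Dockerfile path are placeholders; echoed, not run.
DOCKER_CMD="docker build --build-arg ROCM_DOCKER_ARCH=gfx1030 \
  -t local/llama.cpp:rocm -f .devops/main-rocm.Dockerfile ."
echo "$DOCKER_CMD"
```

Building for one arch instead of the full list also cuts compile time considerably.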
There also need to be two different ROCm image variants, ROCm 5 and ROCm 6: there is a noticeable performance improvement on ROCm 6.1.2, but gfx803 and gfx900 are not supported, and gfx906 is deprecated on ROCm 6.
Should I make a PR?
What happened?
The Docker version with ROCm 5.6 exits after graph splits. I tried building an image with ROCm 5.6, 5.7.1, and 6.1.2; the latter ones give me the error that is in the logs. If I compile and run it on Metal, it works flawlessly. I have been trying to run it with several versions for the past 7 days.
Name and Version
Latest builds, always pulled within the last 7 days.
System is Pop!_OS 22.04, ROCm 6.1.2, kernel 6.9.3.
What operating system are you seeing the problem on?
Linux
Relevant log output