Closed: mega-ice closed this issue 11 months ago.
I can confirm that it works fine on an RX6950XT. I use ROCm 5.6.
By default, CMake compiles for a fixed set of GPU architectures. You can add more with CMAKE_CXX_FLAGS, for instance:
cmake -DWHISPER_HIPBLAS=ON -DCMAKE_CXX_FLAGS="--offload-arch=gfx1100 --offload-arch=gfx1102 --offload-arch=gfx1103"
Maybe the problem is a wrong GPU architecture?
Tested the latest commit (953419c) both with Makefile and CMake on my machine with the same sample and got correct results.
As for what mkiol mentioned, the Makefile should pick up your correct arch. CMake defaults to multiple archs, and I'm not sure whether AMD included gfx1100 in the current ROCm version. I believe the best way to force it is to add -DAMDGPU_TARGETS='gfx1100'. You can check which ISAs you compiled for with roc-obj-ls libwhisper.so.
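Putting the suggestions above together, a build-and-verify sequence might look like the following sketch. This assumes an RDNA3 card (gfx1100) and a build from the whisper.cpp source directory; adjust the target for your GPU.

```shell
# Force a single GPU architecture at configure time (example target: gfx1100).
# AMDGPU_TARGETS is the hint mentioned above; --offload-arch in CMAKE_CXX_FLAGS
# is the alternative shown earlier in the thread.
cmake -DWHISPER_HIPBLAS=ON -DAMDGPU_TARGETS='gfx1100' .
cmake --build .

# List the ISAs actually embedded in the resulting library.
# Your card's gfx target should appear in the output.
roc-obj-ls libwhisper.so
```

If the gfx target for your card is missing from the roc-obj-ls output, the kernels will not run natively on that GPU, which matches the "wrong architecture" hypothesis above.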
I did a clean compile again and everything works, even without specifying the GPU architecture flag (which is how I was building originally). Most likely it was a glitch in my system, because I had used different LLM implementations before compiling whisper.cpp. So let this thread be a warning for people who don't try a restart when the program behaves strangely. :-$
But anyway, thanks for the ideas! :)
What is the command line to compile for rocm using make?
A lot of the code has changed since the end of last year. Compiling with make is no longer possible (not all targets have been defined). As for cmake, I am unable to compile with HIPBLAS because of the CUDA flash attention code.
When compiling with hipBLAS support for ROCm, the test run takes a very long time and produces garbage. (I have working ROCm builds of llama.cpp, exllamav2, etc. on the same machine: Ubuntu, ROCm 7.1, an RDNA3 AMD card.)
I have tested both the cmake and make build options. I tried the --debug-mode switch, but no log was generated. P.S. When monitoring GPU VRAM usage, it appears that only about a third of the model size is loaded.
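For reference, one way to watch VRAM usage during a run, assuming the stock ROCm tooling is installed, is:

```shell
# Print per-device VRAM usage, refreshed every second
# (rocm-smi ships with the ROCm installation)
watch -n 1 rocm-smi --showmeminfo vram
```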
run output:
compilation output:
/ai/whisper_project/whisper.cpp$ CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ cmake -DWHISPER_HIPBLAS=ON
-- hip::amdhip64 is SHARED_LIBRARY
-- hip::amdhip64 is SHARED_LIBRARY
-- hip::amdhip64 is SHARED_LIBRARY
-- HIP and hipBLAS found
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ice/ai/whisper_project/whisper.cpp
ice@ubuntu:~/ai/whisper_project/whisper.cpp$ cmake --build .
[  6%] Building CXX object CMakeFiles/ggml-rocm.dir/ggml-cuda.cu.o
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:242:41: warning: cast from 'const signed char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
    const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
                                        ^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:252:41: warning: cast from 'const unsigned char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
    const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
                                        ^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:262:22: warning: cast from 'const signed char *' to 'int *' drops const qualifier [-Wcast-qual]
    return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
                     ^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:266:22: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
    return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
                     ^
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:474:75: warning: suggest braces around initialization of subobject [-Wmissing-braces]
    static cudaStream_t g_cudaStreams[GGML_CUDA_MAX_DEVICES][MAX_STREAMS] = { nullptr };
                                                                              ^~~~~~~
                                                                              { }
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2235:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template