ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License

ROCm port - compilation ok, but garbage output when run. #1453

Closed. mega-ice closed this issue 11 months ago.

mega-ice commented 11 months ago

When compiling with hipBLAS support for ROCm, the test run takes a very long time and produces garbage output. (I have working llama.cpp / exllamav2 / etc. ROCm builds on the same machine: Ubuntu, ROCm 7.1, RDNA3 AMD card.)

I have tested both the cmake and make build options. I tried the --debug-mode switch, but no log was generated. P.S. When monitoring GPU VRAM usage, it appears that only about a third of the model size is loaded.
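For reference, a build variant I have not tried yet would pin the GPU architecture explicitly. This is only a sketch: AMDGPU_TARGETS and HSA_OVERRIDE_GFX_VERSION are standard ROCm/HIP knobs (the RX 7800 XT is gfx1101), but whether they fix this particular issue is an assumption on my part.

```shell
# Sketch only: build HIP kernels for the actual card instead of the default
# target list. gfx1101 is the RX 7800 XT; adjust for other RDNA3 cards.
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ \
  cmake -DWHISPER_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1101 .
cmake --build .

# If the binary only contains gfx1100 code objects, the ROCm runtime can be
# told to treat the card as gfx1100 (a common RDNA3 workaround):
HSA_OVERRIDE_GFX_VERSION=11.0.0 ./bin/main -m models/ggml-large-v2.bin -f samples/jfk.wav
```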

run output:

./bin/main -m models/ggml-large-v2.bin -f samples/jfk.wav
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-large-v2.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load: model ctx     = 2951.27 MB
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7800 XT, compute capability 11.0
whisper_model_load: model size    = 2950.66 MB
whisper_init_state: kv self size  =   70.00 MB
whisper_init_state: kv cross size =  234.38 MB
whisper_init_state: compute buffer (conv)   =   40.47 MB
whisper_init_state: compute buffer (encode) =  202.52 MB
whisper_init_state: compute buffer (cross)  =    8.89 MB
whisper_init_state: compute buffer (decode) =   59.40 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 |

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.980 --> 00:00:30.000]     later  ... f  as-  awe lo .  ... amP go cl g    ... l 'm   -  acer   Pittsburgh   l

whisper_print_timings:     load time =  3813.30 ms
whisper_print_timings:     fallbacks =   2 p /   0 h
whisper_print_timings:      mel time =    14.47 ms
whisper_print_timings:   sample time =   597.67 ms /  1100 runs (    0.54 ms per run)
whisper_print_timings:   encode time =  3796.02 ms /     1 runs ( 3796.02 ms per run)
whisper_print_timings:   decode time = 73741.14 ms /  1095 runs (   67.34 ms per run)
whisper_print_timings:   prompt time =   465.94 ms /     3 runs (  155.31 ms per run)
whisper_print_timings:    total time = 82719.89 ms

compilation output:

/ai/whisper_project/whisper.cpp$ CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ cmake -DWHISPER_HIPBLAS=ON
-- hip::amdhip64 is SHARED_LIBRARY
-- hip::amdhip64 is SHARED_LIBRARY
-- hip::amdhip64 is SHARED_LIBRARY
-- HIP and hipBLAS found
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ice/ai/whisper_project/whisper.cpp

ice@ubuntu:~/ai/whisper_project/whisper.cpp$ cmake --build .
[ 6%] Building CXX object CMakeFiles/ggml-rocm.dir/ggml-cuda.cu.o
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:242:41: warning: cast from 'const signed char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
    const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:252:41: warning: cast from 'const unsigned char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:262:22: warning: cast from 'const signed char *' to 'int *' drops const qualifier [-Wcast-qual]
    return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:266:22: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:474:75: warning: suggest braces around initialization of subobject [-Wmissing-braces]
    static cudaStream_t g_cudaStreams[GGML_CUDA_MAX_DEVICES][MAX_STREAMS] = { nullptr };
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2235:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2235:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2256:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
    const block_q4_0 * bx0 = (block_q4_0 *) vx;
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:2296:37: warning: cast from 'const __half2 *' to 'float *' drops const qualifier [-Wcast-qual]
    const float * x_dmf = (float *) x_dm;
/home/ice/ai/whisper_project/whisper.cpp/ggml-cuda.cu:3506:49: warning: suggest braces around initialization of subobject [-Wmissing-braces]
    float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
... redacted: the same -Wcast-qual, -Wunused-parameter and -Wmissing-braces warnings (with "in instantiation of function template specialization ... requested here" notes) repeat for the q4_1, q5_0, q5_1, q8_0, q2_K, q3_K, q4_K, q5_K and q6_K tile/mul_mat_q kernels ...
107 warnings generated when compiling for host.
[ 6%] Built target ggml-rocm
[ 12%] Building C object CMakeFiles/whisper.dir/ggml.c.o
[ 18%] Building C object CMakeFiles/whisper.dir/ggml-alloc.c.o
[ 25%] Building C object CMakeFiles/whisper.dir/ggml-backend.c.o
[ 31%] Building C object CMakeFiles/whisper.dir/ggml-quants.c.o
[ 37%] Building CXX object CMakeFiles/whisper.dir/whisper.cpp.o
/home/ice/ai/whisper_project/whisper.cpp/whisper.cpp:161:29: warning: unused function 'ggml_mul_mat_pad' [-Wunused-function]
    static struct ggml_tensor * ggml_mul_mat_pad(struct ggml_context * ctx, struct ggml_tensor * x, struct ggml_tensor * y, int pad = 32) {
1 warning generated.
[ 43%] Linking CXX shared library libwhisper.so
[ 43%] Built target whisper
[ 50%] Building CXX object examples/CMakeFiles/common.dir/common.cpp.o
[ 56%] Building CXX object examples/CMakeFiles/common.dir/common-ggml.cpp.o
[ 62%] Linking CXX static library libcommon.a
[ 62%] Built target common
[ 68%] Building CXX object examples/main/CMakeFiles/main.dir/main.cpp.o
[ 75%] Linking CXX executable ../../bin/main
[ 75%] Built target main
[ 81%] Building CXX object examples/bench/CMakeFiles/bench.dir/bench.cpp.o
[ 87%] Linking CXX executable ../../bin/bench
[ 87%] Built target bench
[ 93%] Building CXX object examples/quantize/CMakeFiles/quantize.dir/quantize.cpp.o
[100%] Linking CXX executable ../../bin/quantize
[100%] Built target quantize
mkiol commented 11 months ago

I can confirm that on RX6950XT it works fine. I use ROCm 5.6.

By default, CMake generates code for the following architectures:

You can add more with CMAKE_CXX_FLAGS, for instance:

cmake -DWHISPER_HIPBLAS=ON -DCMAKE_CXX_FLAGS="--offload-arch=gfx1100 --offload-arch=gfx1102 --offload-arch=gfx1103"

Maybe the problem is a wrong GPU architecture?
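One way to see which ISA the GPU actually reports is `rocminfo`, which ships with ROCm (a minimal sketch; the exact gfx name, e.g. gfx1100 vs gfx1101, depends on the card):

```shell
# List the unique gfx ISA names reported by the ROCm runtime.
# The reported name is what --offload-arch / AMDGPU_TARGETS expects.
rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u
```

If the reported ISA is not among the architectures the library was compiled for, slow runs and garbage output are plausible symptoms.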

ardfork commented 11 months ago

Tested latest commit (953419c) both with Makefile and CMake on my machine with the same sample and got correct result.

As for what mkiol mentioned, the Makefile should pick your correct arch automatically. CMake defaults to multiple archs, and I am not sure whether AMD included gfx1100 in the current ROCm version. I believe the best way to force it is to add -DAMDGPU_TARGETS='gfx1100'. You can check which ISAs you compiled for with roc-obj-ls libwhisper.so.
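Putting the two suggestions together, a sketch of a build pinned to a single architecture and its verification (gfx1100 and the build directory layout are example assumptions; substitute the ISA your GPU reports):

```shell
# Configure with hipBLAS and pin the target ISA explicitly,
# then build (run from a build directory next to the source tree).
cmake -DWHISPER_HIPBLAS=ON -DAMDGPU_TARGETS='gfx1100' ..
make -j

# Verify which GPU code objects actually ended up in the library.
roc-obj-ls libwhisper.so
```

If roc-obj-ls does not list your GPU's ISA, the kernels were never compiled for it.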

mega-ice commented 11 months ago

I did a clean compile again and everything works, even without specifying the GPU architecture flag (which is what I was using originally). Most likely it was a glitch in my system, because I had been running different LLM implementations before compiling whisper.cpp. So let this thread be a warning for people who don't try a restart when a program behaves strangely. :-$

but anyway, thanks for the ideas! :)

ccbadd commented 4 months ago

What is the command line to compile for rocm using make?

mega-ice commented 4 months ago

What is the command line to compile for rocm using make?

A lot of code has changed since the end of last year. Compiling with make is no longer possible (not all targets have been defined). As for cmake, I am unable to compile with HIPBLAS due to the CUDA flash attention code.
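For reference, a hedged sketch of the cmake route on more recent trees; the option names have changed between versions, so check the CMakeLists.txt of your checkout before relying on any of them:

```shell
# Option name depends on the checkout: older trees use -DWHISPER_HIPBLAS=ON,
# later ones -DGGML_HIPBLAS=ON. AMDGPU_TARGETS pins the GPU ISA (example value).
cmake -B build -DGGML_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1100
cmake --build build -j
```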