Open · gn64 opened this issue 3 days ago
A MacBook Pro 16-inch (2019) with an AMD Radeon Pro 5500M has an issue with Vulkan loader 1.3.302: the input audio is Chinese, but the language is auto-detected as af (p = 0.01000) and there is no output. Test log follows:
```
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Pro 5500M (MoltenVK) | uma: 0 | fp16: 1 | warp size: 64
whisper_init_with_params_no_state: devices = 3
whisper_init_with_params_no_state: backends = 3
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 1 (tiny)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_default_buffer_type: using device Vulkan0 (AMD Radeon Pro 5500M)
ggml_vulkan: Compiling shaders..............................Done!
whisper_model_load: Vulkan0 total size = 77.11 MB
whisper_model_load: model size = 77.11 MB
whisper_backend_init_gpu: using Vulkan0 backend
whisper_backend_init: using BLAS backend
whisper_init_state: kv self size = 3.15 MB
whisper_init_state: kv cross size = 9.44 MB
whisper_init_state: kv pad size = 2.36 MB
whisper_init_state: compute buffer (conv) = 14.15 MB
whisper_init_state: compute buffer (encode) = 64.79 MB
whisper_init_state: compute buffer (cross) = 3.88 MB
whisper_init_state: compute buffer (decode) = 96.81 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 |

main: processing '/Volumes/Share/Streams/audio/voices/c4.wav' (160000 samples, 10.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = auto, task = transcribe, timestamps = 1 ...

whisper_full_with_state: auto-detected language: af (p = 0.010000)

whisper_print_timings: load time   = 368.12 ms
whisper_print_timings: fallbacks   = 5 p / 0 h
whisper_print_timings: mel time    = 8.73 ms
whisper_print_timings: sample time = 5.37 ms / 30 runs ( 0.18 ms per run)
whisper_print_timings: encode time = 218.31 ms / 2 runs ( 109.15 ms per run)
whisper_print_timings: decode time = 6.32 ms / 1 runs ( 6.32 ms per run)
whisper_print_timings: batchd time = 97.96 ms / 18 runs ( 5.44 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time  = 734.51 ms
```
BTW, if I revert to the 1.7.2 release (git: 6266a9f), the result is OK and the output is correct. The same code on Linux with a 1080 Ti Vulkan backend also works correctly.
The issue was fixed in my environment by modifying the line

```glsl
const uint rowy = rowx % p.KY;
```

to

```glsl
const uint rowy = (p.KY > 0) ? (rowx % p.KY) : 0;
```

in the `soft_max(uint num_iters)` function in `ggml-vulkan/vulkan-shaders/soft_max.comp`. This change prevents a division-by-zero error when `p.KY` is 0.
Environment
- OS: Windows 11
- CPU: AMD Ryzen 7 7840U
- GPU: AMD Radeon 780M (iGPU)
- Model: ggml-tiny.bin
- whisper.cpp: both the latest main branch and the 1.7.2 release
Issue Description
When using a Release build with the Vulkan backend on an AMD GPU, the output is garbled (timestamps followed by exclamation marks) and changes between runs. To investigate, I switched to a Debug build, which revealed an underlying problem with the probability calculations.
Steps to Reproduce
First, with a Release build:
Output from Release build:
Then with a Debug build to investigate:
Error in Debug Build
This issue appears to be related to whisper.cpp#2400. I thought it might be connected to llama.cpp#10434, so I tried applying the same fix, but it didn't improve the situation. Any suggestions on how to resolve this would be appreciated.