
Invalid probability vector error with AMD iGPU on Vulkan backend #2596

Open · gn64 opened this issue 3 days ago

gn64 commented 3 days ago

Environment

OS: Windows 11
CPU: AMD Ryzen 7 7840U
GPU: AMD Radeon 780M (iGPU)
Model: ggml-tiny.bin
whisper.cpp: both the latest main branch and 1.7.2

Issue Description

When using a Release build with the Vulkan backend on the AMD iGPU, the output is garbled (timestamps followed by runs of exclamation marks) and changes between runs. To investigate, I switched to a Debug build, which revealed an underlying problem with the probability calculations.

Steps to Reproduce

First, with a Release build:

cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release
.\main.exe -m .\ggml-tiny.bin -l ja .\jfk.wav

Output from Release build:

whisper_init_from_file_with_params_no_state: loading model from '.\ggml-tiny.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon(TM) 780M (AMD proprietary driver) | uma: 1 | fp16: 1 | warp size: 64
register_backend: registered backend Vulkan (1 devices)
register_device: registered device Vulkan0 (AMD Radeon(TM) 780M)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (AMD Ryzen 7 7840U w/ Radeon  780M Graphics     )
whisper_init_with_params_no_state: backends   = 2
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 1 (tiny)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
ggml_vulkan: Compiling shaders.............................Done!
whisper_model_load:  Vulkan0 total size =    77.11 MB
whisper_model_load: model size    =   77.11 MB
whisper_backend_init_gpu: using Vulkan backend
whisper_init_state: kv self size  =    3.15 MB
whisper_init_state: kv cross size =    9.44 MB
whisper_init_state: kv pad  size  =    2.36 MB
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_reserve_n: reallocating Vulkan0 buffer from size 0.00 MiB to 11.08 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.92 MiB
whisper_init_state: compute buffer (conv)   =   14.15 MB
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_reserve_n: reallocating Vulkan0 buffer from size 0.00 MiB to 60.29 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
whisper_init_state: compute buffer (encode) =   64.79 MB
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_reserve_n: reallocating Vulkan0 buffer from size 0.00 MiB to 2.20 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
whisper_init_state: compute buffer (cross)  =    3.88 MB
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_reserve_n: reallocating Vulkan0 buffer from size 0.00 MiB to 89.95 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.88 MiB
whisper_init_state: compute buffer (decode) =   96.81 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | CANN = 0

main: processing '.\jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = ja, task = transcribe, timestamps = 1 ...

A second run of the same command:

PS C:\Users\HidetoshiMATSUO\Desktop\whisper.cpp\test1> .\main.exe -m .\ggml-tiny.bin -l ja .\jfk.wav
whisper_init_from_file_with_params_no_state: loading model from '.\ggml-tiny.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon(TM) 780M (AMD proprietary driver) | uma: 1 | fp16: 1 | warp size: 64
whisper_init_with_params_no_state: backends   = 2
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 1 (tiny)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
ggml_vulkan: Compiling shaders.............................Done!
whisper_model_load:  Vulkan0 total size =    77.11 MB
whisper_model_load: model size    =   77.11 MB
whisper_backend_init_gpu: using Vulkan backend
whisper_init_state: kv self size  =    3.15 MB
whisper_init_state: kv cross size =    9.44 MB
whisper_init_state: kv pad  size  =    2.36 MB
whisper_init_state: compute buffer (conv)   =   14.15 MB
whisper_init_state: compute buffer (encode) =   64.79 MB
whisper_init_state: compute buffer (cross)  =    3.88 MB
whisper_init_state: compute buffer (decode) =   96.81 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | CANN = 0

main: processing '.\jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = ja, task = transcribe, timestamps = 1 ..

[00:00:00.000 --> 00:00:30.000]  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

whisper_print_timings:     load time =    63.88 ms
whisper_print_timings:     fallbacks =   5 p /   0 h
whisper_print_timings:      mel time =     5.75 ms
whisper_print_timings:   sample time =  3073.97 ms /  6600 runs (    0.47 ms per run)
whisper_print_timings:   encode time =    92.47 ms /     1 runs (   92.47 ms per run)
whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   batchd time = 13938.27 ms /  6588 runs (    2.12 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time = 17224.50 ms

Then with Debug build to investigate:

cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Debug
.\main.exe -m .\ggml-tiny.bin -l ja .\jfk.wav
# Results in assertion error

Error in Debug Build

Debug Assertion Failed!
Program: ...whisper.dll
File: C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\include\random
Line: 4924
Expression: invalid probability vector for discrete_distribution

This issue appears to be related to whisper.cpp#2400. I thought it might be connected to llama.cpp#10434, so I tried applying the same fix, but it didn't improve the situation. Any suggestions on how to resolve this would be appreciated.
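For context on the assertion: the sampler in whisper.cpp draws tokens through std::discrete_distribution (which is where the assertion originates), and MSVC's debug <random> validates the weight vector, firing this exact message if any weight is negative or NaN, or if all weights are zero. A minimal sketch that trips the same check (my own illustration, not whisper.cpp source):

#include <cmath>
#include <random>
#include <vector>

int main() {
    // A NaN weight, e.g. the result of a broken softmax on the GPU
    std::vector<float> probs = {0.5f, std::nanf(""), 0.25f};

    std::mt19937 rng(42);
    // On an MSVC Debug build, constructing the distribution trips
    // "invalid probability vector for discrete_distribution"
    std::discrete_distribution<int> dist(probs.begin(), probs.end());
    return dist(rng);
}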

DickyQi commented 3 days ago

MacBook Pro 16-inch (2019) with an AMD Radeon Pro 5500M has the same issue with Vulkan loader 1.3.302. The input audio is Chinese, but the language is auto-detected as af (p = 0.010000) and there is no output. Test log follows:

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Pro 5500M (MoltenVK) | uma: 0 | fp16: 1 | warp size: 64
whisper_init_with_params_no_state: devices = 3
whisper_init_with_params_no_state: backends = 3
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 384
whisper_model_load: n_text_head = 6
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 1 (tiny)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_default_buffer_type: using device Vulkan0 (AMD Radeon Pro 5500M)
ggml_vulkan: Compiling shaders..............................Done!
whisper_model_load: Vulkan0 total size = 77.11 MB
whisper_model_load: model size = 77.11 MB
whisper_backend_init_gpu: using Vulkan0 backend
whisper_backend_init: using BLAS backend
whisper_init_state: kv self size = 3.15 MB
whisper_init_state: kv cross size = 9.44 MB
whisper_init_state: kv pad size = 2.36 MB
whisper_init_state: compute buffer (conv) = 14.15 MB
whisper_init_state: compute buffer (encode) = 64.79 MB
whisper_init_state: compute buffer (cross) = 3.88 MB
whisper_init_state: compute buffer (decode) = 96.81 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 |

main: processing '/Volumes/Share/Streams/audio/voices/c4.wav' (160000 samples, 10.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = auto, task = transcribe, timestamps = 1 ...

whisper_full_with_state: auto-detected language: af (p = 0.010000)

whisper_print_timings: load time = 368.12 ms
whisper_print_timings: fallbacks = 5 p / 0 h
whisper_print_timings: mel time = 8.73 ms
whisper_print_timings: sample time = 5.37 ms / 30 runs ( 0.18 ms per run)
whisper_print_timings: encode time = 218.31 ms / 2 runs ( 109.15 ms per run)
whisper_print_timings: decode time = 6.32 ms / 1 runs ( 6.32 ms per run)
whisper_print_timings: batchd time = 97.96 ms / 18 runs ( 5.44 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 734.51 ms

DickyQi commented 3 days ago

BTW, if I revert to the 1.7.2 release (git: 6266a9f), the result is OK and the output is correct. The same code on Linux with a 1080 Ti Vulkan backend is also correct.

gn64 commented 1 day ago

The issue was fixed in my environment by changing the line const uint rowy = rowx % p.KY; to const uint rowy = (p.KY > 0) ? (rowx % p.KY) : 0; in the soft_max(uint num_iters) function in ggml-vulkan/vulkan-shaders/soft_max.comp. This guards against a modulo by zero when p.KY is 0, which is undefined behaviour in GLSL.
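The same edit, shown as a diff for clarity (a sketch of the change described above; surrounding shader lines elided):

--- a/ggml-vulkan/vulkan-shaders/soft_max.comp
+++ b/ggml-vulkan/vulkan-shaders/soft_max.comp
 void soft_max(uint num_iters) {
     ...
-    const uint rowy = rowx % p.KY;
+    // Guard against p.KY == 0: integer modulo by zero is undefined in GLSL,
+    // so on some drivers it silently produces garbage row indices, which can
+    // corrupt the softmax output and ultimately the probability vector.
+    const uint rowy = (p.KY > 0) ? (rowx % p.KY) : 0;
     ...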