thewh1teagle opened 3 weeks ago
@ggerganov
Do you have any suggestions on how we can improve the stability of ggml and whisper.cpp to reduce crashes (aborts) and ensure they consistently return errors instead?
Hm, I haven't tested the Vulkan backend with whisper.cpp at all, so I cannot recommend a way to improve the stability. But looking at the error, it seems like it's trying to load an invalid model, no?
The other error looks like the GPU device is running out of memory. I think your application could check whether there is enough available memory before trying to load the Whisper model.
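The pre-flight memory check suggested above could look something like the sketch below. This is not whisper.cpp code; `enough_vram_for_model` and its 1.5x headroom factor are assumptions for illustration, and in a real app the free-VRAM figure would come from an actual query (e.g. the `VK_EXT_memory_budget` Vulkan extension) rather than a parameter.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical pre-flight check: refuse to initialize the GPU backend
 * when the device does not have enough free memory for the model.
 * free_vram_bytes would come from a real query in practice (e.g.
 * VK_EXT_memory_budget on Vulkan); here it is just a parameter. */
static bool enough_vram_for_model(size_t free_vram_bytes,
                                  size_t model_bytes) {
    /* Leave headroom for compute buffers, which ggml allocates on top
     * of the model weights. The 1.5x factor is a rough assumption,
     * not a number taken from whisper.cpp. */
    return free_vram_bytes >= model_bytes + model_bytes / 2;
}
```

If the check fails, the app can skip the GPU backend entirely and initialize on the CPU instead of letting the allocation abort later.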
@ggerganov
There are a lot of different issues with Vulkan. For instance, a new issue reports that Vulkan fails because the device doesn't support fp16 storage: https://github.com/ggerganov/llama.cpp/issues/7620
How can we fall back to the CPU when Vulkan fails? Vulkan is really important on Windows; it's the only widely available GPU acceleration we currently have there.
I considered using OpenVINO on Windows instead, but last time I checked it requires special files to be installed / a special model file, so it wouldn't work any better than Vulkan in a desktop app.
@ggerganov
I've noticed that CoreML/Metal includes a fallback mechanism to CPU. Since Vulkan has compatibility issues on many modern PCs, it would be great if Vulkan could have a similar fallback.
Would you be able to outline the steps needed to implement a CPU fallback for Vulkan? I'm willing to work on it and collaborate with others to push this forward. Should I focus on this in the ggml repository or in whisper.cpp?
Thanks!
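The CPU fallback asked about above can also be done at the application level, by retrying model load with the GPU disabled (whisper.cpp exposes a `use_gpu` flag in `whisper_context_params` for this). A minimal self-contained sketch, using stand-in types and a stub loader in place of the real `whisper_init_from_file_with_params` (which returns NULL on failure):

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for whisper_context; the real type is opaque. */
typedef struct app_ctx { bool used_gpu; } app_ctx;

/* Stand-in loader mirroring whisper_init_from_file_with_params(),
 * which returns NULL on failure. gpu_ok simulates whether the GPU
 * (e.g. Vulkan) path succeeds on this machine. */
static app_ctx *load_model(const char *path, bool use_gpu, bool gpu_ok) {
    static app_ctx ctx;                  /* placeholder allocation */
    (void)path;
    if (use_gpu && !gpu_ok) return NULL; /* simulated Vulkan failure */
    ctx.used_gpu = use_gpu;
    return &ctx;
}

/* Application-level fallback: try the GPU first, then retry on CPU.
 * With the real API this would flip whisper_context_params.use_gpu
 * between the two attempts. */
static app_ctx *load_with_cpu_fallback(const char *path, bool gpu_ok) {
    app_ctx *ctx = load_model(path, /*use_gpu=*/true, gpu_ok);
    if (ctx == NULL) {
        ctx = load_model(path, /*use_gpu=*/false, gpu_ok);
    }
    return ctx;
}
```

The caveat is that this only helps when the failure surfaces as a NULL return; a hard abort inside the Vulkan backend would still take the process down, which is exactly the stability problem raised at the top of the thread.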
I think the fallback mechanism only applies to operators that are not yet implemented on the backend. Are there such operators in the Vulkan backend?
With the change that I just pushed, the memory usage should be reduced significantly. I will make a new whisper.cpp
release in the following days, and after that, if the issues still persist, we can discuss how to improve the Vulkan state.
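The operator-level fallback mentioned above can be sketched as follows. This is a conceptual illustration in the spirit of ggml's backend scheduler, not the actual ggml code; the op names, the `gpu_supports_op` stub, and which ops the GPU "lacks" are all assumptions for the example.

```c
#include <stdbool.h>

typedef enum { OP_MUL_MAT, OP_CONV_1D, OP_CUSTOM } op_type;
typedef enum { BACKEND_GPU, BACKEND_CPU } backend_id;

/* Stand-in for a backend's supports-op query: pretend the GPU backend
 * implements everything except OP_CUSTOM. Which ops (if any) are
 * actually missing in the Vulkan backend is the open question here. */
static bool gpu_supports_op(op_type op) {
    return op != OP_CUSTOM;
}

/* Per-operator scheduling in the spirit of ggml's backend scheduler:
 * each graph node runs on the GPU when the op is supported, otherwise
 * it falls back to the CPU backend. Note this is per-op, not a
 * whole-model fallback, so a hard Vulkan init failure is not covered. */
static backend_id pick_backend(op_type op) {
    return gpu_supports_op(op) ? BACKEND_GPU : BACKEND_CPU;
}
```

This distinction explains why the mechanism doesn't help with the crashes above: if every op is implemented on Vulkan, the per-op path never triggers, and failures happen at allocation or initialization time instead.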
@ggerganov
The tiny model still fails to load with Vulkan on the latest commit, even though 1 GB of GPU memory is available:
```
C:\ReallyTempEmptyEveryDay\vibe.test>.\vibe.exe
C:\ReallyTempEmptyEveryDay\vibe.test>ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: NVIDIA GeForce GTX 1660 Ti (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_reserve_n: reallocating NVIDIA GeForce GTX 1660 Ti buffer from size 0.00 MiB to 11.08 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_reserve_n: reallocating NVIDIA GeForce GTX 1660 Ti buffer from size 0.00 MiB to 60.29 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_reserve_n: reallocating NVIDIA GeForce GTX 1660 Ti buffer from size 0.00 MiB to 2.20 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_reserve_n: reallocating NVIDIA GeForce GTX 1660 Ti buffer from size 0.00 MiB to 89.95 MiB
ggml_vulkan: Device memory allocation of size 94318336 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate NVIDIA GeForce GTX 1660 Ti buffer of size 94318336
```
> I think the fallback mechanism only applies to operators that are not yet implemented on the backend. Are there such operators in the Vulkan backend?

Not that I'm aware of. I thought it falls back completely to the CPU. That would be useful.
@thewh1teagle Can you confirm that the memory allocation issue is now fixed with the latest commit on master?
Vulkan has a lot of bugs on Windows / Linux, but when it works, it is much faster than the CPU (10-20x). I'm forced to use Vulkan in the vibe project, but many users report that it crashes on Windows / Linux.
Some of the errors:
- PopOS: https://github.com/thewh1teagle/vibe/issues/269
- Ubuntu
- Arch: https://github.com/thewh1teagle/vibe/issues/267
- Windows: https://github.com/thewh1teagle/vibe/issues/266
- Windows: https://github.com/thewh1teagle/vibe/issues/263