ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License
34.38k stars · 3.49k forks

Add OneDNN or DirectML support #2303

Open thewh1teagle opened 1 month ago

thewh1teagle commented 1 month ago

Currently the best results we can get with whisper.cpp are with CUDA (NVIDIA) or Core ML (macOS).

On Windows there's only OpenBLAS, and it's slow: transcription takes roughly twice the audio's duration (AMD Ryzen 5 4500U, medium model). CTranslate2 on the same machine runs 2-3x faster than real time on CPU alone!
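For reference, a timing like the one above can be reproduced roughly as follows (a sketch: the model and sample paths are assumptions, and `time` stands in for whatever timer you use on Windows):

```shell
# Time a transcription with whisper.cpp's CLI (OpenBLAS build assumed).
# models/ggml-medium.bin and samples/jfk.wav are assumed paths;
# compare elapsed wall time against the audio's duration.
time ./main -m models/ggml-medium.bin -f samples/jfk.wav
```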

Since whisper.cpp recently removed OpenCL support, I think it's important to have a good alternative for Windows users with Intel / AMD CPUs and integrated GPUs.

There are a few different options that could be added, both execution providers from ONNX Runtime: oneDNN-ExecutionProvider.html and DirectML-ExecutionProvider.html

In addition, CTranslate2 uses ruy.

Related: https://github.com/ggerganov/ggml/issues/406#issuecomment-2241707874

WilliamTambellini commented 1 month ago

+1 for oneDNN

WilliamTambellini commented 1 month ago

https://github.com/ggerganov/ggml/pull/855

WilliamTambellini commented 1 month ago

cf @rfsaliev

thewh1teagle commented 1 month ago

Update: for now I'm sticking with release v1.6.2, which still has OpenCL support. Otherwise, as I said, the speed is too slow to be usable (2-5x the audio duration).

With OpenCL it takes 40s to transcribe a 47s audio file on the same ordinary hardware (AMD Ryzen 5 4500U).

By the way, there were weird issues with OpenCL that prevented it from working. The solution I found was to set CMAKE_BUILD_TYPE to RelWithDebInfo.
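For anyone else pinning v1.6.2, the build that worked for me looks roughly like this (a sketch, assuming WHISPER_CLBLAST is the OpenCL/CLBlast switch in that release):

```shell
# Sketch: build whisper.cpp v1.6.2 with the OpenCL (CLBlast) backend,
# using RelWithDebInfo as the build type to work around the issue above.
git clone --branch v1.6.2 https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build -DWHISPER_CLBLAST=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake --build build --config RelWithDebInfo
```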