SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

Please update README regarding comparison with whisper.cpp and possibly others #592

Open: BBC-Esq opened this issue 10 months ago

BBC-Esq commented 10 months ago

Hello, I noticed the README compares faster-whisper with whisper.cpp and lists transcription times for it. Could someone please update those numbers? A recent whisper.cpp release addresses the beam-size slowdown, which I believe is a significant change. See here:

https://github.com/ggerganov/whisper.cpp/releases/tag/v1.5.0

It states:

At last, whisper.cpp now supports efficient Beam Search decoding. The missing piece was the implementation of batched decoding...On modern NVIDIA hardware, the performance with 5 beams is the same as 1 beam thanks to the large amount of computing power available. With Metal, the speed with 5 beams is a bit slower compared to 1 beam, but it is significantly faster compared to 5x times the time for single batch which was observed with the old naive implementation.
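To illustrate what "batched decoding" means in the quote above, here is a toy sketch (not whisper.cpp's code). At each step, all live beam hypotheses are scored in a single call rather than one call per hypothesis. `score_batch` is a hypothetical stand-in for one batched decoder forward pass; on hardware with spare compute, that one call can cost about the same as scoring a single beam, which is the effect the release notes describe.

```python
from typing import Callable, List, Tuple

# Hypothetical scorer: takes a batch of token prefixes and returns, for each
# prefix, log-probabilities over the vocabulary (index = token id).
# A real decoder would run ONE batched forward pass here.
BatchScorer = Callable[[List[List[int]]], List[List[float]]]


def beam_search(score_batch: BatchScorer, bos: int, eos: int,
                beam_size: int, max_len: int) -> List[int]:
    """Toy beam search that scores all live beams in one batched call per step."""
    beams: List[Tuple[List[int], float]] = [([bos], 0.0)]
    finished: List[Tuple[List[int], float]] = []
    for _ in range(max_len):
        live = [b for b in beams if b[0][-1] != eos]
        if not live:
            break
        # One call for the whole beam instead of beam_size separate calls.
        logprobs = score_batch([seq for seq, _ in live])
        candidates = []
        for (seq, score), row in zip(live, logprobs):
            for tok, lp in enumerate(row):
                candidates.append((seq + [tok], score + lp))
        # Keep the beam_size best extensions across all hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
        finished.extend(b for b in beams if b[0][-1] == eos)
    # Prefer completed hypotheses; fall back to live ones if none finished.
    pool = finished or beams
    return max(pool, key=lambda c: c[1])[0]
```

The number of scorer calls equals the number of decoding steps, independent of `beam_size`; only the batch size per call grows.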

Also, if possible, could the README include direct "apples to apples" comparisons with other options, such as:

https://github.com/sanchit-gandhi/whisper-jax

https://github.com/Vaibhavs10/insanely-fast-whisper
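A minimal sketch of what an apples-to-apples harness could look like, assuming each tool can be wrapped as a Python callable that transcribes one file: same audio, same number of runs, warm-up excluded, median reported. `benchmark` and `make_faster_whisper` are hypothetical helpers; the `WhisperModel(...)` and `model.transcribe(..., beam_size=...)` calls follow faster-whisper's public API, while whisper.cpp, whisper-jax, and insanely-fast-whisper would each need their own wrapper with matching settings.

```python
import statistics
import time
from typing import Callable, Dict, List

# Any callable that transcribes one audio file and returns the text.
Transcriber = Callable[[str], str]


def benchmark(backends: Dict[str, Transcriber], audio_path: str,
              warmup: int = 1, runs: int = 3) -> Dict[str, float]:
    """Return the median wall-clock seconds per backend on the same audio file."""
    results: Dict[str, float] = {}
    for name, transcribe in backends.items():
        # Untimed warm-up runs absorb one-time costs (e.g. JIT compilation, caches).
        for _ in range(warmup):
            transcribe(audio_path)
        timings: List[float] = []
        for _ in range(runs):
            start = time.perf_counter()
            transcribe(audio_path)
            timings.append(time.perf_counter() - start)
        results[name] = statistics.median(timings)
    return results


def make_faster_whisper(model_size: str, device: str, compute_type: str,
                        beam_size: int = 5) -> Transcriber:
    """Hypothetical wrapper around faster-whisper; requires the package installed."""
    from faster_whisper import WhisperModel  # imported lazily
    model = WhisperModel(model_size, device=device, compute_type=compute_type)

    def transcribe(path: str) -> str:
        segments, _info = model.transcribe(path, beam_size=beam_size)
        # segments is a generator: decoding only happens while it is consumed.
        return "".join(segment.text for segment in segments)

    return transcribe
```

Because faster-whisper returns segments lazily, the wrapper joins the text so that decoding actually happens inside the timed region; timing only the `transcribe()` call without consuming the segments would understate the real cost.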

EDITED:

I've messaged the fine folks over at Hugging Face as well for clarification; the goal is to get accurate numbers, not hyperbole. Thanks!


I'd like to get comparisons from the whisper.cpp developers as well, but they don't compare their implementation to faster-whisper, whisper-jax, or insanely-fast-whisper, so I'm not as concerned about that.

BBC-Esq commented 10 months ago

Here's a link to my updated conversation with the developer of insanely-fast-whisper, if anyone is interested:

https://github.com/Vaibhavs10/insanely-fast-whisper/issues/82

Also, I'm not sure whether the newer GGUF format from llama.cpp still uses float32, let alone float16. If the only precision that overlaps with faster-whisper is 8-bit quantization, I'd ask for the 8-bit configurations to be re-tested for a true comparison.
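One way to keep an 8-bit re-test honest is to record each run's settings and check that only the tool differs. The sketch below uses a hypothetical `RunConfig` record and `mismatches` helper. The `compute_type` strings are real faster-whisper/CTranslate2 options, but collapsing both `"int8"` and `"int8_float16"` into one coarse `"int8"` label is a simplifying assumption, since `int8_float16` runs the non-int8 parts in float16.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical record of the settings one benchmark run used."""
    tool: str        # e.g. "faster-whisper", "whisper.cpp"
    model: str       # e.g. "large-v2"
    precision: str   # normalized label: "fp32", "fp16", "int8"
    beam_size: int
    device: str      # e.g. "cuda", "cpu", "metal"


# Assumed mapping from faster-whisper compute_type values to coarse labels.
FASTER_WHISPER_PRECISION = {
    "float32": "fp32",
    "float16": "fp16",
    "int8": "int8",
    "int8_float16": "int8",
}


def mismatches(a: RunConfig, b: RunConfig) -> List[str]:
    """Names of settings (other than the tool itself) that differ between two runs."""
    return [f.name for f in fields(RunConfig)
            if f.name != "tool" and getattr(a, f.name) != getattr(b, f.name)]
```

An empty list means the two runs differ only in the tool being measured; anything else (for example a different precision or beam size) makes the timing comparison misleading.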