evanarlian opened 8 months ago
Absolutely! So glad you asked! lol. CTranslate2 actually does support true batching, but at the C++ level. I'll give you my repository that uses it via the amazing WhisperS2T, as well as a direct link to that repository. It's my understanding that there's been a fair amount of discussion on the faster-whisper repository about "batch" processing, but that it's not feasible right now due to the state of the repository. On the other hand, faster-whisper has multiple other functionalities that WhisperS2T does not. Keep in mind that my repository is a "little" outdated since I haven't updated it to the most recent WhisperS2T, so consult the upstream for any changes to the API. However, if you use my repo for sample scripts and keep the versioning the same, you should be fine. I have a lot of experience with WhisperS2T now, so feel free to hit me up.
https://github.com/BBC-Esq/WhisperS2T-transcriber
...and the amazing...
https://github.com/shashikg/WhisperS2T
At ~150 stars it flies under the radar, yet it beats Hugging Face's "insanely" (hate that name) implementation of Whisper, which has thousands of stars. Just goes to show that the number of stars the stereotypical Hugging Face repo gets is NOT AT ALL related to the quality of the product; it's boosted more by marketing and networking buddy referrals... Give credit where credit is due, is what I say. Try WhisperS2T; I'm interested in your feedback!
BTW, I just haven't had the time to update my WhisperS2T batch repo with this bad boy, so stay tuned. ;-)
It allows you to specify the task, choose any CTranslate2 quantization you want, process all subdirectories recursively, exclude certain file extensions from being processed, change the beam size and batch size (courtesy of WhisperS2T), and so on.
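If you want a feel for what that looks like in code, here's a minimal sketch of the kind of WhisperS2T call my repo builds up. The keyword arguments follow the WhisperS2T README as I remember it, so treat the exact parameter names as assumptions and check the upstream docs:

```python
import whisper_s2t

# Load a CTranslate2-backed model; compute_type selects the quantization
# (int8, float16, etc.). Kwargs per the WhisperS2T README -- verify upstream.
model = whisper_s2t.load_model(
    model_identifier="small",
    backend="CTranslate2",
    compute_type="int8",
)

files = ["audio_1.wav", "audio_2.wav"]   # one entry per file in each list below
lang_codes = ["en", "en"]
tasks = ["transcribe", "transcribe"]     # or "translate"
initial_prompts = [None, None]

# batch_size controls how many 30-second segments are decoded at once.
out = model.transcribe_with_vad(
    files,
    lang_codes=lang_codes,
    tasks=tasks,
    initial_prompts=initial_prompts,
    batch_size=16,
)

print(out[0][0]["text"])  # first segment of the first file
```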
Last post, I promise... but here's my analysis of WhisperS2T. I believe my repo uses a traditional "loop" to process files with WhisperS2T, but you can also send a batch of information directly to CTranslate2 to process, which is inherently the way WhisperS2T is meant to run. HOWEVER, I opted for the "loop" method because if you send all audio files at once and ONE fails, they ALL fail and you get ZERO transcriptions. What I found is that if I process, say, 500 audio files, ONE might have corrupted data, and that triggers an error for the entire process...
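In other words, the "loop" method is just one transcribe call per file with its own error handling, so a bad file only costs you that file. A rough sketch (not the exact code from my repo; `audio_files` and `save_transcription` are hypothetical placeholders):

```python
# Per-file "loop" method: one transcribe call per file, so a single
# corrupted file can't take down the whole run.
failed = []
for path in audio_files:  # hypothetical list of paths gathered recursively
    try:
        out = model.transcribe_with_vad(
            [path],
            lang_codes=["en"],
            tasks=["transcribe"],
            initial_prompts=[None],
            batch_size=16,
        )
        save_transcription(path, out[0])  # hypothetical helper that writes the text out
    except Exception as exc:
        failed.append((path, str(exc)))   # log the bad file and keep going

print(f"Done. {len(failed)} file(s) failed.")
```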
This is supposed to be fixed, however, per this discussion:
https://github.com/shashikg/WhisperS2T/issues/50
Anyway, expand below to see my analysis of the library (not the most current version, however):
Thank you for telling me about WhisperS2T. I'll take a look later. Currently I'm not using faster-whisper, but instead using CTranslate2 directly. The hope is that batching could be used to speed up generation, but right now it does not show any speedup versus just using the standard loop.
WhisperS2T basically uses CTranslate2 directly.
With the CTranslate2 Whisper model, batch generation is not faster than looping one by one. I tried the same thing with the Translator model, and there batching is far superior (a lot faster). I used Whisper small converted to int8 with the ct2 conversion tool. Also, GPU memory usage is higher when batching, so I thought CTranslate2 was doing "proper" batching (and not a looping wrapper). Here is my simple Whisper code.
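A minimal sketch of batched generation with `ctranslate2.models.Whisper`, assuming the model was converted with `ct2-transformers-converter` and that transformers' `WhisperProcessor` handles feature extraction (a reconstruction of the general approach, not the exact original snippet):

```python
import ctranslate2
import librosa
import transformers

# Model converted beforehand with:
#   ct2-transformers-converter --model openai/whisper-small \
#       --output_dir whisper-small-ct2 --quantization int8
model = ctranslate2.models.Whisper("whisper-small-ct2", device="cuda", compute_type="int8")
processor = transformers.WhisperProcessor.from_pretrained("openai/whisper-small")

# Load several clips; the processor pads/trims to 30 s and stacks them
# into one batch of log-mel features.
paths = ["a.wav", "b.wav", "c.wav"]  # hypothetical file names
audios = [librosa.load(p, sr=16000)[0] for p in paths]
inputs = processor(audios, return_tensors="np", sampling_rate=16000)
features = ctranslate2.StorageView.from_array(inputs.input_features)

# Same decoding prompt for every item in the batch.
prompt = processor.tokenizer.convert_tokens_to_ids(
    ["<|startoftranscript|>", "<|en|>", "<|transcribe|>", "<|notimestamps|>"]
)
results = model.generate(features, [prompt] * len(paths), beam_size=1)

for r in results:
    print(processor.decode(r.sequences_ids[0]))
```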
When I ran the code on Colab (T4 GPU), it output:
Is there anything I could do to increase the speed of Whisper batch generation?