OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

Whisper encode roughly 4x slower than openai/pytorch #1699

Open whispy-woods opened 6 months ago

whispy-woods commented 6 months ago

Admittedly, the encoding time is almost a non-issue; only when working on very small audio chunks could a fix shave off a meaningful percentage of total runtime.

I just wanted to mention it in case it is flying under the radar and there might be a quick fix. For example, on an RTX 4080 (both Linux and Windows) the encode takes around 0.08 s in CTranslate2 and 0.02 s with the OpenAI reference implementation; the same 4x difference shows up on two other systems with an RTX 4090 and an RTX 4060. Thanks for all the work!
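For context, comparing GPU encode times like this is only meaningful with warm-up iterations and device synchronization, since CUDA kernels launch asynchronously. A minimal timing harness along those lines might look like the sketch below; the `encode` callable and any synchronization hook are assumptions, not the reporter's actual script:

```python
import time

def benchmark(fn, warmup=3, runs=20):
    """Average wall-clock time of fn() over several runs.

    For GPU work, fn should block until the device is done
    (e.g. call torch.cuda.synchronize() before returning),
    otherwise the measured times are misleading.
    """
    for _ in range(warmup):   # warm-up: JIT, memory allocation, etc.
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs
```

A hypothetical usage would be `benchmark(lambda: model.encode(features))` for the CTranslate2 model versus `benchmark(lambda: ref_model.encoder(mel))` for the reference implementation, with the same input length in both cases.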

BBC-Esq commented 6 months ago

Do you have some of the code that you could share so we can see what might be going on?