SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2

Bad performance #1073

Open 1848 opened 3 weeks ago

1848 commented 3 weeks ago

Hi,

I already found #279 and I think I have the same issue. I am using the code from #279 for benchmarking (only checking faster-whisper). A 2-second wav file needs nearly 3 seconds to process with the "small" model; using large-v3 takes 12 seconds. I am running a VM with 8 cores (2.4 GHz) and 16 GB memory, no GPU, with Python 3.12 and faster-whisper 1.0.3.

I don't think this performance is expected, right?

rjwilmsi commented 2 weeks ago

A 2-second clip isn't a great test - the model takes time to load. Your measured time may be mostly model load rather than transcription. I would test on at least a 30-second clip.
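
A minimal sketch of how to time the two phases separately when benchmarking, assuming the standard faster-whisper API (the file name and model size are placeholders). Note that `transcribe()` returns a lazy generator, so the segments have to be consumed before transcription actually finishes:

```python
import time

from faster_whisper import WhisperModel

# Phase 1: model load.
t0 = time.perf_counter()
model = WhisperModel("small", device="cpu", compute_type="int8")
t1 = time.perf_counter()

# Phase 2: transcription. transcribe() returns a generator; decoding only
# happens as it is consumed, so exhaust it inside the timed region.
segments, info = model.transcribe("audio.wav", beam_size=5)
text = "".join(segment.text for segment in segments)
t2 = time.perf_counter()

print(f"model load:    {t1 - t0:.2f}s")
print(f"transcription: {t2 - t1:.2f}s")
```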

The OpenAI Whisper models are optimized for GPUs. The original Whisper implementation is much slower on CPUs, though faster-whisper narrows the gap.
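
If a GPU is available, the same API selects it via the `device` argument (a sketch; `float16` is the usual compute type on CUDA, but whether it applies depends on your hardware):

```python
from faster_whisper import WhisperModel

# On an NVIDIA GPU, float16 is typically the fastest compute type;
# fall back to device="cpu" with int8 on CPU-only machines.
model = WhisperModel("small", device="cuda", compute_type="float16")
```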

Some reference numbers from my testing with the "small" model on a 6 min 30 s English audio clip:

Original OpenAI Whisper (beam size 5):

- Ryzen 5 5600G desktop CPU: 4m22s
- NVIDIA GTX 1050 Ti Max-Q: 1m31s

faster-whisper (int8 quantized model, beam size 5, 4 CPU threads in CPU mode):

- Ryzen 5 5600G desktop CPU: 0m54s
- Ryzen 5 5600U laptop CPU: 1m03s
- NVIDIA GTX 1050 Ti Max-Q: 0m28s

For faster-whisper I found that more than 4 CPU threads made little difference. Memory bandwidth also matters: switching to a single RAM stick increased transcription time by around 50%.
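
For completeness, a minimal sketch of the faster-whisper configuration those CPU numbers correspond to (int8 quantization, beam size 5, 4 CPU threads); the audio file name is a placeholder:

```python
from faster_whisper import WhisperModel

# int8 quantization with an explicit CPU thread count; in the tests above,
# more than 4 threads gave little additional speedup.
model = WhisperModel("small", device="cpu", compute_type="int8", cpu_threads=4)

segments, info = model.transcribe("audio.wav", beam_size=5)
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```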