#1848 (open, opened 3 weeks ago)
A 2-second clip isn't a great test: the model takes time to load, so your measured time may be mostly model load rather than transcription. I would test with at least a 30-second clip.
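If you want to separate the two, something like this rough sketch should work (the audio path and model settings here are just examples, not your exact setup). One gotcha worth knowing: `transcribe()` returns a lazy generator, so the transcription only actually runs when you consume it.

```python
import time
from faster_whisper import WhisperModel

# Time the model load separately from the transcription.
t0 = time.perf_counter()
model = WhisperModel("small", device="cpu", compute_type="int8", cpu_threads=4)
load_time = time.perf_counter() - t0

t1 = time.perf_counter()
segments, info = model.transcribe("audio_30s.wav", beam_size=5)
# transcribe() is lazy: consuming the generator is what actually runs
# the transcription, so iterate it before stopping the clock.
text = "".join(segment.text for segment in segments)
transcribe_time = time.perf_counter() - t1

print(f"model load:    {load_time:.2f}s")
print(f"transcription: {transcribe_time:.2f}s")
```

If the load time dominates on a short clip, the benchmark is telling you about startup cost, not throughput.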
The OpenAI Whisper models are optimized for GPUs: the original implementation is much slower on CPUs, though faster-whisper narrows the gap considerably.
Some reference numbers from my testing with the "small" model on a 6m30s English audio clip:

Original OpenAI Whisper (small, beam size 5):
- Ryzen 5 5600G desktop CPU: 4m22s
- NVIDIA GTX 1050 Ti Max-Q: 1m31s

faster-whisper (int8 quantized model, beam size 5, 4 CPU threads in CPU mode):
- Ryzen 5 5600G desktop CPU: 0m54s
- Ryzen 5 5600U laptop CPU: 1m03s
- NVIDIA GTX 1050 Ti Max-Q: 0m28s
For faster-whisper I found that more than 4 CPU threads made little difference. Memory bandwidth also matters: running with a single RAM stick increased the time by roughly 50%.
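To check the thread-count behavior on your own hardware, a sweep along these lines would do (again a sketch, not my actual benchmark script; the model size and audio path are placeholders):

```python
import time
from faster_whisper import WhisperModel

# Hypothetical sweep over cpu_threads; the first iteration may also
# pay a one-time model download cost, so run it once beforehand.
for threads in (1, 2, 4, 8):
    model = WhisperModel("small", device="cpu",
                         compute_type="int8", cpu_threads=threads)
    t0 = time.perf_counter()
    segments, _ = model.transcribe("audio.wav", beam_size=5)
    for _ in segments:  # consume the lazy generator to force the work
        pass
    print(f"{threads} threads: {time.perf_counter() - t0:.1f}s")
```

In my case the 4-to-8-thread step was roughly flat, which is consistent with the workload being memory-bandwidth bound rather than compute bound.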
Hi,
I already found #279 and I think I have the same issue. I am using the code from #279 for benchmarking (only checking faster-whisper). A 2-second WAV file takes nearly 3 seconds to process with the "small" model; large-v3 takes 12 seconds. I'm running a VM with 8 cores (2.4 GHz) and 16 GB memory, no GPU, with Python 3.12 and faster-whisper 1.0.3.
I don't think this performance is expected, right?