Long audio needs much more time to transcribe

SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2

MIT License

Long audio needs much more time to transcribe #1156

Closed terryops closed 3 days ago

terryops commented 5 days ago

For example, a 30-minute audio file takes 5 minutes to transcribe, but a 3-hour file takes an hour. Why? How can I make it faster? Should I split the long file up and transcribe the pieces separately? If so, how can I pass the last window's prompt on to the next split file?

MahmoudAshraf97 commented 5 days ago

Transcription time depends on the audio duration AND its content, so comparing by duration alone isn't meaningful. What is probably happening is that your audio is low quality and triggers a lot of temperature fallbacks. Set temperature=[0] to avoid that, at the expense of possibly lower transcription quality, or use batched transcription.
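
A minimal sketch of the two suggestions above, not a definitive recipe. It assumes a faster-whisper release that ships `BatchedInferencePipeline`; the model name `"large-v3"`, the `batch_size` of 16, and the helper names `transcribe_greedy` and `transcribe_batched` are illustrative choices, not something from this thread.

```python
def transcribe_greedy(path, model_size="large-v3"):
    """Transcribe with a single temperature so no fallback re-decoding happens."""
    # Imported lazily so the module loads even where faster-whisper is absent.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size)  # assumed model size; pick what fits your hardware
    # temperature=0.0 disables the fallback ladder of higher temperatures.
    segments, _info = model.transcribe(path, temperature=0.0)
    # segments is a lazy generator; decoding actually runs while we iterate.
    return "".join(segment.text for segment in segments).strip()


def transcribe_batched(path, model_size="large-v3", batch_size=16):
    """Transcribe by decoding several audio chunks at once."""
    from faster_whisper import BatchedInferencePipeline, WhisperModel

    model = WhisperModel(model_size)
    batched_model = BatchedInferencePipeline(model=model)
    # batch_size=16 is an assumed starting point; lower it if memory runs out.
    segments, _info = batched_model.transcribe(path, batch_size=batch_size)
    return "".join(segment.text for segment in segments).strip()


# Example call (hypothetical file name):
# text = transcribe_batched("long_recording.mp3")
```

Either helper returns the full transcript as one string. Timing both on the same file is the simplest way to see which option helps more for a given recording.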