collabora / WhisperLive

A nearly-live implementation of OpenAI's Whisper.
MIT License

Whisper-live taking same time on CPU and GPU to transcribe an audio #204

Closed prem1303 closed 6 months ago

prem1303 commented 7 months ago

I am using whisper-live==0.2.1, faster-whisper==0.10.0, and CTranslate2==4.0.0.

Transcribing a 30-second audio file currently takes roughly the same amount of time (about 2 minutes) whether it runs on CPU or GPU. Any guidance on improving GPU performance to speed this up would be greatly appreciated.
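A common cause of identical CPU/GPU timings (an assumption here, not confirmed by the original poster) is that the model silently fell back to CPU. A minimal sketch of picking faster-whisper model kwargs for the available hardware; the helper name `pick_whisper_config` and the model/audio names in the usage comment are illustrative, not part of WhisperLive:

```python
# Sketch: choose WhisperModel kwargs based on whether CUDA is usable.
# The mapping below follows common faster-whisper usage; values are a
# reasonable default, not the only valid choice.

def pick_whisper_config(cuda_available: bool) -> dict:
    """Return faster-whisper WhisperModel kwargs for the available hardware."""
    if cuda_available:
        # float16 roughly halves memory and speeds up inference on most GPUs
        return {"device": "cuda", "compute_type": "float16"}
    # int8 quantization keeps CPU inference usable
    return {"device": "cpu", "compute_type": "int8"}

# Usage (requires faster-whisper and a CUDA-capable setup; model name and
# audio path are placeholders):
# from faster_whisper import WhisperModel
# model = WhisperModel("small", **pick_whisper_config(cuda_available=True))
# segments, info = model.transcribe("audio.wav")
# for segment in segments:
#     print(segment.text)
```

If a `device="cuda"` model raises an error or quietly runs slowly, checking the CTranslate2/CUDA/cuDNN versions against each other is usually the first step.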

makaveli10 commented 7 months ago

https://github.com/collabora/WhisperLive/issues/201

makaveli10 commented 6 months ago

@prem1303 which model are you using?

prem1303 commented 6 months ago

I solved the issues. Thank you very much for your support and time.

makaveli10 commented 6 months ago

Glad you could solve the issue, @prem1303. Feel free to post the solution here, or open a pull request if WhisperLive's code still has the problem. We are busy with other projects at the moment, so we rely on the community to help us improve WhisperLive.

alpcansoydas commented 6 months ago

@prem1303 Can you share the solution, please? :)

makaveli10 commented 6 months ago

@alpcansoydas @prem1303 Now that I give it some thought, I think it should take the same amount of time. Since this is real-time transcription, audio frames are processed as they are received, not before. So if your CPU is fast enough and you use a smaller model, it should keep up on both CPU and GPU. That said, this shouldn't hold for the small, medium, or large-v2/large-v3 models.