collabora / WhisperLive

A nearly-live implementation of OpenAI's Whisper.

API: Return transcribed text #220

Open powellnorma opened 1 month ago

powellnorma commented 1 month ago

Looking at the code, I don't see how a library user is supposed to access the transcribed text; it looks like it just gets printed:

https://github.com/collabora/WhisperLive/blob/e1a42c22d2de65303ec34f54805ade0e84a80d09/whisper_live/client.py#L123

I think a workaround would be to read the output.srt file, but maybe we could also just return the transcribed text as a string?
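
For the SRT workaround, something like the sketch below could join the subtitle text back into a single string. It assumes a standard SRT layout and the `output.srt` name mentioned above; it is not a WhisperLive API, just a post-processing step on the file the client writes out.

```python
# Sketch: read the transcript back out of the SRT file written after a
# transcription run and join the subtitle text into one string.
# Assumes the usual SRT layout: index line, timestamp line, text lines,
# blank-line separator. The "output.srt" name is taken from the comment above.

def read_srt_text(path: str = "output.srt") -> str:
    parts = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip blank separators, numeric subtitle indices, and timestamp lines.
            if not line or line.isdigit() or "-->" in line:
                continue
            parts.append(line)
    return " ".join(parts)


if __name__ == "__main__":
    print(read_srt_text())
```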

makaveli10 commented 1 month ago

@powellnorma Thanks for using the library. I think you make a good point; we can bring this feature in an upcoming release.

tidymonkey81 commented 1 month ago

I've just got my custom faster-whisper model working on a Docker server and am looking at where I can implement this myself. I haven't changed the volume threshold settings for VAD yet, and I get a lot of junk tokens. With slow whisper I implemented a blacklist for phrases like "Thank you", "Thanks very much", etc. that get thrown out by the model. I think I can see where to look at transcribe() in transcriber.py to maybe select phrases and so expose them, but the process seems expensive, so I might need to look further.
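
A phrase blacklist along those lines might look roughly like the sketch below. It filters segments coming straight out of faster-whisper's `transcribe()`; the model name, blacklist contents, and the choice to filter at this point are assumptions for illustration, not what WhisperLive's transcriber.py currently does.

```python
# Sketch: drop segments whose text is nothing but a blacklisted phrase that
# the model tends to hallucinate on silence ("Thank you.", etc.).
# Assumes the plain faster-whisper API; WhisperLive's transcriber.py is
# adapted from it, so a similar filter could be applied there, but this
# is not that code.
from faster_whisper import WhisperModel

# Hypothetical blacklist of common hallucinated filler phrases (lowercased).
BLACKLIST = {"thank you.", "thanks very much.", "thank you for watching."}


def transcribe_filtered(model: WhisperModel, audio_path: str) -> str:
    segments, _info = model.transcribe(audio_path, vad_filter=True)
    kept = []
    for segment in segments:
        text = segment.text.strip()
        # Skip segments that consist solely of a blacklisted phrase.
        if text.lower() in BLACKLIST:
            continue
        kept.append(text)
    return " ".join(kept)


if __name__ == "__main__":
    model = WhisperModel("small", device="cpu", compute_type="int8")
    print(transcribe_filtered(model, "audio.wav"))
```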