collabora / WhisperLive

A nearly-live implementation of OpenAI's Whisper.
MIT License

Getting the text #145

Open christophschuhmann opened 9 months ago

christophschuhmann commented 9 months ago

When I run:

    from whisper_live.client import TranscriptionClient

    client = TranscriptionClient("localhost", 9090, lang="hi", translate=True, model="small")
    client()

I get the transcribed text printed to the screen. But how can I receive it as a string or a list of strings in the Python script that calls client()? For example, if I want to send it to an LLM.

makaveli10 commented 9 months ago

@christophschuhmann Hello, thanks for your interest in the project. To get the list of segments from Whisper, take a look at this for loop, which reads the segments and displays them on the terminal: https://github.com/collabora/WhisperLive/blob/9d29b08cea2fc6224cf3d27cf97fbeef55b137a7/whisper_live/client.py#L210

We send the last 10 segments of the whole session to the client. The last segment in the list can change, because the last word may be truncated at the end of an audio chunk and only completed in the next chunk.
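As a rough illustration of how a caller could deal with that changing last segment, here is a minimal sketch that is not part of WhisperLive. It assumes each message is a list of dicts with "start" and "text" keys, and the class name TranscriptCollector is hypothetical. Every segment except the newest is treated as stable, and the newest is kept separately as tentative until a later message supersedes it.

```python
class TranscriptCollector:
    """Accumulates text from successive segment lists (hypothetical helper)."""

    def __init__(self):
        self.final = {}        # start time (float) -> text of stable segments
        self.tentative = None  # (start, text) of the newest, still-changing segment

    def update(self, segments):
        # Assumption: each segment is a dict with "start" and "text" keys.
        if not segments:
            return
        *stable, last = segments
        for seg in stable:
            # Re-sent segments overwrite earlier versions with the same start time.
            self.final[float(seg["start"])] = seg["text"].strip()
        self.tentative = (float(last["start"]), last["text"].strip())

    def lines(self, include_tentative=True):
        items = dict(self.final)
        if include_tentative and self.tentative is not None:
            # A stable version of the same segment wins over the tentative one.
            items.setdefault(self.tentative[0], self.tentative[1])
        return [items[start] for start in sorted(items)]
```

Calling update() with each received list and then lines(include_tentative=False) would give a list of strings that should no longer change, which is probably the safer input for an LLM. Where exactly to call update() depends on the client version you are running.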

fallenangel3k commented 9 months ago

It would be cool to have those variables exposed as command-line arguments, too, so they are easy to alter without changing the code itself. There are several other variables that should be exposed this way as optional command-line arguments. It should be possible and would be a "nice to have" <3

Please show a way, like the OP asked, to interface with an external application such as an LLM, with a) file input and b) realtime input (like HLS/extension). For example, I want to use WhisperLive as the input for my personal AI assistant.

Edit: I tried to alter line 210 manually, but that line is a completely different one in my copy. Searching for the specific piece of code you mention, "for i, seg in enumerate(message)", also brought up no results.