collabora / WhisperLive

A nearly-live implementation of OpenAI's Whisper.
MIT License
1.52k stars 203 forks

Add Speaker Diarization #90

Open mrtoorich opened 6 months ago

mrtoorich commented 6 months ago

Can we incorporate speaker identification into the transcription results?

I found a project called whisper-diarization from Faster Whisper's Community integrations section.

Is it possible for us to integrate it?

zoq commented 6 months ago

We haven't added diarization because most approaches don't work in a live setting. Usually you make a pass over the entire audio to figure out how many distinct speakers there are, and then do the speaker identification. That said, it would be worth benchmarking some solutions and running two models in parallel, one for transcription and one for diarization, to reduce latency.
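To make the "two models in parallel" idea concrete, here is a rough sketch and not WhisperLive code. `transcribe` and `diarize` are hypothetical placeholders that return canned data where real model calls would go, and the segment and turn tuple formats are assumptions. Each transcript segment gets the speaker whose turn overlaps it the most.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe(chunk):
    # Hypothetical stand-in for a Whisper call: (start, end, text) segments.
    return [(0.0, 2.0, "hello there"), (2.0, 4.5, "hi, how are you")]


def diarize(chunk):
    # Hypothetical stand-in for a diarization model: (start, end, speaker) turns.
    return [(0.0, 2.1, "SPEAKER_0"), (2.1, 4.5, "SPEAKER_1")]


def overlap(a_start, a_end, b_start, b_end):
    # Length in seconds of the intersection of two time intervals.
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns):
    # Attach to each transcript segment the speaker with the largest overlap.
    labeled = []
    for start, end, text in segments:
        best = max(turns, key=lambda t: overlap(start, end, t[0], t[1]), default=None)
        if best is not None and overlap(start, end, best[0], best[1]) > 0:
            labeled.append((best[2], text))
        else:
            labeled.append(("UNKNOWN", text))
    return labeled


def process_chunk(chunk):
    # Submit both models at once so the chunk latency is roughly the slower
    # of the two calls rather than their sum.
    with ThreadPoolExecutor(max_workers=2) as pool:
        seg_future = pool.submit(transcribe, chunk)
        turn_future = pool.submit(diarize, chunk)
        return label_segments(seg_future.result(), turn_future.result())
```

Threads only help here if the real model calls release the GIL (as most native inference backends do) or run on separate devices; otherwise separate processes would be the safer choice. This also does not solve the underlying problem that speaker labels need to stay consistent from one chunk to the next.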

mrtoorich commented 6 months ago

I understand, thank you very much for the information. However, in my use case, the ability to differentiate between speakers would indeed enhance the experience. Perhaps a lightweight solution could be employed to distinguish voiceprints. I plan to follow up on this and conduct extensive research over the long term.
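As a starting point for the lightweight voiceprint idea, here is a minimal sketch, assuming some speaker-embedding model (not shown, and not part of WhisperLive) gives one fixed-length vector per audio chunk. The class name `OnlineSpeakerTracker` and the `0.7` threshold are made up for illustration and would need tuning. It assigns each new embedding to the closest known speaker by cosine similarity, or opens a new speaker if nothing is close enough, so no pass over the full audio is needed.

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class OnlineSpeakerTracker:
    def __init__(self, threshold=0.7):
        self.threshold = threshold  # assumed value, tune on real data
        self.centroids = []         # running mean embedding per speaker
        self.counts = []            # number of embeddings seen per speaker

    def assign(self, embedding):
        # Find the most similar existing speaker centroid.
        best_id, best_sim = None, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(embedding, centroid)
            if sim > best_sim:
                best_id, best_sim = i, sim

        # Nothing similar enough: register a new speaker.
        if best_id is None or best_sim < self.threshold:
            self.centroids.append(list(embedding))
            self.counts.append(1)
            return len(self.centroids) - 1

        # Otherwise fold the embedding into that speaker's running mean.
        n = self.counts[best_id]
        self.centroids[best_id] = [
            (c * n + e) / (n + 1) for c, e in zip(self.centroids[best_id], embedding)
        ]
        self.counts[best_id] = n + 1
        return best_id
```

Usage would look like `tracker.assign(embedding)` once per chunk, with the returned integer used as the speaker label. A greedy scheme like this can split one speaker into several IDs or merge similar voices, so it is only a baseline to benchmark against.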