chidiwilliams / buzz

Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
https://chidiwilliams.github.io/buzz
MIT License
12.56k stars 945 forks

Is Speaker Diarization Available or Planned? #961

Open Abandersen04 opened 3 weeks ago

Abandersen04 commented 3 weeks ago

I’m currently using Buzz for transcribing interviews with multiple speakers. However, I’ve noticed that the transcription doesn’t differentiate between different voices or speakers in the audio. Is speaker diarization (speaker identification) available or on the roadmap as a feature?

Additionally, I noticed the "prompt" feature, but it doesn't seem to affect speaker recognition. Could you clarify its purpose and if it might relate to this?

Thanks in advance for your help!

raivisdejus commented 3 weeks ago

Yes, this is one of the ideas that has been requested previously. It is on the list of things that would be nice to have in the future. Also, https://github.com/MahmoudAshraf97/whisper-diarization seems to work quite well, so it could be implemented in Buzz at some point in the future.
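For anyone curious what such an integration would involve: diarization tools like whisper-diarization or pyannote.audio produce time-stamped speaker turns, which then have to be aligned with the time-stamped transcript segments Whisper produces. Below is a minimal, hypothetical sketch of that alignment step, assigning each segment to the speaker whose turn overlaps it most. The function name, data shapes, and speaker labels are illustrative assumptions, not Buzz's actual API.

```python
# Hypothetical sketch: merging diarized speaker turns (e.g. from pyannote.audio
# or whisper-diarization) with Whisper transcript segments. Not Buzz's real code.

def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker whose turn overlaps it most.

    segments: list of (start, end, text) tuples from the transcriber
    turns:    list of (start, end, speaker) tuples from the diarization step
    """
    labeled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn_start, turn_end, speaker in turns:
            # Length of the intersection of the two time intervals
            overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((best_speaker, text))
    return labeled


# Illustrative data: two transcript segments, two diarized speaker turns
segments = [(0.0, 2.5, "Hello, thanks for joining."),
            (2.5, 5.0, "Glad to be here.")]
turns = [(0.0, 2.4, "SPEAKER_00"), (2.4, 5.1, "SPEAKER_01")]

for speaker, text in assign_speakers(segments, turns):
    print(f"{speaker}: {text}")
```

The hard parts in practice are running the diarization model itself (pyannote.audio needs a Hugging Face access token for its pretrained pipelines) and handling segments that span a speaker change; this sketch only covers the final bookkeeping.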

The prompt feature of the Whisper models is described here: https://cookbook.openai.com/examples/whisper_prompting_guide. In my testing it has not shown especially meaningful results, but others may get better results. Feel free to share feedback on your prompting results, as it may be useful to others.

adijahangir123 commented 3 weeks ago

Isn't it possible to utilize pyannote.audio? It claims to show good results.