Open Abandersen04 opened 3 weeks ago
Yes, this is an idea that has been requested previously. It is on the list of things that would be nice to have in the future. https://github.com/MahmoudAshraf97/whisper-diarization also seems to work quite well, so it could be implemented in Buzz at some point in the future.
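For context, projects like whisper-diarization broadly work by running Whisper and a diarization model separately, then assigning each transcript segment to the speaker whose turn overlaps it the most. A minimal sketch of that merging step (the segment/turn tuples and timestamps below are made up for illustration, not Buzz's or whisper-diarization's actual data structures):

```python
def assign_speakers(segments, turns):
    """Assign each transcript segment (start, end, text) to the speaker
    whose diarization turn (start, end, speaker) overlaps it the most."""
    labeled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn_start, turn_end, speaker in turns:
            # Length in seconds of the intersection of the two intervals
            overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((best_speaker, text))
    return labeled

# Example with made-up timestamps:
segments = [(0.0, 4.0, "Hi, thanks for joining."), (4.5, 9.0, "Happy to be here.")]
turns = [(0.0, 4.2, "SPEAKER_00"), (4.2, 9.5, "SPEAKER_01")]
print(assign_speakers(segments, turns))
# → [('SPEAKER_00', 'Hi, thanks for joining.'), ('SPEAKER_01', 'Happy to be here.')]
```

The real project does considerably more (word-level alignment, punctuation-aware realignment), but the overlap assignment above is the core idea.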
The prompt feature of the Whisper models is described here: https://cookbook.openai.com/examples/whisper_prompting_guide. In my testing it has not shown particularly meaningful results, but others may have better luck. Feel free to share feedback on your results with prompting, as it may be useful to others.
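As the guide describes, the prompt mainly biases spelling and style; it does not add capabilities like speaker recognition. One common use is seeding it with names and jargon that appear in the audio so they are spelled correctly. A sketch (the glossary terms, file name, and `build_prompt` helper are made up; `initial_prompt` is the openai-whisper parameter):

```python
def build_prompt(terms):
    """Join domain terms into a short prompt that nudges Whisper
    toward the right spellings (it is a bias, not an instruction)."""
    return "Glossary: " + ", ".join(terms)

prompt = build_prompt(["Buzz", "pyannote", "diarization"])
print(prompt)
# → Glossary: Buzz, pyannote, diarization

# With openai-whisper installed, the prompt would be passed like this
# (commented out here because it downloads a model):
# import whisper
# model = whisper.load_model("base")
# result = model.transcribe("interview.mp3", initial_prompt=prompt)
```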
Wouldn't it be possible to use pyannote.audio? It claims to show good results.
I’m currently using Buzz for transcribing interviews with multiple speakers. However, I’ve noticed that the transcription doesn’t differentiate between different voices or speakers in the audio. Is speaker diarization (speaker identification) available or on the roadmap as a feature?
Additionally, I noticed the "prompt" feature, but it doesn't seem to affect speaker recognition. Could you clarify its purpose and if it might relate to this?
Thanks in advance for your help!