Closed utility-aagrawal closed 4 months ago
I pulled the audio for that video and ran it through and got a single speaker as well. I don't think the audio quality is good enough to identify multiple speakers.
Thanks @transcriptionstream ! This is how the audio quality is going to be for my use case. From your experience, do you have any suggestions, maybe some kind of audio preprocessing, hyperparameter tuning, etc., that I can use to improve the diarization? I have tried multiple solutions (whisperx, whisper-diarization, AWS Transcribe, AssemblyAI, Picovoice) but nothing seems to work well enough. Not just on this video, I am yet to find a video on which diarization is nearly accurate.
Try https://huggingface.co/spaces/vumichien/Whisper_speaker_diarization
I ran diarize_parallel.py on this video [https://www.youtube.com/watch?v=b3ZdSWhU5vA] and the pipeline identified only one speaker, which is clearly wrong. I am wondering if you can provide any suggestions to improve this.
Thanks!