Closed utility-aagrawal closed 4 months ago
I pulled the audio for that video and ran it through and got a single speaker as well. I don't think the audio quality is good enough to identify multiple speakers.
Thanks @transcriptionstream ! This is how the audio quality is going to be for my use case. From your experience, do you have any suggestions, maybe some kind of audio preprocessing, hyperparameter tuning, etc., that I can use to improve the diarization? I have tried multiple solutions (whisperx, whisper-diarization, AWS Transcribe, AssemblyAI, Picovoice) but nothing seems to work well enough. Not just on this video, I am yet to find a video on which diarization is nearly accurate.
Try https://huggingface.co/spaces/vumichien/Whisper_speaker_diarization
I ran diarize_parallel.py on this video [https://www.youtube.com/watch?v=b3ZdSWhU5vA] and the pipeline identified only one speaker, which is clearly wrong. I am wondering if you can provide any suggestions to improve this.
Thanks!