MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
BSD 2-Clause "Simplified" License

Poor diarization. #254

Closed: Oguret2 closed this issue 1 hour ago

Oguret2 commented 2 days ago

Hello, I'm a newbie and just started using the program today. I somehow managed to set everything up, but Speaker Diarization isn't working very well. Firstly, the program doesn't recognize more than three speakers, and secondly, phrases from one speaker are often attributed to another. In other words, the voice separation is very poor. My audio is in Russian. Maybe I need to enable something or tweak some parameter to improve the result? Thanks for any advice.

My launch command: python diarize.py -a "D:\Temp2\97\31231.mp3" --whisper-model large-v3 --language ru --device cuda

MahmoudAshraf97 commented 2 days ago

Hello, you can try adjusting these parameters, which are responsible for voice separation: https://github.com/MahmoudAshraf97/whisper-diarization/blob/23c104ab6272d4663fd5766bbae373cf9d78352d/nemo_msdd_configs/diar_infer_telephonic.yaml#L39-L44
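For anyone who wants to experiment with those values, here is a rough sketch of overriding them one at a time. It assumes the linked lines are the clustering parameters of the NeMo diarizer config, with keys such as `max_num_speakers` and `max_rp_threshold`. The key names, the nesting, and the values below are assumptions for illustration, so check them against the YAML file itself. The `with_override` helper is hypothetical and not part of this repository.

```python
import copy

# Illustrative stand-in for the clustering section of the YAML config.
# Key names, nesting and values are assumptions; the linked file is authoritative.
config = {
    "diarizer": {
        "clustering": {
            "parameters": {
                "oracle_num_speakers": False,
                "max_num_speakers": 8,
                "max_rp_threshold": 0.25,
                "sparse_search_volume": 30,
            }
        }
    }
}


def with_override(cfg, dotted_key, value):
    """Return a copy of cfg with one existing nested key replaced (hypothetical helper)."""
    new_cfg = copy.deepcopy(cfg)
    node = new_cfg
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        node = node[name]  # raises KeyError if the path does not exist
    if leaf not in node:
        # Refuse to create new keys, so a typo does not silently do nothing.
        raise KeyError(f"unknown parameter: {dotted_key}")
    node[leaf] = value
    return new_cfg


# Arbitrary trial values, chosen only to show the mechanics.
prefix = "diarizer.clustering.parameters."
tuned = with_override(config, prefix + "max_num_speakers", 10)
tuned = with_override(tuned, prefix + "max_rp_threshold", 0.15)
```

Changing one parameter per run and comparing the speaker labels on the same audio makes it easier to see which setting actually affects the result.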