Oguret2 closed 1 hour ago
Hello, you can try adjusting these parameters: https://github.com/MahmoudAshraf97/whisper-diarization/blob/23c104ab6272d4663fd5766bbae373cf9d78352d/nemo_msdd_configs/diar_infer_telephonic.yaml#L39-L44. They are responsible for voice separation.
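For anyone who would rather experiment with such settings from a script than edit the YAML by hand, here is a rough sketch. It assumes the config has already been loaded into a plain nested dictionary. The helper `apply_overrides`, the key path `diarizer.clustering.parameters.max_num_speakers`, and the values shown are illustrative assumptions; they are not necessarily the parameters at the linked lines, so check the file itself for the real names and defaults.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of a nested dict with dotted-key overrides applied.

    Hypothetical helper for illustration; it is not part of whisper-diarization.
    """
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = updated
        for part in parents:
            # Walk down the tree, creating intermediate levels if missing
            node = node.setdefault(part, {})
        node[leaf] = value
    return updated


# Assumed structure and values, for illustration only
base = {"diarizer": {"clustering": {"parameters": {"max_num_speakers": 8}}}}
tuned = apply_overrides(
    base, {"diarizer.clustering.parameters.max_num_speakers": 5}
)
```

Working on a copy keeps the original settings available, so results from several trial values can be compared against the same baseline.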
Hello, I'm a newbie and only started using the program today. I managed to get everything set up, but speaker diarization isn't working very well. First, the program doesn't recognize more than three speakers; second, phrases from one speaker are often attributed to another. In other words, the voice separation is very poor. My audio is in Russian. Is there something I need to enable, or a parameter I can tweak, to improve the result? Thanks for any advice.
My launch command:
python diarize.py -a "D:\Temp2\97\31231.mp3" --whisper-model large-v3 --language ru --device cuda