MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
BSD 2-Clause "Simplified" License

diarization wrongly assigns speaker 0 and 1 sometimes. #112

Open manjunath7472 opened 9 months ago

manjunath7472 commented 9 months ago

The transcription is good, but the diarization speaker labels are sometimes wrong: speaker 0 is labeled as speaker 1 further down the transcript, and vice versa. I am using an Indian English conversation as the audio input, an online session between a teacher and a student. Could you suggest any more precise methods or alterations? Are any NeMo configs available other than the telephonic one? Does it require additional training for the Indian English accent? Could anyone suggest a near-perfect pipeline for this? @MahmoudAshraf97
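To make the label swapping described above easier to pin down, here is a minimal sketch of one way to locate where it happens, assuming a short hand-labeled reference is available for comparison. Everything in it is hypothetical and not part of whisper-diarization or NeMo: the `(start, end, speaker)` segment tuples, the function names, and the window and step sizes are illustrative choices only. It maps each predicted label to the reference speaker it overlaps most within a time window, then reports the windows where that mapping changes.

```python
from collections import Counter, defaultdict


def label_at(segments, t):
    """Return the speaker active at time t, or None if no segment covers t."""
    for start, end, speaker in segments:
        if start <= t < end:
            return speaker
    return None


def window_mapping(reference, hypothesis, t0, t1, step=0.5):
    """Map each predicted label to the reference speaker it most often
    coincides with, sampling [t0, t1) every `step` seconds."""
    votes = defaultdict(Counter)
    n_samples = int(round((t1 - t0) / step))
    for i in range(n_samples):
        t = t0 + i * step
        ref = label_at(reference, t)
        hyp = label_at(hypothesis, t)
        if ref is not None and hyp is not None:
            votes[hyp][ref] += 1
    return {hyp: counts.most_common(1)[0][0] for hyp, counts in votes.items()}


def find_label_swaps(reference, hypothesis, duration, window=20.0):
    """Return the start times of windows whose label mapping differs
    from the mapping of the previous non-empty window."""
    swaps = []
    previous = None
    start = 0.0
    while start < duration:
        end = min(start + window, duration)
        mapping = window_mapping(reference, hypothesis, start, end)
        if mapping:
            if previous is not None and any(
                hyp in previous and previous[hyp] != ref
                for hyp, ref in mapping.items()
            ):
                swaps.append(start)
            previous = mapping
        start += window
    return swaps


# Hypothetical data: hand-labeled reference vs. diarization output.
reference = [(0, 10, "teacher"), (10, 20, "student"),
             (20, 30, "teacher"), (30, 40, "student")]
hypothesis = [(0, 10, "Speaker 0"), (10, 20, "Speaker 1"),
              (20, 30, "Speaker 1"), (30, 40, "Speaker 0")]

print(find_label_swaps(reference, hypothesis, duration=40.0))  # [20.0]
```

A report like this only shows where the labels flip. It does not fix the diarization itself, but it may help when comparing configs or alternative clustering methods on the same recording.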

v-nhandt21 commented 9 months ago

It seems the diarization from NeMo is not good enough. Has anyone else had the same problem?

manjunath7472 commented 8 months ago

This might be the solution. https://github.com/google/uis-rnn