Open clstaudt opened 1 year ago
Hey, yes, speaker diarization really needs to be improved. The current model varies widely in its results depending on how different the speakers in the audio are, the level of background noise, etc. The most interesting way forward for me right now is using the models proposed in https://github.com/audapolis/audapolis/issues/366
I started playing around with them a while ago, but haven't found the time yet to integrate them into audapolis.
@pajowu Yes, I also believe pyannote is worth exploring.
First test of audapolis with the German (big) language model. While the transcription is remarkably accurate, the speaker diarization is not plausible: it splits a single speaker into many, producing very small segments of text.
It also seems to detect new "speakers" who do not actually say anything (Speaker 7 in the screenshot).
Any ideas about what could be done to improve diarization / speaker detection?
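One thing that might help with both symptoms (tiny fragments and phantom speakers) is post-processing the diarization output: turns shorter than some threshold get absorbed into the surrounding speaker's turn, and adjacent turns by the same speaker get merged. A minimal sketch of that idea — `Segment`, `merge_short_segments`, and the 1-second threshold are all hypothetical illustrations here, not audapolis internals or pyannote API:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    speaker: str

def merge_short_segments(segments, min_duration=1.0):
    """Reassign turns shorter than min_duration to the previous
    speaker, then merge adjacent turns by the same speaker."""
    cleaned = []
    for seg in segments:
        if cleaned and (seg.end - seg.start) < min_duration:
            # absorb the short fragment into the preceding speaker's turn
            seg = Segment(seg.start, seg.end, cleaned[-1].speaker)
        if cleaned and cleaned[-1].speaker == seg.speaker:
            # extend the previous turn instead of starting a new one
            cleaned[-1] = Segment(cleaned[-1].start, seg.end, seg.speaker)
        else:
            cleaned.append(seg)
    return cleaned

# Example: a 0.4 s "Speaker 7" blip inside a Speaker 1 turn collapses
# back into one continuous Speaker 1 segment.
segs = [
    Segment(0.0, 5.0, "SPEAKER_1"),
    Segment(5.0, 5.4, "SPEAKER_7"),
    Segment(5.4, 10.0, "SPEAKER_1"),
]
print(merge_short_segments(segs))  # → [Segment(start=0.0, end=10.0, speaker='SPEAKER_1')]
```

This obviously can't fix genuinely wrong speaker assignments, but it would at least suppress sub-second phantom speakers like the one in the screenshot; the right threshold would need tuning against real recordings.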