bugbakery / audapolis

an editor for spoken-word audio with automatic transcription
GNU Affero General Public License v3.0

improve diarization #406

Open clstaudt opened 1 year ago

clstaudt commented 1 year ago

This is my first test of audapolis with the German (big) language model. While the transcription is remarkably accurate, the speaker diarization is not plausible: it splits the same speaker into many speakers, each with only very short segments of text.

It also seems to detect new "speakers" who do not actually say anything (Speaker 7 in the screenshot).

Any ideas about what could be done to improve diarization / speaker detection?

Screenshot 2022-11-13 at 23 17 33

pajowu commented 1 year ago

Hey, yes, speaker diarization really needs to be improved. The results of the current model vary widely depending on how different the speakers in the audio sound, the level of background noise, etc. I think the most interesting way forward for me right now is using the models proposed in https://github.com/audapolis/audapolis/issues/366

I started playing around with them a while ago, but haven't found the time yet to integrate them into audapolis.

clstaudt commented 1 year ago

@pajowu Yes, I also believe pyannote is worth exploring.
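
Independent of which diarization model is used, the fragmentation described above (one speaker split into many very short segments, plus spurious short-lived "speakers") could also be reduced by post-processing the diarization output. The following is only a minimal sketch of that idea, not how audapolis works: the `Segment` type, the `smooth_segments` function, and the `min_duration` threshold of one second are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One diarization turn (hypothetical structure, not the audapolis format)."""
    start: float   # seconds
    end: float     # seconds
    speaker: str


def smooth_segments(segments: List[Segment], min_duration: float = 1.0) -> List[Segment]:
    """Absorb very short turns into the preceding turn and merge
    neighbouring turns that end up with the same speaker.

    A short first segment has no predecessor and is kept unchanged.
    """
    merged: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        seg = Segment(seg.start, seg.end, seg.speaker)  # copy, keep the input intact
        if merged:
            prev = merged[-1]
            too_short = (seg.end - seg.start) < min_duration
            if too_short or seg.speaker == prev.speaker:
                # Extend the previous turn instead of starting a new one.
                prev.end = max(prev.end, seg.end)
                continue
        merged.append(seg)
    return merged


raw = [
    Segment(0.0, 4.0, "A"),
    Segment(4.0, 4.3, "B"),   # very short interruption
    Segment(4.3, 9.0, "A"),
    Segment(9.0, 9.2, "C"),   # spurious short-lived speaker
    Segment(9.2, 15.0, "B"),
]
result = smooth_segments(raw)
for seg in result:
    print(f"{seg.speaker}: {seg.start:.1f}s to {seg.end:.1f}s")
```

In this example the two sub-second turns are folded into speaker A's surrounding speech, leaving one turn for A and one for B. The trade-off is that genuine short interjections ("yes", "mhm") are lost as well, so the threshold would need tuning against real recordings.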