bootphon / pygamma-agreement

Gamma Agreement in Python
MIT License

Evaluating speaker diarization systems with γ inter-annotator agreement #16

Open hbredin opened 3 years ago

hbredin commented 3 years ago

Nice package 👍

I am wondering whether it would make sense to use γ inter-annotator agreement for evaluating speaker diarization systems (in place of good old diarization error rate, aka DER):

I understand (maybe incorrectly) that both annotators need to use the same set of speaker labels.

How would you handle the case where both annotators use different sets of labels? Would you need to match them first (like what is already done in DER)?
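To make the matching question concrete, here is a minimal sketch of the kind of label mapping done before computing DER: pick the one-to-one mapping between hypothesis and reference labels that maximizes total overlap duration. The function names and the `(start, end, label)` tuple layout are assumptions made for illustration, not part of pygamma-agreement. The search is brute force over permutations, which is only practical for a handful of speakers; real DER tooling typically uses the Hungarian algorithm instead.

```python
from itertools import permutations


def overlap(a, b):
    """Duration shared by two (start, end) segments."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def best_label_mapping(reference, hypothesis):
    """Return the one-to-one hypothesis -> reference label mapping
    that maximizes total overlap duration.

    Both arguments are lists of (start, end, label) tuples
    (a hypothetical layout chosen for this sketch).
    """
    ref_labels = sorted({label for _, _, label in reference})
    hyp_labels = sorted({label for _, _, label in hypothesis})

    # Co-occurrence duration for each (hypothesis, reference) label pair.
    cooc = {(h, r): 0.0 for h in hyp_labels for r in ref_labels}
    for r_start, r_end, r in reference:
        for h_start, h_end, h in hypothesis:
            cooc[(h, r)] += overlap((r_start, r_end), (h_start, h_end))

    # Pad the shorter side with None so extra labels can stay unmatched.
    size = max(len(ref_labels), len(hyp_labels))
    padded_ref = ref_labels + [None] * (size - len(ref_labels))
    padded_hyp = hyp_labels + [None] * (size - len(hyp_labels))

    best, best_score = {}, -1.0
    for perm in permutations(padded_ref):
        score = sum(cooc.get((h, r), 0.0) for h, r in zip(padded_hyp, perm))
        if score > best_score:
            best_score = score
            best = {
                h: r
                for h, r in zip(padded_hyp, perm)
                if h is not None and r is not None
            }
    return best


reference = [(0.0, 5.0, "alice"), (5.0, 9.0, "bob")]
hypothesis = [(0.0, 4.5, "spk1"), (4.5, 9.0, "spk0")]
mapping = best_label_mapping(reference, hypothesis)
print(mapping)
```

Once the hypothesis labels are renamed through such a mapping, both "annotators" share one label set and could be fed to a categorical dissimilarity.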

How would you choose (temporal) alpha and (categorical) beta weights?
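To show what the two weights trade off, here is a small sketch of a combined dissimilarity between two annotated units, following my reading of the γ definition: a positional term weighted by alpha plus a categorical term weighted by beta. The exact formulas, the `delta_empty` constant, and all function names are assumptions made for illustration; this is not the pygamma-agreement API.

```python
def positional_dissimilarity(u, v, delta_empty=1.0):
    """Squared, duration-normalized boundary mismatch between two
    (start, end) segments, scaled by delta_empty."""
    (su, eu), (sv, ev) = u, v
    ratio = (abs(su - sv) + abs(eu - ev)) / ((eu - su) + (ev - sv))
    return ratio ** 2 * delta_empty


def categorical_dissimilarity(cat_u, cat_v, delta_empty=1.0):
    """0 when the labels match, delta_empty otherwise."""
    return (0.0 if cat_u == cat_v else 1.0) * delta_empty


def combined_dissimilarity(u, v, alpha=1.0, beta=1.0, delta_empty=1.0):
    """alpha * positional + beta * categorical, for (start, end, label) units."""
    (su, eu, cu), (sv, ev, cv) = u, v
    d_pos = positional_dissimilarity((su, eu), (sv, ev), delta_empty)
    d_cat = categorical_dissimilarity(cu, cv, delta_empty)
    return alpha * d_pos + beta * d_cat


u = (0.0, 4.0, "A")
v = (1.0, 5.0, "B")

balanced = combined_dissimilarity(u, v, alpha=1.0, beta=1.0)
timing_heavy = combined_dissimilarity(u, v, alpha=3.0, beta=1.0)
timing_only = combined_dissimilarity(u, v, alpha=1.0, beta=0.0)
print(balanced, timing_heavy, timing_only)
```

With these toy units, a label confusion costs far more than a one-second shift unless alpha is raised, which is why the choice of weights matters for diarization, where boundary errors and speaker confusions are usually reported separately.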

hadware commented 3 years ago

Thanks sensei!

It would indeed make sense, and we've intensely thought about it! We're just not entirely sure yet...

Rachine commented 3 years ago

Hello Hervé! Thank you! This is a question we want to explore and one we discuss a lot!

We tried to use γ in place of IER, but the behaviors were not consistent at all. I think the frameworks are very similar, but there are differences to take into account. I think γ has some limitations and needs adaptations.

hbredin commented 3 years ago

Thank you both for your detailed answers.

To summarize my understanding: using this metric for speaker diarization is not that obvious and remains an open research question.

Thinking out loud: maybe its use for combining multiple speaker diarization systems would be something to look at, as well (in the same spirit as in https://github.com/desh2608/dover-lap/ by @desh2608)