catalpa-cl / inceptalytics

An easy-to-use API for analyzing INCEpTION annotation projects.

Computation of IAA in inceptalytics #23

Open: stbachinger opened this issue 1 year ago

stbachinger commented 1 year ago

Hello :) I am currently calculating different IAA values for an annotation project in INCEpTION. We used inceptalytics and, just to make sure, I also calculated Cohen's kappa and Krippendorff's alpha manually. However, the values I get from inceptalytics are vastly different from the ones from my manual calculation. For the manual calculation, I use the IOB files and add each annotator's annotations to an individual array (arr1 for annotator 1, arr2 for annotator 2, etc.), so that there is one annotation per token. Those arrays are used to get the following results:

Manual calculation (pairwise Cohen's kappa over all annotations, including O):
Ann1 vs. Ann2: 0.27756503652905506
Ann1 vs. Ann3: 0.12717909466991562
Ann2 vs. Ann3: 0.24369264052470352
Krippendorff's alpha: 0.22980529268056016

Inceptalytics' calculation:
Ann1 vs. Ann2 (pairwise Cohen's kappa): 0.8176
Ann1 vs. Ann3 (pairwise Cohen's kappa): 0.8185
Ann2 vs. Ann3 (pairwise Cohen's kappa): 0.9121
Krippendorff's alpha: 0.8258
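For reference, the per-token procedure described above can be sketched in Python. This is a minimal sketch, assuming each annotator's labels are stored in an equally long list (like arr1, arr2, arr3) with exactly one label per token, O included; `cohen_kappa` and `pairwise_kappa` are hypothetical helper names, not part of inceptalytics, and the toy data is made up.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of per-token labels."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: share of tokens on which both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' own label distributions.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1:
        return float("nan")  # undefined if both used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def pairwise_kappa(annotations):
    """Kappa for every annotator pair; `annotations` maps annotator name -> label list."""
    return {
        (first, second): cohen_kappa(annotations[first], annotations[second])
        for first, second in combinations(sorted(annotations), 2)
    }


# Hypothetical toy data: three annotators, four tokens.
annotations = {
    "Ann1": ["O", "O", "O", "animal"],
    "Ann2": ["O", "O", "animal", "animal"],
    "Ann3": ["O", "O", "O", "animal"],
}
for pair, kappa in pairwise_kappa(annotations).items():
    print(pair, round(kappa, 4))
```

With the real data this would be called as `pairwise_kappa({"Ann1": arr1, "Ann2": arr2, "Ann3": arr3})`. Because every token counts as one unit, the large number of O tokens dominates both the observed and the chance agreement.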

I pulled the newest version of inceptalytics and am using the template from examples/dashboard.py without any changes. My guess is that inceptalytics builds the arrays from the source files differently, which would also explain why the n reported for Cohen's kappa is relatively low. So far, however, I haven't been able to find out why that is. I would be very grateful for any help or insight into how the input values for the IAA scores are generated and why they were implemented that way. Thank you so much!
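The Krippendorff figure from the manual calculation can be reproduced in the same setting. The sketch below assumes nominal labels and per-token arrays aligned across annotators, uses None to mark a token an annotator did not label, and follows the standard coincidence-matrix formulation of alpha; `krippendorff_alpha_nominal` is a hypothetical helper, not the inceptalytics implementation.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(*annotator_arrays):
    """Krippendorff's alpha for nominal labels.

    Each array holds one annotator's label per token (None = not labelled);
    all arrays must be aligned token by token.
    """
    if len({len(array) for array in annotator_arrays}) != 1:
        raise ValueError("all annotator arrays must have the same length")
    # Coincidence matrix: every ordered pair of labels within a unit, weighted 1/(m_u - 1).
    coincidences = Counter()
    for unit in zip(*annotator_arrays):
        values = [value for value in unit if value is not None]
        if len(values) < 2:
            continue  # a unit needs at least two labels to be pairable
        for c, k in permutations(values, 2):
            coincidences[(c, k)] += 1 / (len(values) - 1)
    # Marginal total n_c per label and overall number of pairable values n.
    totals = Counter()
    for (c, _), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    # Nominal metric: every pair of different labels is a full disagreement.
    observed = sum(weight for (c, k), weight in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c in totals for k in totals if c != k)
    if expected == 0:
        return float("nan")  # only one label in use: alpha is undefined
    return 1 - (n - 1) * observed / expected


arr1 = ["O", "O", "O", "animal"]
arr2 = ["O", "O", "animal", "animal"]
print(round(krippendorff_alpha_nominal(arr1, arr2), 4))  # 0.5333
```

The function accepts any number of annotators, e.g. `krippendorff_alpha_nominal(arr1, arr2, arr3)`.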

zesch commented 1 year ago

IAA for sequence-labeled data is complicated... Internally, we do not use an IOB-coded version but the span annotations that are created in INCEpTION (I am assuming that you are not labeling every token manually with IOB in INCEpTION, but only annotating the spans where your target construct is found). In that case, I would suggest trying the (experimental) integration of the gamma IAA measure.
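To illustrate the difference between the two representations, the sketch below decodes a per-token IOB sequence into (start, end, label) spans, roughly the kind of unit a span annotation represents. It is an illustration under stated assumptions, not how inceptalytics reads INCEpTION data: spans use token indices with an exclusive end, `iob_to_spans` is a hypothetical name, and an I- tag that does not continue an open span of the same label is treated as starting a new span.

```python
def iob_to_spans(tags):
    """Decode per-token IOB tags into (start, end, label) spans; end is exclusive."""
    spans = []
    start, label = None, None

    def close_span(end):
        # Emit the currently open span, if any.
        if label is not None:
            spans.append((start, end, label))

    for index, tag in enumerate(tags):
        if tag == "O":
            close_span(index)
            start, label = None, None
        elif tag.startswith(("B-", "I-")):
            tag_label = tag[2:]
            # B- always opens a span; a stray I- (no matching open span) is treated as B-.
            if tag.startswith("B-") or tag_label != label:
                close_span(index)
                start, label = index, tag_label
            # Otherwise: an I- tag continuing the open span, nothing to do.
        else:
            raise ValueError(f"not an IOB tag: {tag!r}")
    close_span(len(tags))
    return spans


print(iob_to_spans(["O", "O", "O", "B-animal"]))  # [(3, 4, 'animal')]
print(iob_to_spans(["O", "O", "B-animal", "I-animal"]))  # [(2, 4, 'animal')]
```

Seen as spans, the two annotations are a single pair of overlapping but non-identical units rather than four per-token labels, which is the situation span-oriented measures such as gamma are designed to handle.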

stbachinger commented 1 year ago

Thank you so much! So if we have the text "I love my cat", and annotator 1 annotates "cat" with the label "animal" while annotator 2 annotates "my cat" with the label "animal", how exactly would inceptalytics handle that with regard to IAA? In my manual calculation, I would have the following IOB file for annotator 1:

I O
love O
my O
cat B-animal

and for annotator 2:

I O
love O
my B-animal
cat I-animal

This results in ['O', 'O', 'O', 'B-animal'] and ['O', 'O', 'B-animal', 'I-animal'] or, more simply, ['O', 'O', 'O', 'animal'] and ['O', 'O', 'animal', 'animal'] as input for the Cohen's kappa calculation. Any help is appreciated. Thank you so much!
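The two encodings in this example can be compared directly. This sketch assumes the IOB files contain one whitespace-separated "token tag" pair per line, as in the snippets above, and computes kappa once on the full IOB tags and once on the labels after stripping the B-/I- prefixes; `read_iob` and `strip_prefix` are hypothetical helper names.

```python
from collections import Counter

# The two IOB files from the example, one "token tag" pair per line.
ANNOTATOR_1 = """I O
love O
my O
cat B-animal"""

ANNOTATOR_2 = """I O
love O
my B-animal
cat I-animal"""


def read_iob(text):
    """Return the tag column (last field) of each non-empty line."""
    return [line.split()[-1] for line in text.splitlines() if line.strip()]


def strip_prefix(tags):
    """Drop B-/I- prefixes so that only the label remains ('B-animal' -> 'animal')."""
    return [tag[2:] if tag.startswith(("B-", "I-")) else tag for tag in tags]


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long per-token label sequences."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1:
        return float("nan")  # undefined if both used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


tags_1, tags_2 = read_iob(ANNOTATOR_1), read_iob(ANNOTATOR_2)
print(round(cohen_kappa(tags_1, tags_2), 4))  # 0.1111 (full IOB tags)
print(round(cohen_kappa(strip_prefix(tags_1), strip_prefix(tags_2)), 4))  # 0.5 (labels only)
```

On these four tokens, the full tags give a kappa of 1/9 while the prefix-stripped labels give 0.5, so the choice of encoding alone changes the score noticeably.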

zesch commented 1 year ago

As I said above, you could use the gamma IAA measure.

If you want kappa, you would currently need a single annotation layer, e.g. IOB with the annotation values "I", "O", "B-X", etc. (similar to a POS annotation with the values "N", "V", "A", etc.). Computing kappa on that layer should then give you the same values as in your example.
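If the annotations are available as spans, they have to be encoded into such a single layer first. A minimal sketch, assuming spans are given as token indices (start inclusive, end exclusive) plus a label; this input format and the `spans_to_iob` name are hypothetical and do not correspond to an INCEpTION export format.

```python
def spans_to_iob(num_tokens, spans):
    """Encode (start, end, label) token spans, end exclusive, as one IOB tag per token."""
    tags = ["O"] * num_tokens
    for start, end, label in sorted(spans):
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"span {(start, end, label)} is outside the {num_tokens} tokens")
        if any(tag != "O" for tag in tags[start:end]):
            # A single IOB layer holds one tag per token, so overlaps cannot be encoded.
            raise ValueError(f"span {(start, end, label)} overlaps another span")
        tags[start] = f"B-{label}"
        for index in range(start + 1, end):
            tags[index] = f"I-{label}"
    return tags


# The "I love my cat" example as token-index spans (hypothetical input format).
print(spans_to_iob(4, [(3, 4, "animal")]))  # ['O', 'O', 'O', 'B-animal']
print(spans_to_iob(4, [(2, 4, "animal")]))  # ['O', 'O', 'B-animal', 'I-animal']
```

Overlapping or nested spans raise an error because one IOB layer can hold only one tag per token; the resulting sequences can then be fed to any per-token kappa implementation.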