dice-group / gerbil

GERBIL - General Entity annotatoR Benchmark
GNU Affero General Public License v3.0

[Question] D2KB - Removal of non-matching response before the evaluation #127

Closed: tiepm closed this issue 8 years ago

tiepm commented 8 years ago

Hi, I would like to ask about the D2KB task in GERBIL. I have read the paper, the task documentation (https://github.com/AKSW/gerbil/wiki/D2KB) and a related issue (https://github.com/AKSW/gerbil/issues/119), but I am still uncertain whether non-matching responses are counted as false positives.

Let's take an example. The gold standard contains the mentions G=(a,b,d). The system returns responses for A=(a,b,c). Suppose the responses for a and b are correct under strong annotation matching.

I know that c is removed by strong annotation matching. My question is whether c is still counted as a false positive when precision is calculated. From the task documentation, I understood that c is not counted as a false positive, because "all entities that do not exactly match one of the marked entities in the gold standard are removed from the response of the annotator BEFORE IT IS EVALUATED."

So there are two possibilities:
1/ c is counted as a false positive. Precision = len([a,b])/len([a,b,c]) = 2/3. Recall = len([a,b])/len([a,b,d]) = 2/3.
2/ c is not counted as a false positive. Precision = len([a,b])/len([a,b]) = 1. Recall = len([a,b])/len([a,b,d]) = 2/3.
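The two calculations can be written out as a minimal Python sketch. This is not GERBIL code: the function `precision_recall`, the flag `drop_unmatched_mentions`, and the entity labels `E_a` to `E_d` are hypothetical names introduced only for illustration. Mentions are modelled as dictionary keys, and a response counts as correct when its mention exists in the gold standard and links to the same entity.

```python
from fractions import Fraction

# Hypothetical data: mention -> linked entity. The entity labels are made up.
GOLD = {"a": "E_a", "b": "E_b", "d": "E_d"}
RESPONSE = {"a": "E_a", "b": "E_b", "c": "E_c"}


def precision_recall(gold, response, drop_unmatched_mentions):
    """Illustrative helper (an assumption, not part of GERBIL).

    If drop_unmatched_mentions is True, responses whose mention does not
    appear in the gold standard are removed before counting, so they
    cannot become false positives.
    """
    if drop_unmatched_mentions:
        response = {m: e for m, e in response.items() if m in gold}

    # A true positive is a response whose mention is in the gold standard
    # and whose entity equals the gold entity for that mention.
    true_positives = sum(1 for m, e in response.items() if gold.get(m) == e)

    precision = Fraction(true_positives, len(response)) if response else Fraction(0)
    recall = Fraction(true_positives, len(gold)) if gold else Fraction(0)
    return precision, recall


# Case 1: c stays in the response and is counted as a false positive.
case_1 = precision_recall(GOLD, RESPONSE, drop_unmatched_mentions=False)

# Case 2: c is removed before the evaluation.
case_2 = precision_recall(GOLD, RESPONSE, drop_unmatched_mentions=True)

print("case 1:", case_1)
print("case 2:", case_2)
```

In both cases recall is unaffected, because the missing mention d remains a false negative either way; only the denominator of precision changes.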

Could you confirm which is correct?

Thank you very much.

MichaelRoeder commented 8 years ago

c is removed and is not counted as a false positive. Thus, the second result (P=1.0, R=2/3) is correct.

Thanks for raising this question. I added an example similar to yours to the wiki page to make this clearer.

Cheers, Michael Röder