[Question] D2KB - Removal of non-matching response before the evaluation

Hi, I would like to ask about the D2KB task in GERBIL. I have read the paper, the task documentation (https://github.com/AKSW/gerbil/wiki/D2KB) and a related issue (https://github.com/AKSW/gerbil/issues/119), but I am still unsure whether non-matching responses are counted as false positives.

Here is an example. The gold standard contains the mentions G = (a, b, d). The system returns annotations for the mentions A = (a, b, c). Assume the annotations for a and b are correct under strong annotation matching.

I know that c is removed by strong annotation matching. My question is whether c is still counted as a false positive when precision is calculated. From the task documentation, my understanding was that c is not counted as a false positive, since "all entities that do not exactly match one of the marked entities in the gold standard are removed from the response of the annotator BEFORE IT IS EVALUATED."

So there are two possible readings:

1. c is counted as a false positive. Precision = len([a,b])/len([a,b,c]) = 2/3. Recall = len([a,b])/len([a,b,d]) = 2/3.
2. c is not counted as a false positive. Precision = len([a,b])/len([a,b]) = 1. Recall = len([a,b])/len([a,b,d]) = 2/3.
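
To make the two readings concrete, here is a minimal Python sketch of the arithmetic. It is not GERBIL's code: the function `precision_recall` and its `drop_unmatched` flag are hypothetical names used only for illustration, mentions are modelled as plain labels, and returning 0 for an empty response is an arbitrary convention chosen for the sketch.

```python
from fractions import Fraction


def precision_recall(gold, response, correct, drop_unmatched):
    """Compute (precision, recall) for one document.

    gold           -- set of mentions in the gold standard
    response       -- set of mentions the system annotated
    correct        -- set of mentions the system linked correctly
    drop_unmatched -- hypothetical flag: if True, mentions not present in
                      the gold standard are removed before counting
    """
    if drop_unmatched:
        # Keep only response mentions that also appear in the gold standard.
        response = response & gold
    true_positives = len(correct & response)
    # Assumption of this sketch: an empty denominator yields 0.
    precision = Fraction(true_positives, len(response)) if response else Fraction(0)
    recall = Fraction(true_positives, len(gold)) if gold else Fraction(0)
    return precision, recall


gold = {"a", "b", "d"}
response = {"a", "b", "c"}
correct = {"a", "b"}

# Reading 1: c stays in the response and counts as a false positive.
reading_1 = precision_recall(gold, response, correct, drop_unmatched=False)
# Reading 2: c is removed before evaluation and is not counted.
reading_2 = precision_recall(gold, response, correct, drop_unmatched=True)

print(reading_1)  # precision 2/3, recall 2/3
print(reading_2)  # precision 1, recall 2/3
```

Recall is the same under both readings because d is missing from the response either way. Only precision depends on whether c is dropped first.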

Could you confirm which is correct?

Thank you very much.
