Closed jvamvas closed 3 years ago
You're right, the evaluation scripts use a different definition from the one given in the papers. The formula in the papers looks correct, and I can no longer reconstruct why the scripts use a different one. I've adapted the scripts to produce both variants: `RecallA = pos / (pos+unk)` and `RecallB = pos / (pos+unk+neg)`. I would recommend that you refer to `RecallB` if you plan to use MuCoW.
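For readers comparing the two variants, here is a minimal Python sketch. It is not the actual MuCoW evaluation script: the function names and the counts are hypothetical, and the precision formula `pos / (pos+neg)` is an assumption based on the description in this thread (precision ignores UNK cases).

```python
def precision(pos: int, neg: int) -> float:
    # Assumed definition: UNK cases are ignored entirely.
    return pos / (pos + neg) if (pos + neg) else 0.0


def recall_a(pos: int, unk: int) -> float:
    # RecallA as quoted above: negatives are left out of the denominator.
    return pos / (pos + unk) if (pos + unk) else 0.0


def recall_b(pos: int, neg: int, unk: int) -> float:
    # RecallB as quoted above: every UNK case counts against the system.
    total = pos + neg + unk
    return pos / total if total else 0.0


# Hypothetical counts, chosen so that neg > unk.
pos, neg, unk = 60, 30, 10

print(f"precision = {precision(pos, neg):.3f}")
print(f"RecallA   = {recall_a(pos, unk):.3f}")
print(f"RecallB   = {recall_b(pos, neg, unk):.3f}")
```

With these made-up counts, `RecallA` comes out higher than precision, while `RecallB` stays at or below it. Under the assumed precision formula, that happens whenever `neg > unk`.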
@yvesscherrer Thank you for the reply!
Thank you for providing this great resource! Could you help me understand the meaning of recall in your evaluation protocol?
In your papers, you define precision and recall as follows:
Given those definitions, I would expect that recall is <= precision. However, in the results there are some rows where recall is actually greater than precision.
In the evaluation scripts, recall is computed as `pos / (pos+unk)`. This formula makes the results more plausible, but I have a hard time understanding it intuitively. The definitions from the paper make more sense to me, with F1 as a middle ground between ignoring UNK and treating all UNK as incorrect.
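To spell out the reasoning, here is a short derivation. It assumes that precision ignores UNK cases, i.e. $P = \mathrm{pos} / (\mathrm{pos} + \mathrm{neg})$; this is inferred from the description in this thread, not copied from the papers. The symbols $R_{\text{paper}}$ and $R_{\text{script}}$ are labels introduced here for illustration.

$$
R_{\text{paper}} = \frac{\mathrm{pos}}{\mathrm{pos} + \mathrm{neg} + \mathrm{unk}} \;\le\; \frac{\mathrm{pos}}{\mathrm{pos} + \mathrm{neg}} = P
\qquad \text{since } \mathrm{unk} \ge 0,
$$

$$
R_{\text{script}} = \frac{\mathrm{pos}}{\mathrm{pos} + \mathrm{unk}} \;>\; P
\iff \mathrm{unk} < \mathrm{neg} \quad (\text{for } \mathrm{pos} > 0).
$$

Under that assumption, the script formula yields recall greater than precision exactly in those rows where there are more negative than UNK cases, which would account for the rows mentioned above.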