chrisjbryant / errant

ERRor ANnotation Toolkit: Automatically extract and classify grammatical errors in parallel original and corrected sentences.
MIT License

Not comparing the actual correction tokens between hypothesis and reference edits in compare_m2.py #3

Closed: gurunathparasaram closed this issue 6 years ago

gurunathparasaram commented 6 years ago
chrisjbryant commented 6 years ago

Heya,

So I think the information you're missing is that extractEdits produces different outputs depending on the command-line args. The default option actually does compare (start, end, correction) tuples, using lines 80-83, and so it does produce correction scores.
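To make the default behaviour concrete, here is a minimal sketch of correction-mode matching. It is not the code in compare_m2.py. It assumes each edit is a hypothetical `(start, end, correction)` tuple over original-sentence token offsets, and it counts an edit as a true positive only when all three fields match a reference edit.

```python
from typing import List, Tuple

# Hypothetical edit representation (an assumption, not errant's internal type):
# (start token offset, end token offset, correction string)
Edit = Tuple[int, int, str]


def score_correction(hyp: List[Edit], ref: List[Edit]) -> Tuple[int, int, int]:
    """Return (TP, FP, FN), where a match requires start, end AND correction to agree."""
    hyp_set, ref_set = set(hyp), set(ref)
    tp = len(hyp_set & ref_set)  # identical edits in both hypothesis and reference
    fp = len(hyp_set - ref_set)  # hypothesis edits with no exact reference match
    fn = len(ref_set - hyp_set)  # reference edits the hypothesis missed or got wrong
    return tp, fp, fn


if __name__ == "__main__":
    hyp = [(1, 2, "is"), (4, 4, "the")]
    ref = [(1, 2, "was"), (4, 4, "the")]
    # Same span (1, 2) but a different correction, so it is both an FP and an FN.
    print(score_correction(hyp, ref))  # (1, 1, 1)
```

Because the correction string is part of the tuple, the system gets no credit for the (1, 2) edit even though it found the right span.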

If you use the -ds or -dt flag, however, you can switch the scorer into span-based or token-based detection mode respectively. This is closer to the situation you described, where we only compare (start, end) edits, and is useful if you want to evaluate a system in terms of how many errors it detected, even if it got the correction wrong.
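The difference between the two detection modes can be sketched as follows. This is an illustration under stated assumptions, not errant's implementation: edits are hypothetical `(start, end, correction)` tuples, span-based detection drops the correction and compares `(start, end)`, and token-based detection splits each edit into the individual original tokens it covers. How insertions (where `start == end`) are handled in token mode is an assumption here; they are kept as a single zero-width unit.

```python
from typing import List, Set, Tuple

# Hypothetical edit representation: (start, end, correction)
Edit = Tuple[int, int, str]
Unit = Tuple[int, int]


def spans(edits: List[Edit]) -> Set[Unit]:
    """Span-based detection: ignore the correction and keep only (start, end)."""
    return {(s, e) for s, e, _ in edits}


def tokens(edits: List[Edit]) -> Set[Unit]:
    """Token-based detection (sketch): one unit per original token an edit covers."""
    units: Set[Unit] = set()
    for s, e, _ in edits:
        if s == e:
            # Assumption: an insertion covers no original token, so keep it zero-width.
            units.add((s, s))
        else:
            for i in range(s, e):
                units.add((i, i + 1))
    return units


def counts(hyp: Set[Unit], ref: Set[Unit]) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) over whichever units the chosen mode produced."""
    return len(hyp & ref), len(hyp - ref), len(ref - hyp)


if __name__ == "__main__":
    # Right span, wrong correction: still a true positive in span mode.
    print(counts(spans([(1, 2, "is")]), spans([(1, 2, "was")])))  # (1, 0, 0)
    # Overlapping but non-identical spans: no credit in span mode...
    print(counts(spans([(1, 3, "has been")]), spans([(1, 2, "has")])))  # (0, 1, 1)
    # ...but partial credit in token mode, since token 1 is flagged by both.
    print(counts(tokens([(1, 3, "has been")]), tokens([(1, 2, "has")])))  # (1, 1, 0)
```

Token mode rewards partially overlapping detections, while span mode only rewards exact span boundaries. Note that once a multi-token edit is split into tokens, there is no obvious way to give each token its own error category or correction.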

Hope that helps!

gurunathparasaram commented 6 years ago

Thanks, Chris. I didn't understand the token-based method properly (my bad) and expected it to give token-level correction scores. At first I thought about comparing token-level edits by category (e.g. comparing (start, end, cat, correction)), but I don't think we can attribute the error category of a multi-token edit to each individual token (correct me if I am wrong). The span-based method seems better. Thanks for the explanation.