Questions about evaluating duplicate corrections

gotutiyan commented 8 months ago

Hi, I have a question about duplicate corrections.

errant_parallel sometimes makes duplicate corrections, e.g.

echo "If you want to actally know somebody you can spend the whole day with that person or place but if you do not , you do not even speak to that person or even go there . " > orig.txt
echo "If you want to actually know somebody , you can spend the whole day with that person or place , but if you do not , you do not even speak to that person or even go there . " > sys.txt
echo "If you want to actually get to know someone , or something , you can spend the whole day with that person , or place , and if you do not , you would n't have reason to even speak to that person , or even go there . " > ref.txt
errant_parallel -orig orig.txt -cor sys.txt -out hyp.m2
errant_parallel -orig orig.txt -cor ref.txt -out ref.m2
errant_compare -hyp hyp.m2 -ref ref.m2

(The above is line 612 of JFLEG-dev. The reference is the first annotation.) In the above case, errant_compare shows

=========== Span-Based Correction ============
TP      FP      FN      Prec    Rec     F0.5
4       0       9       1.0     0.3077  0.6897
==============================================

However, hyp.m2 has only three corrections, so TP=4 is strange.
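For reference, the Prec, Rec and F0.5 columns above follow directly from the TP/FP/FN counts. The snippet below is a minimal sketch of that arithmetic. The function name `prf` and the fallback values for zero counts are assumptions made for illustration, not ERRANT's own code.

```python
def prf(tp, fp, fn, beta=0.5):
    """Return (precision, recall, F_beta) computed from raw counts."""
    # Assumed convention: an empty denominator yields 1.0 for P and R.
    p = tp / (tp + fp) if (tp + fp) else 1.0
    r = tp / (tp + fn) if (tp + fn) else 1.0
    denom = beta ** 2 * p + r
    f = (1 + beta ** 2) * p * r / denom if denom else 0.0
    return round(p, 4), round(r, 4), round(f, 4)


# Counts reported by errant_compare in the table above.
print(prf(4, 0, 9))  # (1.0, 0.3077, 0.6897)
```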

hyp.m2

S If you want to actally know somebody you can spend the whole day with that person or place but if you do not , you do not even speak to that person or even go there .
A 4 5|||R:SPELL|||actually|||REQUIRED|||-NONE-|||0
A 7 7|||M:PUNCT|||,|||REQUIRED|||-NONE-|||0
A 18 18|||M:PUNCT|||,|||REQUIRED|||-NONE-|||0

The reason for this is a duplicate correction in the reference.
Specifically, ref.m2 contains the line A 7 7|||M:PUNCT|||,|||REQUIRED|||-NONE-|||0 twice. (I don't know why this duplication appears.)

ref.m2

S If you want to actally know somebody you can spend the whole day with that person or place but if you do not , you do not even speak to that person or even go there .
A 4 5|||R:SPELL|||actually|||REQUIRED|||-NONE-|||0
A 5 5|||M:VERB|||get to|||REQUIRED|||-NONE-|||0
A 6 7|||R:NOUN|||someone|||REQUIRED|||-NONE-|||0
A 7 7|||M:PUNCT|||,|||REQUIRED|||-NONE-|||0
A 7 7|||M:CONJ|||or|||REQUIRED|||-NONE-|||0
A 7 7|||M:NOUN|||something|||REQUIRED|||-NONE-|||0
A 7 7|||M:PUNCT|||,|||REQUIRED|||-NONE-|||0
A 16 16|||M:PUNCT|||,|||REQUIRED|||-NONE-|||0
A 18 18|||M:PUNCT|||,|||REQUIRED|||-NONE-|||0
A 18 19|||R:CONJ|||and|||REQUIRED|||-NONE-|||0
A 25 27|||R:OTHER|||would n't have|||REQUIRED|||-NONE-|||0
A 27 27|||M:OTHER|||reason to|||REQUIRED|||-NONE-|||0
A 32 32|||M:PUNCT|||,|||REQUIRED|||-NONE-|||0
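Duplicates like the one in this listing can be spotted mechanically. The following is a small sketch that parses M2 "A" lines by splitting on `|||` and counts identical edits. The helper names `parse_m2_edit` and `duplicate_edits` are hypothetical and are not part of ERRANT.

```python
from collections import Counter


def parse_m2_edit(line):
    """Split an M2 'A' line into (start, end, category, correction, coder)."""
    fields = line[2:].split("|||")  # drop the leading "A "
    start, end = map(int, fields[0].split())
    return start, end, fields[1], fields[2], int(fields[-1])


def duplicate_edits(lines):
    """Return every edit that occurs more than once, with its count."""
    counts = Counter(parse_m2_edit(l) for l in lines if l.startswith("A "))
    return {edit: n for edit, n in counts.items() if n > 1}


# The four insertions at position 7 from ref.m2.
m2_lines = [
    "A 7 7|||M:PUNCT|||,|||REQUIRED|||-NONE-|||0",
    "A 7 7|||M:CONJ|||or|||REQUIRED|||-NONE-|||0",
    "A 7 7|||M:NOUN|||something|||REQUIRED|||-NONE-|||0",
    "A 7 7|||M:PUNCT|||,|||REQUIRED|||-NONE-|||0",
]
print(duplicate_edits(m2_lines))  # {(7, 7, 'M:PUNCT', ',', 0): 2}
```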

During errant_compare, coder_dict[coder][(7, 7, ',')] holds multiple values: ['M:PUNCT', 'M:PUNCT']. This adds two to the TP count, because ref_edits[h_edit] has two values.
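The effect can be reproduced with a simplified model of the counting step. This is only a sketch of the behaviour described here, not the actual errant_compare code: it assumes edits are keyed by (start, end, correction) with a list of categories as the value, and that each category stored under a matched reference key counts as one TP. The names `build_edit_dict` and `count_matches` are hypothetical.

```python
from collections import defaultdict


def build_edit_dict(edits):
    """Map (start, end, correction) -> list of categories, keeping repeats."""
    edit_dict = defaultdict(list)
    for start, end, cat, cor in edits:
        edit_dict[(start, end, cor)].append(cat)
    return edit_dict


def count_matches(hyp_edits, ref_edits):
    """Count TP/FP/FN, adding one TP per category under a matched ref key."""
    tp = fp = fn = 0
    for h_edit, h_cats in hyp_edits.items():
        if h_edit in ref_edits:
            tp += len(ref_edits[h_edit])  # two 'M:PUNCT' entries -> +2
        else:
            fp += len(h_cats)
    for r_edit, r_cats in ref_edits.items():
        if r_edit not in hyp_edits:
            fn += len(r_cats)
    return tp, fp, fn


hyp = build_edit_dict([
    (4, 5, "R:SPELL", "actually"), (7, 7, "M:PUNCT", ","),
    (18, 18, "M:PUNCT", ","),
])
ref = build_edit_dict([
    (4, 5, "R:SPELL", "actually"), (5, 5, "M:VERB", "get to"),
    (6, 7, "R:NOUN", "someone"), (7, 7, "M:PUNCT", ","),
    (7, 7, "M:CONJ", "or"), (7, 7, "M:NOUN", "something"),
    (7, 7, "M:PUNCT", ","), (16, 16, "M:PUNCT", ","),
    (18, 18, "M:PUNCT", ","), (18, 19, "R:CONJ", "and"),
    (25, 27, "R:OTHER", "would n't have"), (27, 27, "M:OTHER", "reason to"),
    (32, 32, "M:PUNCT", ","),
])
print(count_matches(hyp, ref))  # (4, 0, 9)
```

Under these assumptions, the three hypothesis edits yield TP=4 because the key (7, 7, ',') carries two categories in the reference.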

Is this expected? Personally, I do not think it is desirable for the number of TPs to exceed the number of edits in the hypothesis. Possible solutions would be to:

- Prevent errant.Annotator.annotate() from outputting duplicate corrections.
- Ensure that each entry of coder_dict in errant.commands.compare_m2.py holds only a single value (currently it is a list).
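As a rough illustration of the second suggestion, identical edits could be collapsed before counting. This is a hypothetical sketch and not a change that exists in ERRANT. The function name `dedupe_edits` and the (start, end, category, correction) tuple layout are assumptions.

```python
def dedupe_edits(edits):
    """Keep only the first edit for each (start, end, correction) key."""
    seen = set()
    unique = []
    for start, end, cat, cor in edits:
        key = (start, end, cor)
        if key not in seen:
            seen.add(key)
            unique.append((start, end, cat, cor))
    return unique


# The four reference insertions at position 7.
same_position = [
    (7, 7, "M:PUNCT", ","),
    (7, 7, "M:CONJ", "or"),
    (7, 7, "M:NOUN", "something"),
    (7, 7, "M:PUNCT", ","),
]
print(len(dedupe_edits(same_position)))  # 3
```

With this approach a matching hypothesis comma would count once, but the reference would also be treated as containing only one of the two commas.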

Thank you for developing ERRANT! (As an aside, I am developing an API-based errant_compare and noticed this problem because my results did not match the official results.)

chrisjbryant commented 8 months ago

Heya! That's a good question. It looks like a case of ERRANT not merging something I would have hoped it would merge.

Specifically, it seems the human annotator wanted to add , or something , (note the insertion of 2 commas) to the reference so that it is consistent with the mention of a place later in the sentence. I would have hoped ERRANT would group this together as a single multi-word insertion edit, e.g.

A 7 7|||M:OTHER|||, or something,|||REQUIRED|||-NONE-|||0

... but it instead chooses to split it into 4 separate insertion edits at the same place.

A 7 7|||M:PUNCT|||,|||REQUIRED|||-NONE-|||0
A 7 7|||M:CONJ|||or|||REQUIRED|||-NONE-|||0
A 7 7|||M:NOUN|||something|||REQUIRED|||-NONE-|||0
A 7 7|||M:PUNCT|||,|||REQUIRED|||-NONE-|||0

However, since the hypothesis correctly inserted a comma after somebody, it looks as though it matches both the comma at the start of the phrase and the comma at the end of the phrase. The fix would thus be to make sure the reference edits are not split into several smaller edits, but sadly, I know I can't do that without negatively affecting other edit alignments.

I agree that it doesn't make sense for the number of TPs to exceed the number of hypothesis edits, but this looks like a one-in-a-million kind of edge case to me. errant.Annotator.annotate() has actually output the correct number of edits from the reference, and if the hypothesis matched the reference, then it should be rewarded with 2 TPs for matching both commas.

In short - there's not a lot I can do about it other than say you found a needle in a haystack!

gotutiyan commented 8 months ago

Thank you for your reply. Thanks to your kind explanation, I now understand that this behavior is reasonable.

"this looks like a one-in-a-million kind of edge case"

I agree. We would almost never encounter such a case :joy:

Thanks again!

chrisjbryant / errant

Questions about evaluating duplicate corrections #49