AppraiseDev / Appraise

Appraise code used as part of WMT21 human evaluation campaign
BSD 3-Clause "New" or "Revised" License

MQM/ESA duplicate document entries #140

Closed zouharvi closed 7 months ago

zouharvi commented 7 months ago

There are allegedly duplicate entries in the first document (reported by @martinpopel on the EnCs WMT test run). This could be caused by the data generation here, or by some interaction with the duplication of the first item to fill 100 segments (likely when the documents are sorted again).

martinpopel commented 7 months ago

Details for debugging: It was the first document (after the tutorial) in RETRACTED. Both the first and the last sentence of the document were src="So anyway, since it does hold true here, to fix it..." trg="Takže každopádně, protože to platí zde, abych to opravil,..." (I have just a screenshot and Appraise does not allow me to go back to the first document, so I include just a few words of the sentence.)

Based on the context, I think the sentence was actually the last sentence of the document. It did not make sense as the first sentence. @zouharvi, can you check whether the duplication is already present in the raw input files?
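
A check of this kind could be sketched roughly as follows. This is only a minimal sketch, assuming each document is available as a list of segments with `src` and `trg` fields; the field names and the `find_duplicate_segments` helper are hypothetical and not part of Appraise or its input format.

```python
from collections import defaultdict


def find_duplicate_segments(segments):
    """Map each (src, trg) pair that occurs more than once to its positions.

    `segments` is assumed to be a list of dicts with "src" and "trg" keys
    (a hypothetical layout, not the actual Appraise batch format).
    """
    positions = defaultdict(list)
    for index, segment in enumerate(segments):
        positions[(segment["src"], segment["trg"])].append(index)
    # Keep only the pairs that appear at more than one position.
    return {pair: idxs for pair, idxs in positions.items() if len(idxs) > 1}


# Made-up document in which the last sentence also appears as the first one.
document = [
    {"src": "closing sentence", "trg": "zaverecna veta"},
    {"src": "opening sentence", "trg": "uvodni veta"},
    {"src": "closing sentence", "trg": "zaverecna veta"},
]
duplicates = find_duplicate_segments(document)
print(duplicates)
```

Running the same check on the raw input and on the generated batch would show at which stage the duplicate appears.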

zouharvi commented 7 months ago

Fixed in 0f1044bed1447fb8a13788ad5730bd10aabc557b.