zouharvi closed this 7 months ago
Details for debugging: it was the first document (after the tutorial) in RETRACTED. Both the first and the last sentence of the document were src="So anyway, since it does hold true here, to fix it..." trg="Takže každopádně, protože to platí zde, abych to opravil,..."
(I have only a screenshot, and Appraise does not let me go back to the first document, so I include only the first few words of the sentence.)
Based on the context, I think this was actually the last sentence of the document; it did not make sense as the first one. @zouharvi, can you check whether the duplication is already present in the raw input files?
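One quick way to answer that question is to count repeated (document, source, target) triples in the raw input. The sketch below assumes the input has already been parsed into records with a document ID, source and target; the `Segment` type and `duplicated_segments` helper are hypothetical, since the actual WMT/Appraise input format is not shown in this thread.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Segment(NamedTuple):
    # Hypothetical record layout; the real raw input format may differ.
    doc_id: str
    src: str
    trg: str


def duplicated_segments(segments: Iterable[Segment]) -> List[Tuple[str, str, str]]:
    """Return (doc_id, src, trg) triples that occur more than once within the same document."""
    counts = Counter((s.doc_id, s.src, s.trg) for s in segments)
    return [key for key, n in counts.items() if n > 1]


raw = [
    Segment("doc1", "Hello.", "Ahoj."),
    Segment("doc1", "Bye.", "Nashle."),
    Segment("doc2", "Hello.", "Ahoj."),  # same text, but in a different document
]
print(duplicated_segments(raw))  # → []
```

An empty result for the raw files would point at the batch generation rather than at the input data.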
There are allegedly duplicate entries in the first document (reported by @martinpopel on the EnCs WMT test run). This could be caused by the data generation here, or it could somehow interact with the duplication of the first item that is used to fill a batch up to 100 segments (likely when the documents are sorted again).
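The suspected interaction can be reproduced in a toy setting. This is a minimal sketch, assuming that padding appends copies of the batch's first item and that the batch is afterwards re-sorted with a stable sort on the document ID alone; `Segment`, `pad_with_first_item` and `sort_by_document` are hypothetical names, not the actual code in this repository.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    # Hypothetical record layout used only for this illustration.
    doc_id: str
    seg_id: int
    src: str


def pad_with_first_item(items: List[Segment], size: int = 100) -> List[Segment]:
    """Pad the batch to `size` items by repeating its first item (assumed padding rule)."""
    if not items:
        return []
    padded = list(items)
    while len(padded) < size:
        padded.append(items[0])
    return padded


def sort_by_document(items: List[Segment]) -> List[Segment]:
    """Stable sort on the document ID only; order inside a document is preserved."""
    return sorted(items, key=lambda seg: seg.doc_id)


# Document "A" has 3 segments and document "B" has 96, so one padding item is needed.
batch = [Segment("A", i, f"A{i}") for i in range(3)]
batch += [Segment("B", i, f"B{i}") for i in range(96)]

resorted = sort_by_document(pad_with_first_item(batch))
doc_a = [seg.src for seg in resorted if seg.doc_id == "A"]
print(doc_a)  # → ['A0', 'A1', 'A2', 'A0']
```

Under these assumptions the padding copy is merged back into the first document, so one sentence shows up as both its first and its last segment. Marking padding items explicitly, or appending them only after the final sort, would be one way to keep them out of real documents.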