I'm running the evaluation script on my en-de system, following the steps in the LREC2020 directory. The test set has 35315 examples, which is how many I've translated.
However, the eval script tells me that my lines mismatch:
$ cat eval.sh
#!/usr/bin/env bash
python3 evaluate.py \
--ref-testsuite en-de.test.txt.gz \
--sense-file senses.en-de.txt \
--dist-file distances.en-de.txt \
--src-segmented src_segmented.txt \
--tgt-segmented my_out.de.tok \
--tgt-lemmatized my_out.de.conllu
$
$ bash eval.sh
Number of sentences does not match
Reference file: 35315
Segmented source file: 43481
Lemmatized system output: 43481
Segmented system output: 43481
When I print line before this message, I see defaultdict(&lt;class 'int'&gt;, {'total': 43481, 'missing_ref': 8166}), but there don't seem to be any missing refs. Notably, 43481 - 35315 = 8166, exactly the missing_ref count, which suggests each extra sentence is being counted as missing a reference. Is there something obviously wrong here?
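For diagnosis, here's a rough sketch of how I'm counting examples per file on my side (paths as in eval.sh; I'm assuming the gzipped reference is one example per line, and that the CoNLL-U output separates sentences with blank lines):

```python
import gzip

def count_plain(path):
    # One example per line in the tokenized / plain-text files.
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)

def count_gzip(path):
    # Same, but for the gzipped reference test suite.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return sum(1 for _ in f)

def count_conllu(path):
    # CoNLL-U separates sentences with blank lines, so count
    # non-empty blocks rather than raw lines.
    with open(path, encoding="utf-8") as f:
        blocks = f.read().strip().split("\n\n")
    return len([b for b in blocks if b.strip()])

# Usage (files from eval.sh):
#   count_gzip("en-de.test.txt.gz")
#   count_plain("my_out.de.tok")
#   count_conllu("my_out.de.conllu")
```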
OK, I found the issue. My CoNLL-U file was being sentence-split on some punctuation (namely ';'), so replacing \n\n with \n at the spurious boundaries to undo the over-splitting seems to have gotten things into better shape!
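For anyone hitting the same thing, here's a sketch of the repair, under the assumption that a spurious boundary can be recognized by the preceding block ending in a ';' token (the function names and the heuristic are mine, not part of the eval script):

```python
def last_token_form(block):
    # Return the FORM (column 2) of the last token line in a
    # CoNLL-U sentence block, skipping comment lines.
    for line in reversed(block.splitlines()):
        if line and not line.startswith("#"):
            return line.split("\t")[1]
    return ""

def merge_oversplit(conllu_text):
    # CoNLL-U separates sentences with a blank line. If a block's
    # last token is ';', the sentence splitter probably broke one
    # real sentence in two, so glue it to the following block.
    # Caveat: token IDs in the merged block restart at 1 and are
    # not renumbered here.
    blocks = conllu_text.strip().split("\n\n")
    merged = []
    for block in blocks:
        if merged and last_token_form(merged[-1]) == ";":
            merged[-1] = merged[-1] + "\n" + block
        else:
            merged.append(block)
    return "\n\n".join(merged) + "\n"
```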