Closed Gldkslfmsd closed 3 years ago
$ cat xx_20090309_015_013_EN_Deva.en.en.asr.BLEU.out
Signature: -- WordCount tt1 64 avg wordCount tt* 64
1) It's not clear whether this number corresponds to the gold transcript, the reference, or the finalized ASR output. The message should say which.
2) I found out that it refers to the OSt file, and the number is incorrect. The number of space-separated words is 56. The value 64 is the number of tokens, so the label is wrong.
$ wc -w < ../man-orto-dev2/xx_20090309_015_013_EN_Deva.en.OSt
56
$ moses-tokenizer en < ../man-orto-dev2/xx_20090309_015_013_EN_Deva.en.OSt | wc -w
64
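To illustrate why the two counts differ, here is a minimal sketch that contrasts whitespace-separated word counting (what `wc -w` does) with a simplified token count. The functions `count_words` and `count_tokens` and the sample sentence are hypothetical, and the regex rule is only a rough stand-in for a real tokenizer: it is not the Moses tokenizer and will generally not reproduce its counts.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated words, comparable to `wc -w`."""
    return len(text.split())


def count_tokens(text: str) -> int:
    """Count tokens under a simplified rule (an assumption, not Moses):
    each run of word characters is one token, and each punctuation
    character is its own token."""
    return len(re.findall(r"\w+|[^\w\s]", text))


sample = "Well, that's fine."
print(count_words(sample))   # 3: "Well,", "that's", "fine."
print(count_tokens(sample))  # 7: Well , that ' s fine .
```

Because punctuation attached to a word is split off during tokenization, the token count is at least as large as the word count, which matches the 56 versus 64 seen above.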
I will replace "Wordcount" with "Tokencount" and "tt" with "reference" in the next version. Thanks.
Improved.