ELITR / SLTev

SLTev is a tool for comprehensive evaluation of (simultaneous) spoken language translation.

words or tokens? #46

Gldkslfmsd closed this issue 3 years ago

Gldkslfmsd commented 3 years ago
$ cat xx_20090309_015_013_EN_Deva.en.en.asr.BLEU.out
Signature:  
--       WordCount     tt1                    64
avg      wordCount     tt*                    64

1) It is not clear whether this number refers to the gold transcript, the reference, or the finalized ASR output. The message should make this explicit.

2) I found out that it is the OSt, and the number is wrong: the file has 56 space-separated words. 64 is the number of tokens, so the "WordCount" label is wrong.

$ wc -w < ../man-orto-dev2/xx_20090309_015_013_EN_Deva.en.OSt
56 
$ moses-tokenizer en < ../man-orto-dev2/xx_20090309_015_013_EN_Deva.en.OSt | wc -w
64
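The gap between 56 and 64 comes from tokenization splitting punctuation (and similar pieces) off the surrounding words, so one space-separated word can become several tokens. A minimal Python sketch of the difference, assuming a crude regex tokenizer that makes every punctuation character its own token. This tokenizer is hypothetical: it is not moses-tokenizer and not SLTev's code, and its counts will not always match Moses.

```python
import re


def count_words(text: str) -> int:
    # Space-separated words, roughly what `wc -w` reports.
    return len(text.split())


def count_tokens(text: str) -> int:
    # HYPOTHETICAL crude tokenizer (not Moses): runs of word characters are
    # one token each, and every punctuation character is a separate token.
    return len(re.findall(r"\w+|[^\w\s]", text))


sample = "Hello, world. It's a test."
print(count_words(sample), count_tokens(sample))
```

Here the five space-separated words become ten tokens because the commas, periods, and the apostrophe are counted separately. A report that counts tokens should therefore label the figure as a token count.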

mohammad2928 commented 3 years ago

I will replace "WordCount" with "TokenCount" and "tt" with "reference" in the next version. Thanks.

mohammad2928 commented 3 years ago

Improved.