ELITR / SLTev

SLTev is a tool for comprehensive evaluation of (simultaneous) spoken language translation.

ASRev: Accented characters #60

mzilinec closed this issue 3 years ago

mzilinec commented 3 years ago

Could we have an option (switch) to ignore accented characters in ASRev? When the reference transcripts are not accented, the reported word error rate is much higher than it actually is. I just ran into this while evaluating German (e.g. uber vs. über).

(This can be done in one function call using unidecode)
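
To illustrate the effect, here is a minimal sketch, not SLTev code. It assumes a plain word-level Levenshtein WER, which is not necessarily how ASRev computes its score, and it uses the standard library's `unicodedata` as a stand-in for unidecode so that the example has no third-party dependency. The two are not equivalent: unidecode transliterates to ASCII more broadly (for example, it maps "ß" to "ss"), while the approach below only drops combining marks. The names `strip_accents` and `wer` are hypothetical helpers.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


reference = "uber die brucke"   # unaccented reference transcript
hypothesis = "über die brücke"  # correctly accented ASR output

raw_wer = wer(reference, hypothesis)
normalized_wer = wer(strip_accents(reference), strip_accents(hypothesis))
print(raw_wer, normalized_wer)
```

In this toy example two of the three words differ only by an umlaut, so the raw score counts them as substitutions, while the normalized score is zero.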

mohammad2928 commented 3 years ago

Hi, there is no such option at the moment. Could you explain the problem in more detail so that we can solve it? I guess we would need to provide references both with and without accents. Anyway, if you have an idea for solving it, please share it.

obo commented 3 years ago

As we discussed in the call, missing accents should be treated as a problem of the dataset that deserves a fix. SLTev should therefore not have an option to ignore accents; instead, if that is the only option, remove the accents from all your files yourself and evaluate on the non-accented text.
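
A rough sketch of such a manual preprocessing step, under stated assumptions: the files are UTF-8 plain text, accents are removed by dropping Unicode combining marks (standard library only, so "ß" and similar letters are left unchanged), and `strip_accents_file` is a hypothetical helper, not part of SLTev. Files with a structured format, such as timestamped ASR output, may need more careful handling than a blanket text rewrite.

```python
import unicodedata
from pathlib import Path


def strip_accents(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def strip_accents_file(src: str, dst: str) -> None:
    """Write an accent-free copy of the UTF-8 text file `src` to `dst`."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(strip_accents(text), encoding="utf-8")
```

The same function would be applied to both the reference and the system output before running the evaluation, so that the two sides are normalized identically.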