dchaplinsky opened this issue 11 months ago
I used the native transformers library for decoding the test sets, with beam size 1 (if I remember correctly). BLEU scores are computed with sacrebleu and default settings. There are no individual scores per sentence pair.
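I no longer have the exact script, but conceptually it was along these lines; the checkpoint name and the toy sentence pair below are placeholders, not the actual eval data:

```python
# Sketch of the original evaluation, reconstructed from memory.
import sacrebleu
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "Helsinki-NLP/opus-mt-en-uk"  # placeholder; substitute the actual checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def translate(sentences):
    hypotheses = []
    for sentence in sentences:
        inputs = tokenizer(sentence, return_tensors="pt")
        # beam size 1, as mentioned above (i.e. greedy decoding)
        output = model.generate(**inputs, num_beams=1)
        hypotheses.append(tokenizer.decode(output[0], skip_special_tokens=True))
    return hypotheses

sources = ["This is a test sentence."]  # placeholder; the real run used the full devtest set
references = ["Це тестове речення."]    # placeholder reference

hypotheses = translate(sources)
# sacrebleu with default settings (13a tokenization, case-sensitive)
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(bleu.score)
```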
Thanks. So there is no source code left from the eval that I could dig into myself rather than bothering you?
Hello @jorgtied!
I'm trying to reproduce the reported results for the eng-ukr language pair for m2m100 on the flores200 dataset, but the score I get is much lower (26.8 reported vs. 21.0 measured).
My setup is CTranslate2, this model, and HF's evaluate (the code is available here). The dataset is the same (Flores200, devtest).
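The core of my script is roughly the following sketch; the converted model directory, the checkpoint name, and the example sentences are simplified placeholders (the full code is linked above):

```python
# Simplified sketch of my evaluation setup (CTranslate2 + HF evaluate).
import ctranslate2
import evaluate
from transformers import AutoTokenizer

translator = ctranslate2.Translator("opus-mt-en-uk-ct2", device="cpu")    # converted model dir (placeholder)
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-uk")   # placeholder checkpoint

def translate(sentences):
    batch = [tokenizer.convert_ids_to_tokens(tokenizer.encode(s)) for s in sentences]
    results = translator.translate_batch(batch, beam_size=1)
    return [
        tokenizer.decode(
            tokenizer.convert_tokens_to_ids(r.hypotheses[0]), skip_special_tokens=True
        )
        for r in results
    ]

sources = ["This is a test sentence."]  # in practice: the eng_Latn side of Flores200 devtest
references = ["Це тестове речення."]    # in practice: the ukr_Latn side of Flores200 devtest

predictions = translate(sources)
metric = evaluate.load("sacrebleu")     # wraps sacrebleu with its default settings
result = metric.compute(predictions=predictions, references=[[r] for r in references])
print(result["score"])
```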
My main suspects are:
I've browsed the repos I found on the OPUS-MT leaderboard and other seemingly relevant repos from the Helsinki-NLP account. I also skimmed through the main paper.
Could you please advise on the following things?
Thanks in advance!