Helsinki-NLP / OPUS-MT-leaderboard

Creative Commons Attribution Share Alike 4.0 International

Having trouble reproducing results for m2m100 1.2b #3

Open dchaplinsky opened 11 months ago

dchaplinsky commented 11 months ago

Hello @jorgtied!

I'm trying to reproduce the reported results for the eng-ukr language pair for m2m100 on the flores200 dataset, but the score I get is much lower (26.8 reported vs. 21.0 in my run).

My setup is CTranslate2, this model, and HF's evaluate (the code is available here). The dataset is the same (Flores200, devtest).

My main suspects are:

I've browsed the repos I found on the OPUS-MT leaderboard and other seemingly relevant repos from the Helsinki-NLP account. I also skimmed the main paper.

Could you please advise on the following things?

Thanks in advance!

jorgtied commented 11 months ago

I used the native transformers library for decoding the test sets, with beam size 1 (if I remember correctly). BLEU scores are computed with sacrebleu and default settings. There are no individual scores per sentence pair.
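For anyone trying to reproduce this, the setup described above can be sketched roughly as follows. This is a minimal sketch, not the original evaluation script: the model ID `facebook/m2m100_1.2B`, the M2M100 language codes `en` and `uk`, the batch size, and the helper names `batched`, `translate`, and `corpus_bleu_score` are all assumptions made for illustration, and beam size 1 is taken from the "if I remember correctly" above.

```python
from typing import Iterator, List


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items (hypothetical helper)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate(
    sources: List[str],
    src_lang: str = "en",   # assumed M2M100 code for English
    tgt_lang: str = "uk",   # assumed M2M100 code for Ukrainian
    model_name: str = "facebook/m2m100_1.2B",  # assumed checkpoint
    batch_size: int = 8,    # arbitrary choice, not from the thread
) -> List[str]:
    """Decode with the native transformers library and beam size 1."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)
    model.eval()
    tokenizer.src_lang = src_lang

    hypotheses: List[str] = []
    for batch in batched(sources, batch_size):
        encoded = tokenizer(batch, return_tensors="pt", padding=True)
        generated = model.generate(
            **encoded,
            forced_bos_token_id=tokenizer.get_lang_id(tgt_lang),
            num_beams=1,  # beam size 1, as recalled in the comment above
        )
        hypotheses.extend(
            tokenizer.batch_decode(generated, skip_special_tokens=True)
        )
    return hypotheses


def corpus_bleu_score(hypotheses: List[str], references: List[str]) -> float:
    """Corpus-level BLEU with sacrebleu's default settings."""
    import sacrebleu

    # sacrebleu expects a list of reference streams, hence the extra list.
    return sacrebleu.corpus_bleu(hypotheses, [references]).score
```

With the Flores200 devtest source and reference sentences loaded into two lists of equal length, `corpus_bleu_score(translate(sources), references)` would give a single corpus-level score, which matches the note that no per-sentence scores exist.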

dchaplinsky commented 10 months ago

Thanks. Is there any source code left for the eval, so I can dig into it myself rather than bother you?