Helsinki-NLP / OPUS-MT-leaderboard

Creative Commons Attribution Share Alike 4.0 International

Having troubles reproducing results for m2m100 1.2b #3

Open dchaplinsky opened 7 months ago

dchaplinsky commented 7 months ago

Hello @jorgtied!

I'm trying to reproduce the reported results for the eng-ukr language pair for m2m100 on the flores200 dataset, but the score I get is much lower (21.0 instead of the reported 26.8).

My setup is CTranslate2, this model, and HF's evaluate (the code is available here). The dataset is the same (Flores200, devtest).

My main suspects are:

I've browsed the repos I found on the OPUS-MT leaderboard and other seemingly relevant repos from the Helsinki-NLP account. I also skimmed the main paper.

Could you please advise on the following things?

Thanks in advance!

jorgtied commented 6 months ago

I used the native transformers library for decoding the test sets, with beam size 1 (if I remember correctly). BLEU scores are computed with sacrebleu using default settings. There are no individual scores per sentence pair.
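
For anyone trying to replicate the setup described above, here is a minimal sketch, not the original evaluation script. It assumes the `facebook/m2m100_1.2B` checkpoint from the Hugging Face hub, the M2M100 language codes `en` and `uk`, and a batch size of 8; none of these are confirmed in this thread. Beam size 1 follows the comment above, which is itself given from memory. The function names (`batched`, `translate`, `corpus_bleu_score`) are hypothetical helpers introduced for illustration.

```python
from typing import Iterator, List


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate(
    sources: List[str],
    model_name: str = "facebook/m2m100_1.2B",  # assumed checkpoint
    src_lang: str = "en",                       # assumed M2M100 language code
    tgt_lang: str = "uk",                       # assumed M2M100 language code
    num_beams: int = 1,                         # beam size 1, per the comment above
    batch_size: int = 8,                        # arbitrary choice for this sketch
) -> List[str]:
    """Translate sentences with the native transformers library."""
    # Imported lazily so the small helper above works without heavy dependencies.
    import torch
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)
    model.eval()
    tokenizer.src_lang = src_lang

    hypotheses: List[str] = []
    for batch in batched(sources, batch_size):
        encoded = tokenizer(batch, return_tensors="pt", padding=True)
        with torch.no_grad():
            generated = model.generate(
                **encoded,
                # M2M100 needs the target language id as the first generated token.
                forced_bos_token_id=tokenizer.get_lang_id(tgt_lang),
                num_beams=num_beams,
            )
        hypotheses.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return hypotheses


def corpus_bleu_score(hypotheses: List[str], references: List[str]) -> float:
    """Corpus-level BLEU with sacrebleu's default settings."""
    import sacrebleu

    # sacrebleu expects a list of reference streams, hence the extra list.
    return sacrebleu.corpus_bleu(hypotheses, [references]).score
```

With the Flores200 devtest source and reference sentences loaded into two lists of equal length, `corpus_bleu_score(translate(sources), references)` gives a single corpus-level score. Differences in beam size, tokenization, or BLEU settings between this and a CTranslate2 plus `evaluate` pipeline are possible causes of a score gap, so they are worth comparing one at a time.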

dchaplinsky commented 6 months ago

Thanks. Is there any source code left from the eval, so that I can dig into it myself rather than bothering you?