Helsinki-NLP / OPUS-MT-leaderboard


Reproducibility information for benchmarks #4

Open BramVanroy opened 4 months ago

BramVanroy commented 4 months ago

Hello

I've been looking to do a large-scale comparison of (m)any MT models involving Dutch (XX->NL, NL->XX) across all the test sets that I can find. The OPUS leaderboard is a great starting point for me. As a first step, I would like to reproduce the scores in the OPUS leaderboard. For reproducibility's sake, it would therefore be useful to have an overview of some meta-information on the benchmarks.

If you can share any info about this, I'd be grateful!

Bram

jorgtied commented 4 months ago

We should make this more transparent. Part of the answer is hidden in the scripts we use for evaluation. Have a look at the makefile targets here: https://github.com/Helsinki-NLP/OPUS-MT-leaderboard-recipes/blob/master/eval.mk

For Hugging Face models we mainly use this script: https://github.com/Helsinki-NLP/External-MT-leaderboard/blob/master/models/huggingface/translate.py, which is called by https://github.com/Helsinki-NLP/External-MT-leaderboard/blob/master/models/huggingface/Makefile
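As a rough illustration (not the exact code in translate.py; the model name and decoding settings below are just examples), translating with a Hugging Face OPUS-MT model typically looks something like this:

```python
# Minimal sketch of translating with a Hugging Face OPUS-MT model.
# Model name and generation settings are illustrative assumptions,
# not necessarily what the leaderboard's translate.py uses.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Helsinki-NLP/opus-mt-nl-en"  # hypothetical example model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

sentences = ["Dit is een test.", "Vertalen is leuk."]
batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
outputs = model.generate(**batch, num_beams=4, max_length=512)
translations = tokenizer.batch_decode(outputs, skip_special_tokens=True)
for src, hyp in zip(sentences, translations):
    print(f"{src}\t{hyp}")
```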

Alternatively, we also use scripts from here: https://github.com/Helsinki-NLP/External-MT-leaderboard/tree/master/models/huggingface-accelerate

For NLLB and M2M-100 from Facebook, look at https://github.com/Helsinki-NLP/External-MT-leaderboard/blob/master/models/huggingface/facebook/Makefile
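For those multilingual models the target language has to be forced as the first generated token. A sketch under assumptions (checkpoint size and language codes are examples; the Makefile above has the actual setup):

```python
# Sketch of decoding with NLLB-200, which needs the target language
# forced as the first generated token. Checkpoint and language codes
# are illustrative examples only.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="nld_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

batch = tokenizer(["Dit is een test."], return_tensors="pt")
outputs = model.generate(
    **batch,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
    max_length=256,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```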

For COMET: this is a moving target, with all kinds of models coming out and slight changes in the implementation.
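For what it's worth, scoring with the unbabel-comet package usually looks something like the sketch below; the checkpoint name is just an example, and different leaderboard runs may have used different COMET models and package versions:

```python
# Sketch of scoring with the unbabel-comet package (comet >= 2.0 API).
# The checkpoint name is an example, not necessarily what the
# leaderboard used at any given time.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-comet-da")
model = load_from_checkpoint(model_path)

data = [
    {"src": "Dit is een test.", "mt": "This is a test.", "ref": "This is a test."},
]
output = model.predict(data, batch_size=8, gpus=0)
print(output.system_score)  # corpus-level COMET score
```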

For newer OPUS-MT models there are also logfiles like this one: https://opus.nlpl.eu/dashboard/logfile.php?model1=unknown&model2=unknown&test=newstest2018&scoreslang=all&model=Tatoeba-MT-models%2Ffin-eng%2FopusTCv20210807%2Bnopar%2Bft95-sepvoc_transformer-tiny11-align_2023-07-03&src=fin&trg=eng&pkg=opusmt

This does not tell you everything you want to know, but it gives at least some partial information and pointers.

jorgtied commented 3 months ago

I forgot to mention that the sacrebleu signatures are available from the repo, for example: https://github.com/Helsinki-NLP/OPUS-MT-leaderboard/blob/master/models/Tatoeba-MT-models/afr-deu/opus-2021-02-18/ntrex128.afr-deu.eval
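To reproduce a score together with its signature, something like the following sketch should work (hypotheses and references here are placeholders):

```python
# Sketch of computing BLEU/chrF plus the sacrebleu signature
# (sacrebleu 2.x API). Inputs are placeholder strings.
from sacrebleu.metrics import BLEU, CHRF

hyps = ["This is a test."]
refs = [["This is a test."]]  # one inner list per reference set

bleu = BLEU()
print(bleu.corpus_score(hyps, refs).score)
print(bleu.get_signature())  # e.g. nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.x

chrf = CHRF()
print(chrf.corpus_score(hyps, refs).score, chrf.get_signature())
```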