Helsinki-NLP / OPUS-MT-leaderboard

Creative Commons Attribution Share Alike 4.0 International

Excluding the multi30k_task2_test_2016 dataset from the leaderboard for the eng-deu/deu-eng language pair #2

Open schniewmatz opened 1 year ago

schniewmatz commented 1 year ago

I am using the leaderboard to decide which model to choose for each language pair. I find it a very good basis, since it gives an average over a whole set of benchmarks and lets one judge, to some extent, how consistently a model performs.
However, when going through the example outputs in detail, I realized that the multi30k_task2_test_2016 dataset mostly contains source/reference pairs that are almost unrelated, for example:

SOURCE: The man with pierced ears is wearing glasses and an orange hat. REFERENCE: Der Mann trägt eine orange Wollmütze.

Here the pierced ears and the glasses are not present in the reference.

Or even worse:

SOURCE: Two men sitting on the roof of a house while another one stands on a ladder. REFERENCE: Dachdecker bei der Arbeit.

Here the reference would be translated as "roofers at work". The other examples in the dataset are similar.

I do not know whether this dataset has some other valid use case, but I don't find it useful for judging machine translation quality. Could you remove it from the leaderboard?