Closed · hdeval1 closed this issue 1 year ago
That sounds suspicious. I would not trust those scores. It sounds like heavy overfitting. Could it be that dev or test data are included in your fine-tuning data?
I don't think so; I check for duplicates before starting the tuning process, but I will double-check that. Other languages we have tuned did not show drastic BLEU score changes like this one. I am using an 80-10-10 split of the data and the default tuning parameters, if that gives you any more clues as to what the cause could be.
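For anyone hitting the same suspicion, a quick way to rule out leakage is to check whether any dev/test sentences also appear in the fine-tuning data. A minimal sketch (file names are hypothetical placeholders; adjust normalization to match your preprocessing):

```python
# Check for overlap between the fine-tuning data and held-out sets.
# Exact-match after lowercasing; stricter checks (detokenization,
# punctuation stripping) may catch more near-duplicates.

def load_sentences(path):
    """Read one sentence per line, normalized for comparison."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def report_overlap(train_path, eval_path, label):
    """Print and return the eval sentences that leaked into training."""
    train = load_sentences(train_path)
    eval_set = load_sentences(eval_path)
    leaked = train & eval_set
    print(f"{label}: {len(leaked)} of {len(eval_set)} "
          f"sentences also appear in the training data")
    return leaked

# Usage (hypothetical file names):
#   report_overlap("train.src", "dev.src", "dev")
#   report_overlap("train.src", "test.src", "test")
```

If the leaked count is anywhere near the size of the eval set, the inflated BLEU score is explained.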
Closing the loop here: I was skipping some pre- and post-processing steps on the data, which was messing things up. It seems to be good now!
Hi! I have gotten OPUS-MT-Train working, fine-tuning the current models with my own datasets in TMX format. I did one language recently, and the BLEU score went from 14 on the baseline to 80 after fine-tuning. I didn't see any errors in the code, and I used the best-dist-tune recipe with a separate dataset that was not used in the tuning process. I was wondering if you have seen this before or experienced a similar issue?
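Since the fine-tuning data comes from TMX files, here is a minimal sketch of pulling the parallel segments out with only the standard library, so the extracted pairs can be inspected or deduplicated before tuning. The function name, file name, and language codes are illustrative, not part of OPUS-MT-Train:

```python
# Extract (source, target) segment pairs from a TMX file.
# TMX is XML: each <tu> holds per-language <tuv> elements, each with a <seg>.
import xml.etree.ElementTree as ET

# TMX 1.4 uses the xml:lang attribute, which ElementTree exposes
# under the XML namespace; older files may use a plain "lang" attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def read_tmx(path, src_lang, tgt_lang):
    """Yield (source, target) pairs for the two requested languages."""
    tree = ET.parse(path)
    for tu in tree.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None and seg.text:
                segs[lang.lower()] = seg.text.strip()
        if src_lang in segs and tgt_lang in segs:
            yield segs[src_lang], segs[tgt_lang]

# Usage (hypothetical file and language codes):
#   pairs = list(read_tmx("corpus.tmx", "en", "de"))
```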