Closed · hdeval1 closed this issue 1 year ago
That sounds suspicious. I would not trust those scores. It sounds like heavy overfitting. Could it be that dev or test data are included in your fine-tuning data?
I don't think so; I check for duplicates before starting the tuning process, but I will double-check that. Other languages we have tuned did not show drastic BLEU score changes like this one. I am using an 80-10-10 split of the data and the default tuning parameters, if that gives you any more clues as to what the cause could be.
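For anyone hitting the same suspicion, a quick way to rule out leakage is to check whether any dev/test sentences also appear in the fine-tuning data. A minimal sketch (file names are hypothetical placeholders; adjust normalization to match your preprocessing):

```python
# Check for overlap between the fine-tuning data and held-out sets.
# Exact-match after lowercasing; stricter checks (detokenization,
# punctuation stripping) may catch more near-duplicates.

def load_sentences(path):
    """Read one sentence per line, normalized for comparison."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def report_overlap(train_path, eval_path, label):
    """Print and return the eval sentences that leaked into training."""
    train = load_sentences(train_path)
    eval_set = load_sentences(eval_path)
    leaked = train & eval_set
    print(f"{label}: {len(leaked)} of {len(eval_set)} "
          f"sentences also appear in the training data")
    return leaked

# Usage (hypothetical file names):
#   report_overlap("train.src", "dev.src", "dev")
#   report_overlap("train.src", "test.src", "test")
```

If the leaked count is anywhere near the size of the eval set, the inflated BLEU score is explained.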
Closing the loop here: I was skipping some pre- and post-processing steps on the data, which was messing things up. It seems to be good now!
Hi! I have gotten OPUS-MT-Train working, fine-tuning the current models with my own datasets in TMX format. I did one language recently, and the BLEU score went from 14 on the baseline to 80 after fine-tuning. I didn't see any errors in the code, and I used the best-dist-tune recipe with a separate dataset that was not used in the tuning process. I was wondering if you have seen this before or experienced a similar issue?
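Since the fine-tuning data comes from TMX files, here is a minimal sketch of pulling the parallel segments out with only the standard library, so the extracted pairs can be inspected or deduplicated before tuning. The function name, file name, and language codes are illustrative, not part of OPUS-MT-Train:

```python
# Extract (source, target) segment pairs from a TMX file.
# TMX is XML: each <tu> holds per-language <tuv> elements, each with a <seg>.
import xml.etree.ElementTree as ET

# TMX 1.4 uses the xml:lang attribute, which ElementTree exposes
# under the XML namespace; older files may use a plain "lang" attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def read_tmx(path, src_lang, tgt_lang):
    """Yield (source, target) pairs for the two requested languages."""
    tree = ET.parse(path)
    for tu in tree.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None and seg.text:
                segs[lang.lower()] = seg.text.strip()
        if src_lang in segs and tgt_lang in segs:
            yield segs[src_lang], segs[tgt_lang]

# Usage (hypothetical file and language codes):
#   pairs = list(read_tmx("corpus.tmx", "en", "de"))
```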