Closed: HuihuiChyan closed this issue 4 years ago.
Hello,
Thanks for the interest. We will release the test set labels after the deadline for system submission. We will look into the discrepancies.
Best, Lucia
I finetuned BERT on the en-de data and achieved a Pearson correlation of 0.31 on the dev set. Since the gold labels are not available for the test set, I tested my model on the en-de training set, 7,000 sentence pairs in total (I did not use this training set, so it is okay to use it for testing), but I got a Pearson correlation of only 0.12.
So for en-de it is 0.31 on the dev set and 0.12 on the training set. Why is the gap so big?
I tried the same procedure on the en-zh data and got 0.41 on the dev set and 0.38 on the training set, so there seems to be no problem with en-zh.
Besides, when will the test labels be available? I would really like to compare my results with those in your paper.
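For anyone reproducing this comparison: the Pearson correlation between predicted and gold quality scores can be computed with `scipy.stats.pearsonr`, as in this minimal sketch. The score arrays below are illustrative placeholders, not the actual MLQE en-de data.

```python
# Minimal sketch: Pearson correlation between model predictions and
# gold quality-estimation scores. The arrays are illustrative
# placeholders, not the actual MLQE en-de scores.
from scipy.stats import pearsonr

predicted = [0.20, 0.55, 0.90, 0.40, 0.70, 0.15]
gold = [0.10, 0.60, 0.80, 0.50, 0.65, 0.30]

# pearsonr returns the correlation coefficient and a two-sided p-value.
r, p_value = pearsonr(predicted, gold)
print(f"Pearson r = {r:.2f}")
```

A large gap between dev-set and training-set correlation, as reported above, is usually worth checking against label distributions or file alignment before comparing against published numbers.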