xiamengzhou closed this issue 3 years ago.
> downsized the max-tokens from 20000 to 6000 for translation
I agree that downsizing the inference batch shouldn't change the result. A few questions to help me identify the problem: 1) What's the performance of the fully supervised model you have? 2) Could you share the full training logs of each iteration?
I tried to reproduce the semi-supervised training on the FLoRes en-ne dataset using the reproduce.sh script. I used 4 V100 GPUs and it took around 2 days to finish the experiment. Unfortunately, my score (3.69) is worse than the one reported in the paper (6.8). I basically just ran the script following the instructions in the GitHub repo. The only minor change I made is that during back-translation, I downsized max-tokens from 20000 to 6000 for translation (not for training, so it shouldn't matter), because otherwise I easily ran out of GPU memory.
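For reference, the kind of change I made looks roughly like the sketch below; the data-bin directory, checkpoint path, and output file are placeholders rather than the exact values in reproduce.sh, and the only thing I touched is the token budget of the generation step:

```bash
# Back-translation generation step (sketch, not the exact reproduce.sh command).
# The only change from the default setup is --max-tokens 6000 (down from 20000),
# which caps how many tokens are decoded per batch so generation fits in GPU memory;
# it does not affect the training batches themselves.
fairseq-generate data-bin/placeholder_en_ne \
    --path checkpoints/checkpoint_best.pt \
    --source-lang en --target-lang ne \
    --beam 5 \
    --max-tokens 6000 \
    > backtranslations.out
```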
I'm just wondering: is there anything wrong with my experimental settings?