Closed · wangyirui closed this issue 5 years ago
In my experiment setting (4 K80 GPUs), it only took a few hours to pre-train the discriminator. Sorry, I don't remember it exactly, but I am sure the training time is not long.
OK. So in each epoch, you randomly sample 5,000 examples from the whole training set (4.5M sentence pairs, as mentioned in your paper), right? Currently, I use about 150K training samples to pre-train it. After reaching about 72% accuracy on the validation set, it starts overfitting.
And did you use any dropout or L2 regularization?
@wangyirui Not quite. Randomly sampling 5,000 examples is only used during joint training. For pre-training the discriminator, we collect 1M positive samples and 1M negative samples. We didn't use any dropout or L2 regularization. Have you tested your translation performance? I mean, rather than focusing only on the accuracy of the discriminator.
OK, got it. So the 1M positive samples and 1M negative samples are fixed, and will not be re-sampled during pre-training, right? In addition, I want to confirm something: if we denote the positive pairs as (S_pos, T_pos) and the negative pairs as (S_neg, T_neg), then the negative source tokens S_neg are exactly the same as the positive source tokens S_pos, right? In other words, given 1M source sentences, we have the corresponding 1M ground-truth translations and 1M machine translations, right? Thanks!
Yes, you are right. The positive and negative examples correspond to the same source sentence.
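For concreteness, the pairing confirmed above (positives and negatives sharing exactly the same source sentences) could be sketched like this; the function and variable names are hypothetical, not taken from the NMT_GAN code:

```python
def build_discriminator_data(sources, references, translations):
    """Build pre-training data for the discriminator.

    Each source sentence contributes one positive pair (source, reference)
    and one negative pair (source, machine translation), so the positive
    and negative sets share the same source side.
    """
    positives = [(src, ref) for src, ref in zip(sources, references)]
    negatives = [(src, hyp) for src, hyp in zip(sources, translations)]
    # Label positives 1 and negatives 0 for the binary classifier.
    return [(pair, 1) for pair in positives] + [(pair, 0) for pair in negatives]
```

With 1M source sentences this yields 2M labeled pairs in total, which matches the 1M/1M split described above.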
Thanks. I think the problem is that my training set is relatively small (only 153K, roughly 1/10 of your data size) compared to your 1M training pairs, so a similar discriminator model may overfit. I will try to increase the training set size. Thanks again!
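Since the authors report using no dropout or L2 regularization, one thing worth trying on a smaller training set is adding an L2 weight penalty to the discriminator's loss. A minimal NumPy sketch (hypothetical names, not the repo's TensorFlow code):

```python
import numpy as np

def l2_penalized_loss(logits, labels, weights, l2_coeff=1e-4):
    """Binary cross-entropy plus an L2 weight penalty.

    The L2 term discourages large weights, which can help when the
    discriminator overfits a small pre-training set.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    eps = 1e-12                            # avoid log(0)
    bce = -np.mean(labels * np.log(probs + eps)
                   + (1 - labels) * np.log(1 - probs + eps))
    l2 = l2_coeff * sum(np.sum(w ** 2) for w in weights)
    return bce + l2
```

The `l2_coeff` value here is a placeholder; it would need tuning on the validation set.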
@wangyirui - were you able to make the discriminator work? Mine also starts to overfit once it hits around 0.71 accuracy, even with 1M training samples. @ZhenYangIACAS - it looks like wangyirui and I may have had the same issue.
@kellymarchisio I have the same problem as you... it starts to overfit after reaching 0.71 accuracy.
@kellymarchisio Have you figured out the problem?
Unfortunately not. I suppose you haven't either?
@kellymarchisio Yeah, I still can't figure out the problem... I always start overfitting around 71% accuracy on the validation set.
@ZhenYangIACAS Regarding the accuracy of 0.82: does it refer to accuracy on the training set or on the validation set? From your discriminator pre-training code, it looks like it is the accuracy on the training set.
@kellymarchisio Did you generate the fake examples by greedy search? And what is the quality of your real data? I believe this accuracy is correlated with the quality of your own data.
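Generating the fake side by greedy search, as asked above, could be sketched as follows; `step_fn` is a hypothetical stand-in for the generator's one-step decoder, not an actual function from the repo:

```python
def greedy_decode(step_fn, bos_id, eos_id, max_len=50):
    """Greedily decode one fake target sentence from a generator.

    step_fn(prefix) should return next-token scores over the vocabulary
    given the tokens decoded so far. At each step the argmax token is
    appended; decoding stops at EOS or after max_len tokens.
    """
    prefix = [bos_id]
    for _ in range(max_len):
        scores = step_fn(prefix)
        next_id = max(range(len(scores)), key=lambda i: scores[i])  # argmax
        prefix.append(next_id)
        if next_id == eos_id:
            break
    return prefix[1:]  # drop BOS
```

Each decoded sentence, paired with its source, would then serve as one negative example for the discriminator.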
@JianWenJun In our code, I deleted the validation process since I already know how many steps to run. Without the validation process, we save some training time. But when running this code for the first time, you should test the accuracy on the validation set.
Hi, how long does it take to pre-train the discriminator to reach an accuracy of 0.82? Thanks!