ZhenYangIACAS / NMT_GAN

generative adversarial nets for neural machine translation
Apache License 2.0
119 stars 37 forks

Discriminator pre-training time #6

Closed wangyirui closed 5 years ago

wangyirui commented 6 years ago

Hi, how long will it take to pre-train the discriminator to achieve the accuracy of 0.82? Thanks!

ZhenYangIACAS commented 6 years ago

In my experiment setting (4 K80 GPUs), it only takes a few hours to pre-train the discriminator. Sorry, I don't remember it exactly, but I am sure the training time is not long.

wangyirui commented 6 years ago

OK. So in each epoch, you randomly sample 5000 examples from the whole training set (the 4.5M sentence pairs mentioned in your paper), right? Currently, I use about 150k training samples to pre-train it. After reaching about 72% accuracy on the validation set, it starts overfitting.

wangyirui commented 6 years ago

And did you use any dropout or l2 regularization?

ZhenYangIACAS commented 6 years ago

@wangyirui Not quite. Randomly sampling 5000 examples is only used during joint training. For pre-training the discriminator, we collect 1M positive samples and 1M negative samples. We didn't use any dropout or L2 regularization. Have you tested your translation performance? I mean, don't focus only on the accuracy of the discriminator.
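The pre-training data described above can be assembled roughly as follows. This is a minimal sketch, not the repo's actual data pipeline: `translate_fn` is a hypothetical hook standing in for the pre-trained generator, and the set is built once and kept fixed, matching the discussion that the 1M positive and 1M negative samples are not re-sampled.

```python
import random

def build_pretrain_set(parallel_pairs, translate_fn, n=1_000_000, seed=0):
    """Assemble a fixed discriminator pre-training set: n positive
    (source, reference) pairs labeled 1 and n negative
    (source, model_translation) pairs labeled 0."""
    random.seed(seed)
    pairs = random.sample(parallel_pairs, min(n, len(parallel_pairs)))
    positives = [(src, ref, 1) for src, ref in pairs]
    # Negative samples reuse the SAME source sentences, paired with the
    # generator's own translations instead of the human references.
    negatives = [(src, translate_fn(src), 0) for src, _ in pairs]
    data = positives + negatives
    random.shuffle(data)  # shuffled once; fixed for the whole pre-training run
    return data
```

Because positives and negatives share their source sentences, the discriminator is forced to judge the target side rather than memorize which sources are "real".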

wangyirui commented 6 years ago

OK, got it. So the 1 million positive samples and 1 million negative samples are fixed, and will not be re-sampled during pre-training, right? In addition, I want to confirm: if we denote the positive sample pairs as (S_pos, T_pos) and the negative sample pairs as (S_neg, T_neg), then the positive samples' source tokens S_pos are exactly the same as the negative samples' source tokens S_neg, right? In other words, given 1M source sentences, we have the corresponding 1M ground-truth translations and 1M machine translations, right? Thanks!!!

ZhenYangIACAS commented 6 years ago

Yes, you are right. The positive and negative examples correspond to the same source sentence.

wangyirui commented 6 years ago

Thanks. I think the problem is that my training set is relatively small (only 153K, roughly 1/10 of your data size) compared to your 1M training samples, so using a similar discriminator model may overfit. I will try to increase the training set. Thanks again!
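Since the authors report using no dropout or L2 regularization, one common workaround for the overfitting seen on smaller sets is simply early stopping on validation accuracy. A generic sketch, where `train_step` and `eval_fn` are hypothetical hooks into your own training loop (not functions from this repo):

```python
def train_with_early_stopping(train_step, eval_fn, max_steps=100_000,
                              eval_every=1000, patience=5):
    """Stop pre-training once validation accuracy stops improving.

    train_step() runs one update; eval_fn() returns validation accuracy.
    Stops after `patience` evaluations without a new best accuracy."""
    best_acc, best_step, since_best = 0.0, 0, 0
    for step in range(1, max_steps + 1):
        train_step()
        if step % eval_every == 0:
            acc = eval_fn()
            if acc > best_acc:
                best_acc, best_step, since_best = acc, step, 0
                # save a checkpoint here to keep the best model
            else:
                since_best += 1
                if since_best >= patience:  # accuracy plateaued: stop
                    break
    return best_acc, best_step
```

This keeps the discriminator at its best validation accuracy (e.g. the 0.71 plateau discussed here) instead of letting it memorize the training set.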

kellymarchisio commented 6 years ago

@wangyirui - were you able to make the discriminator work? Mine also starts to overfit when it hits around 0.71 accuracy, with 1M training samples. @ZhenYangIACAS - looks like wangyirui and I may have had the same issue.

wangyirui commented 6 years ago

@kellymarchisio I have the same problem as you... it starts to overfit after reaching 0.71 accuracy...

wangyirui commented 6 years ago

@kellymarchisio Have you figured out the problem?

kellymarchisio commented 6 years ago

Unfortunately not. I suppose you haven't either?


wangyirui commented 6 years ago

@kellymarchisio Yeah, I still can't figure out the problem... I always start overfitting at around 71% accuracy on the validation set.

JianWenJun commented 6 years ago

@ZhenYangIACAS Does the accuracy of 0.82 refer to the prediction accuracy on the training set or on the validation set? From your discriminator pre-training code, it looks like it is the accuracy on the training set.

ZhenYangIACAS commented 6 years ago

@kellymarchisio Did you get the fake examples by greedy search? And how is the quality of your real data? I believe this accuracy is correlated with the quality of your own data.
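The greedy search mentioned here picks the single most probable token at every step. A minimal framework-free sketch, where `step_fn` is a hypothetical hook that returns the generator's next-token probabilities given the current target prefix (the real repo decodes with its TensorFlow generator):

```python
def greedy_decode(step_fn, bos_id, eos_id, max_len=50):
    """Generate a fake (negative) target by greedy search.

    step_fn(prefix) is assumed to return a list of probabilities over
    the target vocabulary for the next token (hypothetical hook into
    your NMT generator)."""
    prefix = [bos_id]
    for _ in range(max_len):
        probs = step_fn(prefix)
        next_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
        prefix.append(next_id)
        if next_id == eos_id:  # stop once end-of-sentence is emitted
            break
    return prefix[1:]  # drop the BOS token
```

Greedy outputs are typically worse than beam-search outputs, which makes the negative samples easier to tell apart from references; that choice can noticeably shift the discriminator accuracy you observe.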

ZhenYangIACAS commented 6 years ago

@JianWenJun In our code, I removed the validation process since I already knew how many steps I should run. Without the validation process, we save some training time. But when you run this code for the first time, you should test the accuracy on the validation set.
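Restoring that validation check is straightforward. A small sketch of measuring discriminator accuracy on a held-out set, where `predict_fn` is a hypothetical hook returning the discriminator's probability that a (source, target) pair is human-translated:

```python
def discriminator_accuracy(predict_fn, val_data, threshold=0.5):
    """Classification accuracy of the discriminator on held-out data.

    predict_fn(src, tgt) is assumed to return P(human-translated);
    val_data holds (src, tgt, label) triples with label 1 for real
    references and 0 for machine translations."""
    correct = 0
    for src, tgt, label in val_data:
        pred = 1 if predict_fn(src, tgt) >= threshold else 0
        correct += (pred == label)
    return correct / len(val_data)
```

Running this every few thousand steps is what lets you detect the overfitting plateau discussed above (e.g. validation accuracy stalling at 0.71 while training accuracy keeps climbing toward 0.82).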