Open dberma15 opened 5 years ago
Hi,
1) The input sequences are on the odd lines and their paired ground truths are on the even lines. 2) The dev data is the development set, also called the validation set. It is used for early stopping during training.
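To make the layout in 1) concrete, here is a minimal sketch of how such a file could be split into pairs. It is not the repository's actual loader; the function name `split_pairs` is hypothetical, and it assumes one sequence per line with no blank lines in between.

```python
from typing import Iterable, List, Tuple


def split_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each odd line (input) with the even line that follows it (ground truth).

    Hypothetical helper for illustration; line numbers are counted from 1.
    """
    cleaned = [line.rstrip("\n") for line in lines]
    if len(cleaned) % 2 != 0:
        raise ValueError("expected an even number of lines (input/ground-truth pairs)")
    # cleaned[0::2] holds lines 1, 3, 5, ... and cleaned[1::2] holds lines 2, 4, 6, ...
    return list(zip(cleaned[0::2], cleaned[1::2]))


example = [
    "how are you ?\n",
    "i am fine .\n",
    "what time is it ?\n",
    "it is noon .\n",
]
pairs = split_pairs(example)
```

With a real file, the same function could be called as `split_pairs(open(path, encoding="utf-8"))`, where `path` is whichever training, dev, or test file you are reading.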
Hope you find it helpful!
Thanks so much. That did help. I was wondering if you could explain a bit about how the reward works in this? What are the rewards at each time step and how are they determined?
The rewards for the GAN-based models are predicted by the discriminator (D). The reward at each time step can be interpreted as the expected D score that the current sub-sequence will obtain. The generator then tends to generate sequences that achieve higher expected D scores.
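As a rough illustration of "one reward per sub-sequence", the sketch below scores every prefix of a generated sequence. It is only a toy under stated assumptions: `stepwise_rewards` and `toy_discriminator` are hypothetical names, and the toy scorer is a stand-in for the real D, which is a trained network that also conditions on the input sequence.

```python
from typing import Callable, List, Sequence


def stepwise_rewards(
    tokens: Sequence[str],
    score_prefix: Callable[[Sequence[str]], float],
) -> List[float]:
    """Return one reward per time step: the score of the prefix ending at that step."""
    return [score_prefix(tokens[:t]) for t in range(1, len(tokens) + 1)]


def toy_discriminator(prefix: Sequence[str]) -> float:
    """Hypothetical stand-in for D: the fraction of tokens that are not "<unk>"."""
    return sum(tok != "<unk>" for tok in prefix) / len(prefix)


generated = ["i", "am", "<unk>", "<unk>"]
rewards = stepwise_rewards(generated, toy_discriminator)
```

In this toy, the rewards fall as soon as the sequence degrades, so the later steps receive less credit than the earlier ones. That per-step signal is what the generator would be trained to increase.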
Is the algorithm using teacher forcing when training the decoder?
Hi, I'm looking to apply this to my own data after reading the paper "Improving Conditional Sequence Generative Adversarial Networks by Stepwise Evaluation". I'm a bit confused when looking at the code, though. Where exactly are the input sequences and the ground truth read into the code? I see the training and test sequences, but those appear to be the input sequences. I'm not sure where the ground-truth responses are. As a follow-up, what is the dev data?
Thanks