LantaoYu / SeqGAN

Implementation of Sequence Generative Adversarial Nets with Policy Gradient

Question about Rollout #19

Closed nathan-whitaker closed 7 years ago

nathan-whitaker commented 7 years ago

In this loop:

https://github.com/LantaoYu/SeqGAN/blob/5f2c0a5c978826febe94864da69c77c00f237f81/rollout.py#L79

This is N-time Monte Carlo sampling with n = 16 in the code. But how are the different samples generated? given_num represents how many tokens of the input to keep, and i represents the i-th sample. Why are the samples different for different values of i? Is the rollout network being updated somewhere within the call to get_reward that I'm missing? I also don't see where the randomness for the Monte Carlo estimation of the partial-sequence reward comes in.

From my examination of the code, the network is not updated and the session parameters stay the same, so I'm not sure how different samples are being generated.

Can someone help me understand: a) how different samples are being generated, b) where the randomness comes from, and c) if the rollout network has the same parameters as the generator network, how it generates different samples than the generator?

Any help is greatly appreciated! Thank you for providing this code; it has been very helpful to me.
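(For anyone reading later, here is a rough sketch of how I understand the N-time Monte Carlo loop in get_reward: for each prefix length given_num, the rollout policy completes the sequence n times, each completion is scored by the discriminator, and the scores are averaged. The `rollout_step` and `discriminator` callables below are hypothetical stand-ins for the actual TensorFlow graph in rollout.py.)

```python
import numpy as np

def get_rewards(input_seq, n_rollouts, rollout_step, discriminator):
    """Sketch of N-time Monte Carlo reward estimation (not the repo's code).

    rollout_step(seq, given_num): stochastically completes seq, keeping
        the first given_num tokens fixed.
    discriminator(seq): returns a scalar reward for a full sequence.
    """
    seq_len = len(input_seq)
    rewards = np.zeros(seq_len)
    for given_num in range(1, seq_len):
        for _ in range(n_rollouts):
            # each call resamples the completion, so the n rollouts differ
            completed = rollout_step(input_seq, given_num)
            rewards[given_num - 1] += discriminator(completed)
    rewards /= n_rollouts
    # the final token needs no rollout: score the full sequence directly
    rewards[seq_len - 1] = discriminator(input_seq)
    return rewards
```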

nathan-whitaker commented 7 years ago

I think I found what I was looking for. https://github.com/LantaoYu/SeqGAN/blob/5f2c0a5c978826febe94864da69c77c00f237f81/rollout.py#L58

That line contains a call to https://www.tensorflow.org/api_docs/python/tf/multinomial, which draws a sample from the distribution defined by the network's logits instead of taking the max the way the generator network does.
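A minimal NumPy illustration of that difference (the softmax-plus-sampling below stands in for what tf.multinomial does over the rollout network's logits; the function names are mine, not the repo's):

```python
import numpy as np

def sample_next_token(logits, rng):
    """Rollout-style step: sample a token id from softmax(logits)."""
    probs = np.exp(logits - logits.max())  # subtract max for stability
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

def greedy_next_token(logits):
    """Argmax-style step: always the single most likely token."""
    return int(np.argmax(logits))

logits = np.array([2.0, 1.0, 0.5])
rng = np.random.default_rng(0)
samples = [sample_next_token(logits, rng) for _ in range(1000)]
# greedy_next_token always returns token 0; the sampled rollouts
# visit all three tokens, which is where the n=16 rollouts diverge.
```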

guotong1988 commented 7 years ago

Great investigation!

guotong1988 commented 7 years ago

I still don't know exactly what N-time Monte Carlo sampling is. Could you please explain? Thank you @LantaoYu @nathan-whitaker