I think I found what I was looking for. https://github.com/LantaoYu/SeqGAN/blob/5f2c0a5c978826febe94864da69c77c00f237f81/rollout.py#L58
This line contains a call to https://www.tensorflow.org/api_docs/python/tf/multinomial, which samples from the logits produced by the network instead of taking the max the way the generator network does.
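Roughly, the difference looks like this (a minimal sketch, assuming TensorFlow 1.x, which this repo targets): with the same fixed parameters, the sampled token can change on every run, while the argmax never does.

```python
import tensorflow as tf

logits = tf.constant([[1.0, 2.0, 3.0, 0.5]])     # hypothetical logits over a 4-token vocabulary
sampled = tf.multinomial(logits, num_samples=1)  # stochastic draw, as in rollout.py
greedy = tf.argmax(logits, axis=1)               # deterministic max

with tf.Session() as sess:
    for _ in range(3):
        # the sampled token varies between runs, the greedy one does not
        print(sess.run(sampled), sess.run(greedy))
```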
Great investigation!
I still don't know exactly what N-time Monte Carlo sampling is. Could you please explain? Thank you @LantaoYu @nathan-whitaker
In this loop:
https://github.com/LantaoYu/SeqGAN/blob/5f2c0a5c978826febe94864da69c77c00f237f81/rollout.py#L79
This is the N-time Monte Carlo sampling, with n = 16 in the code. But how are the different samples generated?
`given_num` represents how many tokens to use from the input, and `i` represents the i-th sample. Why are the samples different for different values of `i`? Is the rollout network being updated somewhere within the call to `get_reward` and I'm missing it? I also don't see where the randomness comes in for the Monte Carlo estimation of the partial-sequence reward. From my examination of the code, the network doesn't get updated and the session parameters are the same, so I'm not sure how the different samples are being generated.
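For context, here is a schematic of how I read the `get_reward` loop. The helper names (`complete_sequence`, `discriminator_score`) are stand-ins of my own, not the repo's actual functions; in the real code the completion step is the part done by tf.multinomial.

```python
import numpy as np

rng = np.random.RandomState(0)

def complete_sequence(prefix_tokens, given_num, vocab_size=5000, seq_length=20):
    # Stand-in for the rollout network: keep the first given_num tokens and
    # sample the rest (in the real code this sampling is done by tf.multinomial).
    completion = rng.randint(vocab_size, size=(len(prefix_tokens), seq_length - given_num))
    return np.concatenate([prefix_tokens[:, :given_num], completion], axis=1)

def discriminator_score(samples):
    # Stand-in for the discriminator's probability that each sequence is real.
    return rng.rand(len(samples))

def get_reward(input_x, rollout_num=16, seq_length=20):
    rewards = np.zeros((len(input_x), seq_length))
    for i in range(rollout_num):                    # the N Monte Carlo rollouts (n = 16)
        for given_num in range(1, seq_length):      # prefix length taken from input_x
            samples = complete_sequence(input_x, given_num, seq_length=seq_length)
            rewards[:, given_num - 1] += discriminator_score(samples)
        rewards[:, seq_length - 1] += discriminator_score(input_x)  # reward for the full sequence
    return rewards / rollout_num                    # average over the N rollouts

# toy usage: a batch of 2 sequences of length 20
print(get_reward(rng.randint(5000, size=(2, 20))).shape)  # (2, 20)
```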
Can someone help me understand a) how the different samples are being generated, b) where the randomness is coming from, and c) if the rollout network has the same parameters as the generator network, how it generates different samples than the generator does?
Any help is greatly appreciated! Thank you for providing this code; it has been very helpful to me.