mshislam opened this issue 3 years ago
Not the author, but this might be a better question for Stack Overflow or the like. Essentially, gradient descent cannot be applied when there are stochastic nodes in the computation graph: asking for the gradient through the sampling step is equivalent to asking for the derivative of a random variable with respect to the source of its randomness, which cannot be computed in general, although there have been attempts to work around it cleverly in specific cases. A more common context in which this problem shows up is the reparameterization trick, so it might help to look into that too.
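To make that contrast concrete, here is a minimal PyTorch sketch of the reparameterization trick for a Gaussian (not from the paper's code; the variable names and the toy objective are just illustrative). The sample is rewritten as a deterministic function of the parameters plus an independent noise draw, so gradients can flow back to the parameters even though the sample is random:

```python
import torch

# Illustrative only: a Gaussian whose mean and (log) std we want to optimize.
mu = torch.tensor(0.5, requires_grad=True)
log_sigma = torch.tensor(0.0, requires_grad=True)

# Directly drawing with torch.distributions.Normal(mu, sigma).sample()
# detaches the draw from the graph; .rsample() uses the trick shown below.

# Reparameterized sampling: z = mu + sigma * eps, with eps ~ N(0, 1).
# The randomness lives entirely in eps, which has no parameters, so the
# path from mu / log_sigma to z is deterministic and differentiable.
eps = torch.randn(())
z = mu + torch.exp(log_sigma) * eps

loss = (z - 2.0) ** 2            # any downstream objective
loss.backward()
print(mu.grad, log_sigma.grad)   # well-defined gradients
```

This only works when the sample can be written as a differentiable function of the parameters (e.g. continuous distributions like the Gaussian); for discrete samples a different estimator is needed.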
Is the code that uses REINFORCE for adversarial training publicly available?
In the paper you mention an experiment with adversarial training and say that "it is challenging because it is impossible to backpropagate through sampling from the generator". Could you please elaborate on this issue and say whether you have found a solution?
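Not from the authors' repository, but for reference, here is a hedged sketch of what the REINFORCE (score-function) workaround mentioned above typically looks like when the generator's sample is discrete and the reparameterization trick does not apply. All module names, shapes, and the toy architecture are assumptions for illustration, not the paper's setup:

```python
import torch
import torch.nn as nn

# Hypothetical tiny setup: a generator that outputs a categorical
# distribution over V tokens, and a discriminator that scores a sampled token.
V = 10
generator = nn.Linear(8, V)        # produces logits over a vocabulary
discriminator = nn.Linear(V, 1)    # scores a one-hot "sample"

noise = torch.randn(4, 8)          # batch of 4 latent vectors
logits = generator(noise)
dist = torch.distributions.Categorical(logits=logits)

# Sampling a discrete token: backprop cannot pass through this step.
tokens = dist.sample()                        # shape (4,), no gradient path

one_hot = torch.nn.functional.one_hot(tokens, V).float()
reward = discriminator(one_hot).squeeze(-1)   # discriminator score, shape (4,)

# REINFORCE / score-function estimator:
#   grad E[R] = E[ R * grad log p(token) ]
# The reward is detached, so gradients reach the generator only through log_prob.
loss = -(reward.detach() * dist.log_prob(tokens)).mean()
loss.backward()
```

The gradient here is unbiased but typically high-variance, which is why baselines or variance-reduction tricks are usually added in practice.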