google-research / electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Apache License 2.0

Adversarial training #117

Open mshislam opened 3 years ago

mshislam commented 3 years ago

In the paper you mention an experiment with training ELECTRA adversarially and say "it is challenging because it is impossible to backpropagate through sampling from the generator". Could you please elaborate on this issue, and say whether you have found a solution?

gokart23 commented 3 years ago

Not the author, but this might be a better question for StackOverflow or the like. Essentially, gradient descent cannot be applied when there are stochastic nodes in the computation graph: asking for the gradient through the sampling step is equivalent to asking for the derivative of a random variable with respect to the source of its randomness, which cannot be computed in general, although there have been clever workarounds in specific cases. This problem more commonly shows up in the context of the reparameterization trick, so it might help to look into that too.
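To make the reparameterization trick concrete, here is a minimal numpy sketch for a Gaussian, where the trick does apply (unlike the discrete sampling in ELECTRA's generator). The toy objective, variable names, and sample count are my own choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective: E_{z ~ N(mu, sigma^2)}[z^2]. The true gradient
# with respect to mu is 2*mu, which we can check against.
mu, sigma = 1.5, 0.5

# Sampling z ~ N(mu, sigma^2) directly leaves no differentiable path
# from mu to z. The reparameterization trick rewrites the sample as
# z = mu + sigma * eps with eps ~ N(0, 1), so z becomes a
# deterministic, differentiable function of mu given the noise eps.
eps = rng.standard_normal(100_000)
z = mu + sigma * eps

# Pathwise gradient estimate: d/dmu [z^2] = 2*z * dz/dmu = 2*z * 1,
# averaged over the noise samples.
grad_mu = np.mean(2 * z)
```

With enough samples, `grad_mu` lands close to the analytic value `2 * mu`. The catch the thread is about: this rewrite needs a continuous, differentiable mapping from noise to sample, which sampling discrete tokens from a generator does not provide.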

afcruzs commented 3 years ago

Is the code using REINFORCE for adversarial training publicly available?
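Not the authors' code, but for reference, the score-function (REINFORCE) estimator that sidesteps the non-differentiable sampling step can be sketched in a few lines of numpy. The three-token vocabulary, logits, and per-token rewards below are toy stand-ins, not anything from the ELECTRA setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Hypothetical setup: a "generator" over 3 tokens parameterized by
# logits, and a fixed reward for each token (standing in for a
# discriminator's feedback).
logits = np.array([0.2, 0.5, -0.1])
reward = np.array([1.0, 0.0, 0.5])

# REINFORCE: grad_theta E_{x~p_theta}[R(x)] = E[R(x) * grad_theta log p_theta(x)].
# For a softmax, grad of log p(x=i) w.r.t. the logits is onehot(i) - p,
# so the estimator never differentiates through the sample itself.
p = softmax(logits)
n = 200_000
samples = rng.choice(3, size=n, p=p)
onehot = np.eye(3)[samples]
grad_est = np.mean(reward[samples][:, None] * (onehot - p), axis=0)

# Exact gradient for comparison: sum_i p_i * R_i * (onehot(i) - p).
grad_true = (p * reward) @ (np.eye(3) - p)
```

The estimator is unbiased but high-variance, which matches the paper's observation that the adversarially trained variant was harder to optimize than the maximum-likelihood generator.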