enricivi / adversarial_training_methods

Implementation of the methods proposed in **Adversarial Training Methods for Semi-Supervised Text Classification** on the IMDB dataset (without pre-training)
MIT License

The question about gradient in VAT #7

Open RenShuhuai-Andy opened 5 years ago

RenShuhuai-Andy commented 5 years ago

Hi, I have some questions about the gradient in VAT. The function `get_v_adv_loss(self, ul_batch, p_mult, power_iterations=1)` in paper_network.ipynb contains this statement:

```python
gradient = tf.gradients(kl, [d], aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N)[0]
```

Is this used for calculating $g = \nabla_{s+d}\,\mathrm{KL}\left[p(\cdot \mid s;\hat{\theta})\,\|\,p(\cdot \mid s+d;\hat{\theta})\right]$ as defined in Eq. (7) of the original paper?

But in the original paper, the author takes the gradient of the KL divergence with respect to $s+d$, whereas your code takes the gradient with respect to `[d]`. Why? Are they the same? Thanks!
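As a sanity check on my own question, I tried a small numerical experiment (my sketch, not code from this repo): since $s$ is held fixed while $d$ varies, the Jacobian of $s+d$ with respect to $d$ is the identity, so the two gradients should coincide for any smooth function standing in for the KL term.

```python
import numpy as np

def f(x):
    # Hypothetical smooth scalar function standing in for the KL term;
    # the real KL in get_v_adv_loss depends on the model, which we don't need here.
    return np.sum(np.tanh(x) ** 2)

def num_grad(g, x, eps=1e-6):
    # Central finite-difference gradient of scalar function g at point x.
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = eps
        grad.flat[i] = (g(x + e) - g(x - e)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
s = rng.normal(size=4)  # plays the role of the (fixed) embedded input
d = rng.normal(size=4)  # plays the role of the perturbation

# Gradient of f(s + d) taken w.r.t. d, with s held constant...
grad_wrt_d = num_grad(lambda dd: f(s + dd), d)
# ...versus gradient of f(x) w.r.t. x, evaluated at x = s + d.
grad_wrt_sum = num_grad(f, s + d)

print(np.allclose(grad_wrt_d, grad_wrt_sum, atol=1e-5))  # prints True
```

So numerically they agree here, which makes me suspect `tf.gradients(kl, [d])` is just a convenient way to get $\nabla_{s+d}\,\mathrm{KL}$, but I'd appreciate confirmation that this is the intended reading.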