Implementation of the methods proposed in **Adversarial Training Methods for Semi-Supervised Text Classification** on IMDB dataset (without pre-training)
Hi, I have some questions about the gradient in VAT.
The function `get_v_adv_loss(self, ul_batch, p_mult, power_iterations=1)` in paper_network.ipynb has one statement:
Is this used for calculating $g=\nabla_{s+d} KL[p(·|s;\hat{\theta})||p(·|s+d;\hat{\theta})]$ defined in Eq.(7) in the original paper?
But in the original paper, the author computes the gradient of the KL divergence with respect to $s+d$, whereas your code takes the gradient with respect to $d$ only. Why? Are the two gradients the same? Thanks!
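For what it's worth, here is a minimal numerical check (not from this repository; the toy softmax model and finite-difference gradients are my own illustration) of whether the two gradients coincide. Since $s$ is held fixed while $d$ varies, perturbing $d$ and perturbing $s+d$ are the same operation, so $\nabla_{s+d} KL = \nabla_{d} KL$:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kl(p, q):
    # KL divergence between two discrete distributions
    return float(np.sum(p * (np.log(p) - np.log(q))))

rng = np.random.default_rng(0)
s = rng.normal(size=5)          # stands in for the clean input s
d = 1e-2 * rng.normal(size=5)   # stands in for the perturbation d
p = softmax(s)                  # fixed reference distribution p(.|s)

def objective(x):
    # KL[p(.|s) || p(.|x)], where x plays the role of s+d
    return kl(p, softmax(x))

eps = 1e-6
# Finite-difference gradient w.r.t. (s+d): perturb the combined input.
g_sd = np.array([
    (objective(s + d + eps * e) - objective(s + d - eps * e)) / (2 * eps)
    for e in np.eye(5)
])
# Finite-difference gradient w.r.t. d, with s held constant: perturb d only.
g_d = np.array([
    (objective(s + (d + eps * e)) - objective(s + (d - eps * e))) / (2 * eps)
    for e in np.eye(5)
])
print(np.allclose(g_sd, g_d))
```

The two loops evaluate the objective at identical points, which is exactly the point: with $s$ constant, differentiating with respect to $d$ or with respect to $s+d$ gives the same vector, so taking `tf.gradients` of the KL term with respect to the perturbation tensor alone should be equivalent to Eq. (7).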