Closed: dongqian0206 closed this issue 5 years ago.
Hello,
This annealing strategy basically increases the KL weight linearly to 1.0 over the first `warm_up` epochs. By tuning the `warm_up` parameter, the annealing procedure can be made more aggressive or conservative. I don't think there is a standard way to implement annealing, but our implementation is a pretty common strategy that has also been used in the following papers (from what I know):

- (Yang et al., 2017) Improved Variational Autoencoders for Text Modeling using Dilated Convolutions
- (Kim et al., 2018) Semi-Amortized Variational Autoencoders
- (He et al., 2019) Lagging Inference Networks and Posterior Collapse in Variational Autoencoders
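For concreteness, here is a minimal sketch of this kind of linear schedule; the function and variable names (`linear_kl_weight`, `steps_per_epoch`, etc.) are illustrative and not the exact ones in our code:

```python
def linear_kl_weight(step, warm_up_epochs, steps_per_epoch):
    """Linearly increase the KL weight from 0 to 1.0 over the first
    `warm_up_epochs` epochs, then hold it at 1.0."""
    warm_up_steps = warm_up_epochs * steps_per_epoch
    return min(1.0, step / warm_up_steps)

# Example: with warm_up_epochs=10 and 500 batches per epoch, the weight
# reaches 1.0 after 5,000 training steps and stays there.
for step in range(1, 10001):
    kl_weight = linear_kl_weight(step, warm_up_epochs=10, steps_per_epoch=500)
    # total_loss = rec_loss + kl_weight * kl_div  # weight only the KL term
```

A smaller `warm_up` makes the schedule more aggressive (the full KL penalty kicks in sooner), while a larger value makes it more conservative.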
Thanks for your reply.
Hi, Zhiting.
I noticed that you use a KL annealing strategy like this in vae_text.py:

```python
anneal_r = 1.0 / (config.kl_anneal_hparams["warm_up"]
                  * (train_data.dataset_size() / config.batch_size))
opt_vars["kl_weight"] = min(1.0, opt_vars["kl_weight"] + anneal_r)
```
Is this a common way to do it, or did you just pick one of several possible approaches? Different KL annealing strategies are likely to affect the final performance.

Best,
Dong Qian