Closed: dongqian0206 closed this issue 5 years ago.
Hello,
This annealing strategy basically increases the KL weight linearly to 1.0 over the first `warm_up` epochs. By tuning the `warm_up` parameter, the annealing procedure can be made more aggressive or conservative. I don't think there is a standard way to implement annealing, but our implementation is a pretty common strategy that has also been used in the following papers (from what I know):

- (Yang et al., 2017) Improved Variational Autoencoders for Text Modeling using Dilated Convolutions
- (Kim et al., 2018) Semi-Amortized Variational Autoencoders
- (He et al., 2019) Lagging Inference Networks and Posterior Collapse in Variational Autoencoders
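For concreteness, here is a minimal sketch of this kind of linear schedule; the function and variable names (`linear_kl_weight`, `steps_per_epoch`, etc.) are illustrative and not the exact ones in our code:

```python
def linear_kl_weight(step, warm_up_epochs, steps_per_epoch):
    """Linearly increase the KL weight from 0 to 1.0 over the first
    `warm_up_epochs` epochs, then hold it at 1.0."""
    warm_up_steps = warm_up_epochs * steps_per_epoch
    return min(1.0, step / warm_up_steps)

# Example: with warm_up_epochs=10 and 500 batches per epoch, the weight
# reaches 1.0 after 5,000 training steps and stays there.
for step in range(1, 10001):
    kl_weight = linear_kl_weight(step, warm_up_epochs=10, steps_per_epoch=500)
    # total_loss = rec_loss + kl_weight * kl_div  # weight only the KL term
```

A smaller `warm_up` makes the schedule more aggressive (the full KL penalty kicks in sooner), while a larger value makes it more conservative.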
Thanks for your reply.
Hi, Zhiting.
I noticed that you use a KL annealing strategy like this in vae_text.py:

```python
anneal_r = 1.0 / (config.kl_anneal_hparams["warm_up"]
                  * (train_data.dataset_size() / config.batch_size))
opt_vars["kl_weight"] = min(1.0, opt_vars["kl_weight"] + anneal_r)
```
Is this a common way to do it, or did you just pick one of several possible approaches? Different KL annealing strategies are likely to affect the final performance.

Best,
Dong Qian