Closed: BimDav closed this issue 3 years ago
Yeah, we added this with the hope of improving model stability. It doesn't completely fix the stability issues, but it did help a little. This processing ensures that mu and log_sigma stay in [-5, 5] so that the KL divergence doesn't blow up.
Thank you. Are all experiments in the paper done with this pre-processing? I don't really understand the unbounded-KL problem if the parameters are clamped: isn't the KL bounded because of this?
Yes, all experiments are done with this.
The KL per latent variable is bounded, but it can still be large for each variable. We have many latent variables in the model, and when instability happens, even a small mismatch between the encoder and the prior makes these per-variable KL values add up to something extremely large.
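To make this concrete with a toy calculation (not the model's code): even with mu and log_sigma both confined to [-5, 5], the per-variable bound on the KL between two univariate Gaussians is enormous, and it is then summed over many latent variables.

```python
import math

def kl_gauss(mu_q: float, log_sig_q: float,
             mu_p: float, log_sig_p: float) -> float:
    """KL( N(mu_q, sig_q^2) || N(mu_p, sig_p^2) ) for scalar Gaussians."""
    var_q = math.exp(2.0 * log_sig_q)
    var_p = math.exp(2.0 * log_sig_p)
    return (log_sig_p - log_sig_q
            + (var_q + (mu_q - mu_p) ** 2) / (2.0 * var_p)
            - 0.5)

# Worst-case mismatch with all parameters clamped to [-5, 5]:
# encoder mean at +5, prior mean at -5, both with log_sigma = -5.
per_dim = kl_gauss(5.0, -5.0, -5.0, -5.0)   # bounded, but on the order of 1e6

# Summed over, say, 1000 latent variables, the total KL explodes.
total = sum(kl_gauss(5.0, -5.0, -5.0, -5.0) for _ in range(1000))
```

So the clamp bounds each term, but the sum of "bounded" terms across thousands of latent variables can still be extremely large when the encoder and prior drift apart.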
Thank you
Hi, thank you for your outstanding work in making VAEs great again!
My question is about the pre-processing of the Gaussian parameters in distributions.py:
I don't think this is discussed in the paper; what is the role of this pre-processing? It seems to be linked to the model's stability: when I remove it, training becomes less stable. Do you have results on the relationship between this and the other stabilization methods discussed in the paper?
Thank you