YannDubs / disentangling-vae

Experiments for understanding disentanglement in VAE latent representations

Error in losses explanation? #55

Closed: christopher-beckham closed this issue 4 years ago

christopher-beckham commented 4 years ago

Hi,

Looking at this part of the readme, the following doesn't seem right:

> - **Standard VAE Loss**: α=β=ɣ=1. Each term is computed exactly by a closed-form solution (KL between the prior and the posterior). Tightest lower bound.
> - **β-VAEH**: α=β=ɣ>1. Each term is computed exactly by a closed-form solution. Simply adds a hyper-parameter (β in the paper) before the KL.
> - **β-VAEB**: α=β=ɣ>1. Same as β-VAEH but only penalizes the 3 terms once they deviate from a capacity C, which increases during training.

The standard VAE is simply gamma = 1 with no alpha or beta, and for β-VAE it is simply gamma > 1, again with no alpha or beta. Did I miss something?

Thanks.

YannDubs commented 4 years ago

Hey @christopher-beckham, I think you are mixing up KL[q(z|x) || p(z)] (the term usually written in VAE losses) with ∑_j KL[q(z_j) || p(z_j)], which is the term that gamma scales in the readme. The key thing to understand is that the expected KL[q(z|x) || p(z)] can be decomposed into the three terms used in the readme (cf. https://arxiv.org/abs/1802.04942 and http://approximateinference.org/accepted/HoffmanJohnson2016.pdf).
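Concretely, following the β-TC-VAE paper linked above:

E_{p(x)} KL[q(z|x) || p(z)] = I[x; z] + KL[q(z) || ∏_j q(z_j)] + ∑_j KL[q(z_j) || p(z_j)]

i.e. the index-code mutual information (scaled by α), the total correlation (scaled by β), and the dimension-wise KL (scaled by ɣ). With α=β=ɣ=1 the three terms sum back to the usual KL, which is why that setting recovers the standard VAE loss. For illustration, here is a minimal PyTorch sketch of the minibatch-weighted-sampling estimator of the three terms from that paper (names are illustrative, not the exact code in this repo):

```python
import math

import torch

def log_normal(z, mu, logvar):
    """Per-dimension log density of a diagonal Gaussian N(mu, exp(logvar))."""
    return -0.5 * (math.log(2 * math.pi) + logvar + (z - mu).pow(2) / logvar.exp())

def kl_decomposition(z, mu, logvar, n_data):
    """Minibatch-weighted-sampling estimates (illustrative sketch) of the
    three terms that E_{p(x)} KL[q(z|x) || p(z)] decomposes into.

    z, mu, logvar: (batch, latent_dim) posterior samples and parameters.
    n_data: size of the training set (the estimator needs it).
    """
    batch_size = z.size(0)
    log_norm = math.log(batch_size * n_data)

    # log q(z_i | x_i): each sample under its own posterior.
    log_qzx = log_normal(z, mu, logvar).sum(dim=1)

    # log p(z_i): each sample under the standard-normal prior.
    log_pz = log_normal(z, torch.zeros_like(z), torch.zeros_like(z)).sum(dim=1)

    # Pairwise per-dimension log q(z_i | x_j), shape (batch, batch, latent_dim).
    mat = log_normal(z.unsqueeze(1), mu.unsqueeze(0), logvar.unsqueeze(0))

    # log q(z_i) ≈ logsumexp_j log q(z_i | x_j) - log(batch * n_data)
    log_qz = torch.logsumexp(mat.sum(dim=2), dim=1) - log_norm

    # log ∏_d q(z_i,d): marginalize each latent dimension separately.
    log_prod_qzd = (torch.logsumexp(mat, dim=1) - log_norm).sum(dim=1)

    mutual_info = (log_qzx - log_qz).mean()       # I[x; z], scaled by alpha
    total_corr = (log_qz - log_prod_qzd).mean()   # TC, scaled by beta
    dim_wise_kl = (log_prod_qzd - log_pz).mean()  # ∑_j KL[q(z_j) || p(z_j)], scaled by gamma
    return mutual_info, total_corr, dim_wise_kl
```

Summing the three returned values gives back an estimate of the usual averaged KL[q(z|x) || p(z)], so the ɣ in the readme scales only the last piece, not the whole KL.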

Feel free to reopen if I misunderstood what you were saying.

christopher-beckham commented 4 years ago

Ah I see now, that makes sense. Thanks!