Closed: Alan-Qin closed this issue 3 years ago.
I don't remember having that issue, but it's important to notice that while the KL is always positive its estimate might not be. Try increasing the batch size, which should improve the estimate.
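To illustrate the point about estimates going negative, here is a toy sketch (not the repo's actual TC estimator): the true KL between two Gaussians is strictly positive, yet a minibatch Monte Carlo estimate of it is frequently negative at small batch sizes and much less often at large ones. All names and the distributions here are made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# True KL between 1-D Gaussians q = N(mu, 1) and p = N(0, 1) is mu^2 / 2 > 0
mu = 0.1
true_kl = 0.5 * mu**2  # = 0.005

def kl_estimate(batch_size):
    """Monte Carlo estimate of KL(q || p) = E_q[log q(z) - log p(z)]."""
    z = rng.normal(mu, 1.0, size=batch_size)
    log_q = -0.5 * (z - mu) ** 2  # log N(z; mu, 1) up to a shared constant
    log_p = -0.5 * z ** 2         # log N(z; 0, 1) up to the same constant
    return np.mean(log_q - log_p)

small = np.array([kl_estimate(16) for _ in range(1000)])
large = np.array([kl_estimate(4096) for _ in range(1000)])
print("negative fraction, batch 16  :", np.mean(small < 0))
print("negative fraction, batch 4096:", np.mean(large < 0))
```

Both sets of estimates are unbiased and centered on the true (positive) KL, but the small-batch estimates straddle zero, so individual logged values can easily come out negative.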
I see the same thing on DSprites: if beta is set too high, tc_loss becomes negative, even with large batch sizes (O(1e3)). It also seems to hurt disentanglement, at least judging from visual inspection of the latent traversals.
Also, I noticed that for Beta-TC VAE, if we use MSS to estimate the mutual information it can be large (around 40-50), but with MWS it is close to 0. And for a vanilla VAE, the mutual information estimated by MWS is also close to 0. This was on MNIST only.
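For context on why the two estimators can disagree, here is a minimal sketch of Minibatch Weighted Sampling (MWS) from the beta-TCVAE paper (Chen et al., 2018); the posteriors, dimensions, and dataset size below are toy assumptions, not values from this repo. MWS normalizes the minibatch mixture by N*M (dataset size times batch size) rather than by M, so its log q(z) values sit exactly log(N) below a naive minibatch average, which shifts any mutual-information estimate built on top of it.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, D = 64, 10_000, 4  # batch size, dataset size (toy), latent dim

# Toy per-example posteriors q(z|x_j) = N(mu_j, I), one sample z_i per example
mu = rng.normal(size=(M, D))
z = mu + rng.normal(size=(M, D))

# log q(z_i | x_j) for every pair (i, j), unit-variance Gaussian
log_qz_cond = (-0.5 * ((z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
               - 0.5 * D * np.log(2 * np.pi))

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis) + np.log(np.exp(a - m).sum(axis=axis))

# MWS: log q(z_i) ~= logsumexp_j log q(z_i|x_j) - log(N * M)
log_qz_mws = logsumexp(log_qz_cond, axis=1) - np.log(N * M)

# A naive minibatch mixture would divide by M instead
log_qz_naive = logsumexp(log_qz_cond, axis=1) - np.log(M)
print(float(np.mean(log_qz_naive - log_qz_mws)))  # = log(N) ~= 9.21
```

MSS instead reweights the minibatch terms with a stratified weight matrix, which is why the two can give noticeably different mutual-information numbers on the same model.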
It is negative in the results you have also posted. https://github.com/YannDubs/disentangling-vae/blob/master/results/btcvae_dsprites/train_losses.log
For factorVAE, tc_loss is positive. https://github.com/YannDubs/disentangling-vae/blob/master/results/factor_dsprites/train_losses.log
closing in favour of #60
Nice work! I tried the beta-TC VAE, but I found that tc_loss is negative. This term is a KL divergence, which is always non-negative, so I am confused about it. Thanks!