Closed: Alan-Qin closed this issue 3 years ago.
I don't remember having that issue, but it's important to notice that while the KL is always positive its estimate might not be. Try increasing the batch size, which should improve the estimate.
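To illustrate the point about estimates going negative, here is a toy sketch (not the repo's actual TC estimator): the true KL between two Gaussians is strictly positive, yet a minibatch Monte Carlo estimate of it is frequently negative at small batch sizes and much less often at large ones. All names and the distributions here are made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# True KL between 1-D Gaussians q = N(mu, 1) and p = N(0, 1) is mu^2 / 2 > 0
mu = 0.1
true_kl = 0.5 * mu**2  # = 0.005

def kl_estimate(batch_size):
    """Monte Carlo estimate of KL(q || p) = E_q[log q(z) - log p(z)]."""
    z = rng.normal(mu, 1.0, size=batch_size)
    log_q = -0.5 * (z - mu) ** 2  # log N(z; mu, 1) up to a shared constant
    log_p = -0.5 * z ** 2         # log N(z; 0, 1) up to the same constant
    return np.mean(log_q - log_p)

small = np.array([kl_estimate(16) for _ in range(1000)])
large = np.array([kl_estimate(4096) for _ in range(1000)])
print("negative fraction, batch 16  :", np.mean(small < 0))
print("negative fraction, batch 4096:", np.mean(large < 0))
```

Both sets of estimates are unbiased and centered on the true (positive) KL, but the small-batch estimates straddle zero, so individual logged values can easily come out negative.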
I see the same thing on DSprites: if beta is set too high, tc_loss becomes negative, even with large batch sizes (O(1e3)). It also seems to hurt disentanglement, at least judging from visual inspection of the latent traversals.
Also, I noticed that for Beta-TC VAE, if we use MSS to estimate the mutual information it can be large (around 40-50), but with MWS it is close to 0. And for a vanilla VAE, the mutual information estimated by MWS is also close to 0. This was on MNIST only.
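For context on why the two estimators can disagree, here is a minimal sketch of Minibatch Weighted Sampling (MWS) from the beta-TCVAE paper (Chen et al., 2018); the posteriors, dimensions, and dataset size below are toy assumptions, not values from this repo. MWS normalizes the minibatch mixture by N*M (dataset size times batch size) rather than by M, so its log q(z) values sit exactly log(N) below a naive minibatch average, which shifts any mutual-information estimate built on top of it.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, D = 64, 10_000, 4  # batch size, dataset size (toy), latent dim

# Toy per-example posteriors q(z|x_j) = N(mu_j, I), one sample z_i per example
mu = rng.normal(size=(M, D))
z = mu + rng.normal(size=(M, D))

# log q(z_i | x_j) for every pair (i, j), unit-variance Gaussian
log_qz_cond = (-0.5 * ((z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
               - 0.5 * D * np.log(2 * np.pi))

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis) + np.log(np.exp(a - m).sum(axis=axis))

# MWS: log q(z_i) ~= logsumexp_j log q(z_i|x_j) - log(N * M)
log_qz_mws = logsumexp(log_qz_cond, axis=1) - np.log(N * M)

# A naive minibatch mixture would divide by M instead
log_qz_naive = logsumexp(log_qz_cond, axis=1) - np.log(M)
print(float(np.mean(log_qz_naive - log_qz_mws)))  # = log(N) ~= 9.21
```

MSS instead reweights the minibatch terms with a stratified weight matrix, which is why the two can give noticeably different mutual-information numbers on the same model.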
It is negative in the results you have also posted. https://github.com/YannDubs/disentangling-vae/blob/master/results/btcvae_dsprites/train_losses.log
For factorVAE, tc_loss is positive. https://github.com/YannDubs/disentangling-vae/blob/master/results/factor_dsprites/train_losses.log
closing in favour of #60
Nice work! I tried the beta-TC VAE, but I found that tc_loss is negative. This term is a KL divergence, which is always non-negative, so I am confused about it. Thanks!