AntixK / PyTorch-VAE

A Collection of Variational Autoencoders (VAE) in PyTorch.

NaN in loss function in TC-Beta VAE #16

Open arijitthegame opened 3 years ago

arijitthegame commented 3 years ago

Hi,

I am running the TC-Beta VAE on my data, and I changed the architecture to an MLP encoder and decoder. But I am getting NaN in the loss function, and it seems I am getting NaNs for log_importance_weights, log_q_z, and log_prod_q_z. Should I just add an epsilon to each of these quantities before taking the log, or is there some other issue that I am missing?
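For reference, a minimal sketch of the numerically stable route, assuming the three log terms are built from Gaussian log densities combined with logsumexp, as in typical Beta-TC VAE implementations (the shapes, names, clamp bounds, and the plain 1/B normalisation below are illustrative assumptions, not this repo's code). Clamping log_var and marginalising with torch.logsumexp usually removes the need for an ad-hoc epsilon, since no raw probability is ever summed and then logged:

```python
import math
import torch

def gaussian_log_density(z, mu, log_var):
    """Elementwise log N(z | mu, exp(log_var))."""
    # Clamping log_var keeps exp(-log_var) finite; values that drift to
    # large magnitudes are a common source of NaNs here (bounds are a guess).
    log_var = log_var.clamp(min=-10.0, max=10.0)
    return -0.5 * (math.log(2 * math.pi) + log_var
                   + (z - mu) ** 2 * torch.exp(-log_var))

# Hypothetical shapes: z is [B, D]; evaluate every sample under every
# posterior q(z|x_j) to get a [B, B, D] matrix of log densities.
B, D = 64, 10
z = torch.randn(B, D)
mu, log_var = torch.randn(B, D), torch.randn(B, D)
mat_log_q_z = gaussian_log_density(
    z.unsqueeze(1), mu.unsqueeze(0), log_var.unsqueeze(0))

# logsumexp never exponentiates into overflow, so no epsilon is needed.
# The 1/B constant stands in for the minibatch importance weights.
log_q_z = torch.logsumexp(mat_log_q_z.sum(dim=2), dim=1) - math.log(B)
log_prod_q_z = (torch.logsumexp(mat_log_q_z, dim=1) - math.log(B)).sum(dim=1)
```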

gaffli commented 3 years ago

I too have a problem, with my InfoVAE implementation. I use kernels of size 7 instead of the original size, and a latent dim of 30 instead of 128. My kld_loss grows very fast, from ~2000 to ~2e35 to infinity to NaN in about 5 training steps. I use a kld_weight of about 1/1400, but the loss does not seem affected by it. Maybe there is a similar reason for our NaNs?
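Before changing hyperparameters, it can help to find out which tensor goes non-finite first. A small sketch using standard PyTorch tools (check_finite is a hypothetical helper you would call on kld_loss and its inputs; nothing here is specific to this repo):

```python
import torch

# Make the backward pass raise at the first op that produces NaN/Inf,
# instead of letting it propagate silently into the loss.
torch.autograd.set_detect_anomaly(True)

def check_finite(name: str, tensor: torch.Tensor) -> torch.Tensor:
    """Hypothetical forward-pass probe: fail fast on the first bad tensor."""
    if not torch.isfinite(tensor).all():
        raise RuntimeError(f"{name} is the first non-finite tensor")
    return tensor
```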

HaoZhang990127 commented 2 years ago

I guess maybe you can reduce the learning rate to solve this problem.
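For what it's worth, a minimal sketch of that suggestion, paired with gradient-norm clipping as a common companion guard against the fast blow-up described above (the stand-in model, learning rate, and max_norm are placeholder assumptions to tune, not values from this thread):

```python
import torch
from torch import nn

# Hypothetical stand-in for the VAE; only the optimiser settings matter here.
model = nn.Linear(10, 10)

# A smaller learning rate slows the blow-up; 1e-4 is an assumed value.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(x: torch.Tensor, target: torch.Tensor, loss_fn) -> float:
    optimizer.zero_grad()
    loss = loss_fn(model(x), target)
    loss.backward()
    # Bound the gradient norm so a single bad batch cannot explode the
    # parameters within a few steps (max_norm=1.0 is a guess).
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```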