Reconstruction loss in ELBO

Paulmzr commented 5 days ago

Hi, thanks for your great work!

I noticed there is a `discretized_gaussian_log_likelihood` function used to estimate the log-likelihood of the reconstructed representation from $x_1$. Since the VAE has already encoded the images into a continuous latent space, I am confused about why we need this function, which estimates the log-likelihood of a Gaussian discretized to match an image ground truth. Why not directly use the MSE loss (i.e., $\|x_0 - x_0^{\text{reconstruct}}\|^2$) to optimize the log-likelihood of the reconstructed latent representation?

Looking forward to your reply. Thanks in advance!

LTH14 commented 5 days ago

Thanks for your interest! This VLB loss exactly follows the iDDPM and DiT design. However, we also conducted experiments without the VLB loss (reconstruction loss only), and the performance is the same.
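For readers comparing the two options discussed here, the following is a minimal scalar sketch, not the repository's implementation. It assumes the iDDPM-style convention of values scaled to [-1, 1] with a bin half-width of 1/255, and it uses the exact normal CDF from `math.erf` rather than the tanh approximation used in iDDPM. The function names `discretized_gaussian_ll_scalar` and `squared_error` are hypothetical, chosen only for this illustration.

```python
import math


def normal_cdf(z):
    """Exact standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def discretized_gaussian_ll_scalar(x, mean, log_scale, half_width=1.0 / 255.0):
    """Log-probability that N(mean, exp(log_scale)^2) lands in the bin around x.

    Assumption: x lies in [-1, 1] and bins have half-width 1/255, as for
    8-bit pixels rescaled to that range. The two edge bins extend to
    -infinity and +infinity respectively.
    """
    inv_std = math.exp(-log_scale)
    cdf_plus = normal_cdf(inv_std * (x - mean + half_width))
    cdf_minus = normal_cdf(inv_std * (x - mean - half_width))
    if x < -0.999:
        prob = cdf_plus              # lowest bin: everything below the upper edge
    elif x > 0.999:
        prob = 1.0 - cdf_minus       # highest bin: everything above the lower edge
    else:
        prob = cdf_plus - cdf_minus  # interior bin: mass between the two edges
    # Clamp so that log() never sees zero.
    return math.log(max(prob, 1e-12))


def squared_error(x, mean):
    """Per-element squared error, the MSE alternative raised in the question."""
    return (x - mean) ** 2


if __name__ == "__main__":
    target = 0.2
    log_scale = math.log(0.1)
    for prediction in (0.2, 0.3, 0.5):
        ll = discretized_gaussian_ll_scalar(target, prediction, log_scale)
        se = squared_error(target, prediction)
        print(f"prediction={prediction:.1f}  log-likelihood={ll:.4f}  squared error={se:.4f}")
```

In this sketch both quantities rank predictions the same way when the scale is fixed: the log-likelihood falls as the squared error grows. The bin width is tied to 8-bit pixel quantization, so it carries no particular meaning for continuous VAE latents.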

Paulmzr commented 4 days ago

@LTH14 thanks for your response!
