Open Paulmzr opened 5 days ago
Thanks for your interest! This VLB loss exactly follows the iDDPM and DiT design. However, we also conducted experiments without the VLB loss (reconstruction loss only), and the performance is the same.
@LTH14 thanks for your response!
Hi, thanks for your great work!
I notice there is a `discretized_gaussian_log_likelihood` function that estimates the log-likelihood of the reconstructed representation from $x_1$. Since the VAE has already encoded the images into a continuous latent space, I am confused about why we need this function, which computes the log-likelihood of a Gaussian distribution discretized to the image ground truth. Why not directly use the MSE loss (i.e., $\|x_0 - x^{\text{reconstruct}}_0\|^2$) to optimize the log-likelihood of the reconstructed latent representation? Looking forward to your reply. Thanks in advance!
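
For context on the question, here is a minimal NumPy sketch (not the repository's code) of how a continuous Gaussian log-likelihood relates to MSE. It assumes a fixed, shared variance; `gaussian_nll`, `x0`, and `x0_hat` are hypothetical names used only for illustration. Under that assumption the negative log-likelihood is an affine function of the MSE, so minimizing one minimizes the other. The two objectives differ only when the variance is also predicted.

```python
import math
import numpy as np

def gaussian_nll(x, mean, log_var):
    """Elementwise negative log-density of a continuous Gaussian (hypothetical helper)."""
    return 0.5 * (math.log(2 * math.pi) + log_var + (x - mean) ** 2 / np.exp(log_var))

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 16))                   # stand-in for clean latents
x0_hat = x0 + 0.1 * rng.normal(size=(4, 16))    # stand-in for reconstructed latents

log_var = 0.0  # assumption: fixed variance, not learned
nll = gaussian_nll(x0, x0_hat, log_var).mean()
mse = ((x0 - x0_hat) ** 2).mean()

# With fixed variance: NLL = 0.5 * MSE / sigma^2 + constant
const = 0.5 * (math.log(2 * math.pi) + log_var)
print(nll, 0.5 * mse / math.exp(log_var) + const)
```

The two printed values should agree up to floating-point error, which is consistent with the reply that a reconstruction-only loss performs the same in these experiments.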