bcpenggh opened this issue 6 years ago
Hi @bcpenggh,
The former term is a likelihood term. You can choose any distribution as your "belief". For example, I can "believe" that my likelihood follows a Gaussian distribution N(o1(z), sigma^2 * I), where the mean o1(z) is unknown and is approximated by a neural network (the decoder). Then, based on the idea of Maximum Likelihood Estimation (MLE), we should try to maximize the probability that the original training data appears. Note that our likelihood is now believed to be Gaussian. According to the formulation,

$$\log p(x \mid z) = -\frac{1}{2\sigma^2}\lVert x - o_1(z)\rVert^2 - \frac{D}{2}\log(2\pi\sigma^2),$$

where D is the dimension of x. To maximize this probability, our prediction o1(z) should be as close as possible to the training data x. That is, with sigma fixed, we should minimize the squared error (MSE) between the prediction and the training data.
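A minimal sketch in plain Python that checks this numerically. The function names and the sample vector are hypothetical, chosen only for illustration. The Gaussian negative log-likelihood equals D * MSE / (2 sigma^2) plus a constant that does not depend on the prediction, so minimizing one minimizes the other.

```python
import math

def gaussian_nll(x, mean, sigma):
    """Negative log of N(x; mean, sigma^2 * I), summed over the D dimensions."""
    d = len(x)
    sq_err = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    # Squared-error term plus a constant that does not depend on `mean`
    return sq_err / (2 * sigma ** 2) + 0.5 * d * math.log(2 * math.pi * sigma ** 2)

def mse(x, mean):
    """Mean squared error between the target x and the prediction."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mean)) / len(x)

x = [0.2, 0.9, 0.4]  # hypothetical training sample
sigma = 1.0
for pred in ([0.0, 0.0, 0.0], [0.1, 0.8, 0.5], [0.2, 0.9, 0.4]):
    # NLL - const = D * MSE / (2 * sigma^2), so both rank predictions the same way
    print(f"MSE={mse(x, pred):.4f}  NLL={gaussian_nll(x, pred, sigma):.4f}")
```

The closer the prediction is to x, the smaller both numbers become; they differ only by a scale factor and an additive constant.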
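A similar story applies to BCE: if we instead believe that each pixel of x follows a Bernoulli distribution whose parameter is the network output o1(z) in (0, 1), then maximizing the log-likelihood is exactly minimizing the binary cross-entropy between the prediction and the training data.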
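A minimal sketch of the Bernoulli case, again with hypothetical function names and values. Writing the Bernoulli probability as p^x * (1 - p)^(1 - x) and taking the negative log gives the binary cross-entropy term by term. (For grayscale targets in [0, 1], BCE is still used as the loss even though the Bernoulli belief is then only an approximation.)

```python
import math

def bernoulli_nll(x, p):
    """Negative log-likelihood of targets x under independent Bernoulli(p_i)."""
    # p(x_i) = p_i^x_i * (1 - p_i)^(1 - x_i)
    return -sum(math.log(pi ** xi * (1 - pi) ** (1 - xi)) for xi, pi in zip(x, p))

def bce(x, p):
    """Binary cross-entropy, summed over dimensions."""
    return -sum(xi * math.log(pi) + (1 - xi) * math.log(1 - pi) for xi, pi in zip(x, p))

x = [1, 0, 1]        # hypothetical binary pixels
p = [0.9, 0.2, 0.6]  # hypothetical decoder outputs in (0, 1)
print(bernoulli_nll(x, p), bce(x, p))  # the two values agree up to float rounding
```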
By the way, a likelihood function, p(x|z), describes how likely an event x is, given an observation z.
Thanks
Dear TA,
In the example code of VAE, the loss function is the sum of cross entropy and KL divergence.
On the other hand, in VAE-GAN, the loss function is the sum of MSE and KL divergence.
Both of them seem to work well. Are they equivalent? And which one matches the equation in the handout?
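To make the two losses in the question concrete, here is a minimal sketch in plain Python. It is not the course's example code: the function names (kl_to_standard_normal, recon_bce, recon_sse, vae_loss) are hypothetical, and it assumes a diagonal-Gaussian encoder q(z|x) = N(mu, diag(exp(logvar))) with a standard normal prior N(0, I). Both variants are "reconstruction term + KL term"; they differ only in which likelihood the reconstruction term assumes.

```python
import math

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)) in closed form."""
    return -0.5 * sum(1 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mu, logvar))

def recon_bce(x, x_hat):
    # Bernoulli decoder: -log p(x|z) equals the BCE summed over pixels
    return -sum(xi * math.log(pi) + (1 - xi) * math.log(1 - pi) for xi, pi in zip(x, x_hat))

def recon_sse(x, x_hat, sigma=1.0):
    # Gaussian decoder with fixed sigma: -log p(x|z) up to an additive constant
    return sum((xi - pi) ** 2 for xi, pi in zip(x, x_hat)) / (2 * sigma ** 2)

def vae_loss(x, x_hat, mu, logvar, recon="bce", sigma=1.0):
    """Negative ELBO for one sampled z: reconstruction term + KL term."""
    if recon == "bce":
        rec = recon_bce(x, x_hat)
    elif recon == "mse":
        rec = recon_sse(x, x_hat, sigma)
    else:
        raise ValueError(f"unknown recon type: {recon}")
    return rec + kl_to_standard_normal(mu, logvar)

x, x_hat = [1.0, 0.0], [0.8, 0.3]     # hypothetical pixels and reconstruction
mu, logvar = [0.5, -0.2], [0.1, -0.3]  # hypothetical encoder outputs
print(vae_loss(x, x_hat, mu, logvar, recon="bce"))
print(vae_loss(x, x_hat, mu, logvar, recon="mse", sigma=math.sqrt(0.5)))
```

With sigma^2 = 1/2 the Gaussian term reduces to the plain sum of squared errors; a smaller sigma weights reconstruction more heavily relative to the KL term, so the choice of likelihood (and of sigma) changes the balance between the two terms.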