bjlkeng / bjlkeng.github.io

My Github Pages Blog

Possible typo in the VAE blog #1

Closed haoma7 closed 5 years ago

haoma7 commented 5 years ago

http://bjlkeng.github.io/posts/variational-autoencoders/

Thank you for your wonderful blog.

I am not sure whether the first term in Equation (14) (actually, there is no Equation 14; I am referring to the one below Equation 13) is completely correct.

bjlkeng commented 5 years ago

Hi @haoma7, thanks for the kind words! I think it's correct (unless there is a typo I'm missing).

Are you referring to the $\frac{1}{2\sigma^2}(x_i - \mu_{z|X})^2$ term? This should just be the $\log$ of the PDF of a normal distribution (minus some constant terms, because they don't matter in the optimization); see here: https://en.wikipedia.org/wiki/Normal_distribution#Estimation_of_parameters

We're assuming a constant standard deviation here, which is a hyperparameter, if we model the outputs ($\mu_{x|Z}$) as a normal distribution.
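Just to spell out what I mean by dropping the constant terms (using $\mu$ as shorthand for whatever mean the decoder outputs, so this isn't tied to either subscript): if $x_i \sim \mathcal{N}(\mu, \sigma^2)$, then

$$\log p(x_i|z) = -\frac{1}{2\sigma^2}(x_i - \mu)^2 - \frac{1}{2}\log(2\pi\sigma^2).$$

Since $\sigma$ is a fixed hyperparameter here, the second term doesn't depend on the network parameters and drops out of the optimization, leaving just the squared-error term.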

haoma7 commented 5 years ago

Yes, I am referring to the $\frac{1}{2\sigma^2}(x_i - \mu_{z|X})^2$ term. I kind of feel that the $\mu_{z|X}$ should instead be $\mu_{x|Z}$. My understanding is that $\mu_{x|Z} = g_{X|z}(z;\theta)$, instead of $\mu_{z|X}$, should be the mean of X, in order to be consistent with:

  1. As you wrote in Section 2.1, "observed data follows an isotropic normal distribution $\mathcal{N}(g(z), \sigma^2 * I)$, with mean following our learned latent random variable from the output of g, and identity covariance matrix scaled by a hyperparameter $\sigma^2$."

  2. Item 4 in the ordered list below Figure 3: "Compute $\mu_{X|z}$ from $g_{X|z}(X;\theta)$ to produce the (mean of the) reconstructed output." (BTW, I think it should be $g_{X|z}(z;\theta)$ here, instead of $g_{X|z}(X;\theta)$.) I've put a toy sketch of how I picture the computation below.
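Just to make sure we're picturing the same computation, here is a toy sketch (my own made-up shapes and function names, nothing from your post):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))          # a toy mini-batch of observations

def encoder(x):
    # stand-in for q(z|X): returns mu_{z|X} and sigma_{z|X}
    return x.mean(axis=1, keepdims=True), np.ones((x.shape[0], 1))

def g(z, out_dim=8):
    # stand-in for g_{X|z}(z; theta): maps a latent sample to the mean of X
    return np.tile(z, (1, out_dim))

mu_z, sigma_z = encoder(x)
z = mu_z + sigma_z * rng.normal(size=mu_z.shape)    # reparameterization trick
mu_x = g(z)                                         # mu_{X|z}, the mean of the reconstruction
sigma = 1.0                                         # fixed hyperparameter, as in the post
recon = ((x - mu_x) ** 2 / (2 * sigma ** 2)).sum()  # the term we're discussing
```

So the squared error is between $x_i$ and $\mu_{X|z}$, not $\mu_{z|X}$.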

What do you think?

Your blogs are just gold... I think the VAE one is way clearer than Carl Doersch's tutorial. I am currently learning generative models all by myself, and your technical blog series, including the VAE one, is the best I have found online. It seems that nobody else cares that much about the math derivations... Your blogs deserve a much higher search engine ranking :)

bjlkeng commented 5 years ago

Yep, you're absolutely right! Didn't see it on my first read. I love it when people find bugs in my stuff. I'm uploading a new post soon, so I'll fix it when I merge that in.

Thanks for saying that! I'm pretty proud of some of the posts I've written, but I'm really writing them more for myself. I find that writing things out in detail helps me understand the topic better, and I try to write in a way I wish had been available when I was studying it. VAEs are much better studied now than they were 3 years ago, but most treatments still gloss over some of the details. Glad you liked it and found it helpful!

bjlkeng commented 5 years ago

Updated the post, closing issue.

haoma7 commented 5 years ago

Yes, there are a lot of tutorials online now... but none is as clear as the original paper and Carl's tutorial, let alone yours. In terms of clarity and organization, my personal ranking would be: your tutorial > the original paper > Carl's tutorial :)

I find the same thing. I recently wrote a tutorial on the restricted Boltzmann machine that contains all the detailed derivations, which are missing from most of the online tutorials. It's good for my own understanding, and also good for future reference.

Some other comments:

  1. I think it should be $g_{X|z}(z;\theta)$, instead of $g_{X|z}(X;\theta)$, in Item 4 of the ordered list below Figure 3.

  2. In your tutorial, you treat $\sigma$ as a hyperparameter. As far as I understand, the original paper treats both $\mu_{x|Z}$ and $\sigma$ as outputs of the decoder, rather than treating $\sigma$ as a hyperparameter. In that case, the constants you dropped from $\log P(x_i|z)$ can no longer be dropped, since they contain $\sigma$, which depends on $\theta$ and $\phi$. Just want to confirm with you (I've written out what I mean below).
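To spell out item 2 with the full density (my notation, for this comment only): if the decoder also outputs $\sigma(z)$, then

$$\log P(x_i|z) = -\frac{1}{2\sigma^2(z)}(x_i - \mu_{x|Z})^2 - \frac{1}{2}\log\left(2\pi\sigma^2(z)\right),$$

and the $\frac{1}{2}\log\left(2\pi\sigma^2(z)\right)$ term now depends on the decoder parameters, so it has to stay in the objective.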

Only after reading your tutorial did I realize that the nodes in the last layer of the decoder are "stochastic nodes", just like the ones in a restricted Boltzmann machine. I am planning to read your other posts. I really, really appreciate them.

bjlkeng commented 5 years ago

Thanks again for the flattering words. It's interesting that you like the original paper; I felt it was a bit terse and skipped over a lot of detail. I think they hadn't fully digested the idea yet and didn't do a great job with the intuition (which is common for the first paper in an area).

Comments:

  1. Fixed, thanks for the pointer!
  2. Yeah, for that post (my first one on VAEs), I was basing it primarily on Doersch's tutorial, which assumes a constant variance. I think for some of the later posts I use a non-constant variance (and hopefully explain it); there's a rough sketch of what changes below.
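In case it helps, here's what the reconstruction term looks like with a learned variance (just illustrative NumPy, not the code from any of my posts):

```python
import numpy as np

def gaussian_nll(x, mu, log_var):
    """Negative log-likelihood of x under N(mu, exp(log_var)), summed over dimensions.

    When log_var is a decoder output, the 0.5 * log_var term depends on the
    network parameters, so it can no longer be dropped as a constant.
    """
    return 0.5 * np.sum((x - mu) ** 2 / np.exp(log_var) + log_var + np.log(2 * np.pi), axis=-1)
```

With a constant $\sigma$, the `log_var` and $\log 2\pi$ pieces are constants and you're back to just the squared error scaled by $\frac{1}{2\sigma^2}$.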

I find all this probabilistic modelling very interesting. The mix of probability with deep learning is one of the most exciting areas (although the jury's still out on whether or not it's useful).

Thanks again for finding these bugs, and let me know if you find any more!

haoma7 commented 5 years ago

http://bjlkeng.github.io/posts/variational-autoencoders/

Hi Brian, could you take another look at formula (7)? Is the $\frac{1}{M}$ supposed to be inside the $\log$?

bjlkeng commented 5 years ago

Yep, another good find, keep them coming! Fixed it: http://bjlkeng.github.io/posts/variational-autoencoders/