google-deepmind / sonnet

TensorFlow-based neural network library
https://sonnet.dev/
Apache License 2.0

VQ-VAE Image reconstruction #90

Closed ahmed-fau closed 5 years ago

ahmed-fau commented 6 years ago

Thanks for this clear notebook example. I have one question regarding construction of the quantized image:

In the paper, it is mentioned that we could use a 'prior' such as PixelCNN to model the constructed image. I didn't understand what is meant by a prior in that case, because the signal reconstruction should be done directly via the decoder network, as in your implementation.

My question: can we consider the PixelCNN an "alternative" model for image reconstruction, trained on the encoder's quantized output (the embedding 'Z')? Or is it mandatory to apply that step after training the whole network (in which case I don't understand the benefit of the decoder network)?

Best Regards

avdnoord commented 6 years ago

First the VQ-VAE is trained (as in this notebook). This lets you compress images into discrete codes with the encoder and reconstruct them with the decoder. If you want to sample new images, you can train a generative model (e.g., PixelCNN) on the discrete codes (instead of on the pixels). Once the PixelCNN is trained, you take the generated codes and use the decoder network to get the image. This is what is meant by a prior, in the VAE sense: a model on top of the latents. So in the paper the PixelCNN is not used as a decoder (though that is also possible).
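To make the "compress images into discrete codes" step concrete, here is a minimal numpy sketch of the vector-quantization lookup at the heart of the VQ-VAE: each encoder output vector is replaced by the index of its nearest codebook entry, and those integer indices are the discrete codes the prior is trained on. All names and sizes (`codebook`, `z_e`, the 2x2 grid) are illustrative toys, not Sonnet's API.

```python
import numpy as np

rng = np.random.default_rng(0)

K, D = 8, 4                          # codebook size, embedding dim
codebook = rng.normal(size=(K, D))   # the K discrete embeddings

# Stand-in for the encoder output: a 2x2 spatial grid of D-dim vectors.
z_e = rng.normal(size=(2, 2, D))

# Nearest-neighbour lookup: each spatial position gets the index of the
# closest codebook vector. These integer indices are the discrete codes.
dists = ((z_e[..., None, :] - codebook) ** 2).sum(-1)  # (2, 2, K)
codes = dists.argmin(-1)                               # (2, 2) int grid

# The quantized latent fed to the decoder is the looked-up embedding;
# a PixelCNN prior would instead model the distribution of `codes`.
z_q = codebook[codes]                                  # (2, 2, D)

print(codes.shape, z_q.shape)
```

The key point for this thread is that `codes` is a small grid of integers, so an autoregressive model over it is much cheaper than one over raw pixels.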

ahmed-fau commented 6 years ago

Once the PixelCNN is trained you take the generated codes and use the decoder network to get the image

Do you mean the decoder network of the trained PixelCNN?

avdnoord commented 6 years ago

You first sample from the PixelCNN to get generated discrete codes. Those are then fed into the decoder network of the VQ-VAE (a convnet with strides) to go from codes to images.