anantzoid / Conditional-PixelCNN-decoder

Tensorflow implementation of Gated Conditional Pixel Convolutional Neural Network

Image Generation with Gated PixelCNN Decoders

This is a TensorFlow implementation of Conditional Image Generation with PixelCNN Decoders, which introduces the Gated PixelCNN model, based on the PixelCNN architecture originally described in Pixel Recurrent Neural Networks. The model can be conditioned on a latent representation of labels or images to generate images accordingly, or it can model images unconditionally. It can also act as a powerful decoder, replacing deconvolution (transposed convolution) in autoencoders and GANs. A detailed summary of the paper can be found here.

These are some conditioned samples generated by the authors of the paper:

Paper Sample

Architecture

This is the architecture for Gated PixelCNN used in the model:

Gated PCNN

The gating helps the network retain context and model more complex interactions, as in an LSTM. The network stack on the left is the vertical stack, which takes care of the blind spots that occur during convolution because of the masking layer (refer to the Pixel RNN paper to learn more about masking). The use of residual connections significantly improves model performance.
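As a rough illustration of the two ideas above, here is a minimal pure-Python sketch, not the code used in this repository. The function names `causal_mask` and `gated_activation` are hypothetical. The mask follows the single-channel type A / type B scheme from the Pixel RNN paper, and the gate follows the paper's formula y = tanh(f) * sigmoid(g), with optional conditioning terms added to both pre-activations.

```python
import math


def causal_mask(kernel_size, mask_type="B"):
    """Build a 0/1 mask for a masked convolution kernel (hypothetical helper).

    Weights above the centre row, and left of the centre within the centre
    row, are kept. Type "B" also keeps the centre weight; type "A" zeroes it.
    """
    centre = kernel_size // 2
    mask = [[0] * kernel_size for _ in range(kernel_size)]
    for row in range(kernel_size):
        for col in range(kernel_size):
            if row < centre or (row == centre and col < centre):
                mask[row][col] = 1
    if mask_type == "B":
        mask[centre][centre] = 1
    return mask


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_activation(filter_pre, gate_pre, cond_filter=0.0, cond_gate=0.0):
    """Gated unit on scalar pre-activations: tanh(f + c_f) * sigmoid(g + c_g).

    cond_filter and cond_gate stand in for the projected conditioning
    vector; leaving them at 0.0 gives the unconditional case.
    """
    return math.tanh(filter_pre + cond_filter) * sigmoid(gate_pre + cond_gate)


if __name__ == "__main__":
    for row in causal_mask(3, "A"):
        print(row)
    print(gated_activation(0.5, 2.0))
```

Stacking convolutions that use this mask leaves some pixels above and to the right of the current position unreachable, which is the blind spot that the vertical stack is meant to cover.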

Usage

This implementation consists of the following models based on the Gated PixelCNN architecture:

To only generate images, append the --epochs=0 flag to the command.

To train any model on the CIFAR-10 dataset, add the --data=cifar flag.

Refer to main.py for other available flags for hyperparameter tuning.
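The sketch below shows one way flags like these could be parsed and interpreted. It is an assumption for illustration, not a copy of main.py: only the flag names --epochs and --data come from this README, while the default values and the helper names `build_parser` and `should_train` are hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical parser covering only the two flags named in this README."""
    parser = argparse.ArgumentParser(description="Illustrative flag sketch")
    # Default of 10 epochs is an assumption, not the repository's value.
    parser.add_argument("--epochs", type=int, default=10)
    # Default of "mnist" is assumed because training here was done on MNIST.
    parser.add_argument("--data", type=str, default="mnist")
    return parser


def should_train(args):
    """With --epochs=0 training is skipped and only generation would run."""
    return args.epochs > 0


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("train" if should_train(args) else "generate only", "on", args.data)
```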

Training Details

The system was trained on a single AWS p2.xlarge spot instance. Training was only carried out on the MNIST dataset. For comparison, the authors of the paper used 32 GPUs trained for 60 hours to generate their samples based on CIFAR-10 images.

To visualize the graph and loss during training, run:

tensorboard --logdir=logs

Loss minimization for the autoencoder model:

Loss