aleju / face-generator

Generate human faces with neural networks
MIT License

About

This is a script to generate new images of human faces using generative adversarial networks (GANs), as described in the paper by Ian J. Goodfellow et al. GANs train two networks at the same time: a generator (G) that creates new images and a discriminator (D) that distinguishes between real and fake images. G learns to trick D into thinking that its images are real (i.e. it learns to produce good-looking images). D learns to avoid being tricked (i.e. it learns what real images look like). Ideally you end up with a G that produces beautiful images that look like real ones. On human faces that works reasonably well, probably because they contain a lot of structure (autoencoders work well on them too).

The code in this repository is a modified version of Facebook's eyescream project.
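The adversarial setup described above can be illustrated with a small loss calculation. The following is a minimal sketch in plain Lua (no Torch required), assuming binary cross-entropy as the criterion and the common variant in which G is trained with its fakes labelled as real. The function names and the sample ratings are made up for illustration and are not taken from this repository.

```lua
-- Binary cross-entropy for one prediction p in (0, 1) and a target of 0 or 1.
local function bce(p, target)
  local eps = 1e-12  -- keeps math.log away from log(0)
  return -(target * math.log(p + eps) + (1 - target) * math.log(1 - p + eps))
end

-- Mean loss of D: real images should be rated 1, generated images 0.
local function discriminatorLoss(dReal, dFake)
  local sum = 0
  for _, p in ipairs(dReal) do sum = sum + bce(p, 1) end
  for _, p in ipairs(dFake) do sum = sum + bce(p, 0) end
  return sum / (#dReal + #dFake)
end

-- Mean loss of G: G wants D to rate its images as real (target 1).
local function generatorLoss(dFake)
  local sum = 0
  for _, p in ipairs(dFake) do sum = sum + bce(p, 1) end
  return sum / #dFake
end

-- Hypothetical outputs of D: it is fairly sure about both groups.
local dReal = {0.9, 0.8}
local dFake = {0.1, 0.3}
print(discriminatorLoss(dReal, dFake))  -- low: D separates the two groups well
print(generatorLoss(dFake))             -- high: G is not fooling D yet
```

When D rates every generated image at 0.5, it can no longer tell the groups apart and the generator loss drops to log(2).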

Example images

The following images were generated by a model trained with th train.lua --D_L1=0 --D_L2=0 --D_iterations=2.

32x32 color

1024 randomly generated 32x32 face images.

64 color images rated as good

64 generated 32x32 images, rated by D as the best images among 1024 randomly generated ones.
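Picking the images that D rates highest amounts to a sort by score. The sketch below shows the idea in plain Lua with made-up ratings. The helper name topK is hypothetical and the repository's own selection code may look different.

```lua
-- Return the indices of the k highest scores, best first.
local function topK(scores, k)
  local order = {}
  for i = 1, #scores do order[i] = i end
  table.sort(order, function(a, b) return scores[a] > scores[b] end)
  local best = {}
  for i = 1, math.min(k, #order) do best[i] = order[i] end
  return best
end

-- Hypothetical ratings by D for five generated images (1.0 = "looks real").
local scores = {0.12, 0.93, 0.40, 0.88, 0.05}
local best = topK(scores, 2)
print(best[1], best[2])
```

For the figure described above, k would be 64 and the score list would hold 1024 ratings.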

Nearest neighbours of generated 32x32 images

16 generated images (left image of each pair) and their nearest neighbours from the training set (right image of each pair). Distance was measured with the 2-norm (torch.dist()). The 16 selected images were the "best" ones among 1024 images according to D's rating, hence some similarity with the training set is expected.
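The nearest-neighbour search described above can be sketched without Torch. This example flattens each image into a list of pixel values and compares them by Euclidean distance, which is what the 2-norm of the difference computes. The toy images and function names are assumptions for illustration only.

```lua
-- Euclidean (2-norm) distance between two equally long pixel vectors.
local function dist2(a, b)
  local sum = 0
  for i = 1, #a do
    local d = a[i] - b[i]
    sum = sum + d * d
  end
  return math.sqrt(sum)
end

-- Index and distance of the training image closest to `query`.
local function nearestNeighbour(query, trainingSet)
  local bestIndex, bestDist = nil, math.huge
  for i, img in ipairs(trainingSet) do
    local d = dist2(query, img)
    if d < bestDist then bestIndex, bestDist = i, d end
  end
  return bestIndex, bestDist
end

-- Toy "images" with four pixels each (made-up values).
local trainingSet = {
  {0.0, 0.0, 0.0, 0.0},
  {1.0, 1.0, 1.0, 1.0},
  {0.5, 0.5, 0.0, 0.0},
}
local index, distance = nearestNeighbour({0.6, 0.4, 0.1, 0.0}, trainingSet)
print(index, distance)
```

A small distance to the nearest training image suggests that G may be reproducing training data rather than generating a new face.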

Requirements

To generate the dataset:

To run the GAN part:

Usage

Building the dataset:

To train a new model, follow these steps:

You might have to experiment with the command line parameters --D_iterations and --G_iterations to get decent results. Sometimes you might also have to change --D_L2 (D's L2 norm) or --G_L2 (G's L2 norm). (Similar parameters are available for L1.)
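To give an idea of what the L1 and L2 parameters influence, here is a minimal sketch of weight penalties in plain Lua. It assumes that values such as --D_L1 and --D_L2 act as coefficients of an L1 and an L2 penalty on a network's weights. The README does not spell this out, so treat the formula and the function name as an illustration rather than the repository's exact code.

```lua
-- Add L1 and L2 penalty terms to a loss value and its gradient (in place).
-- loss' = loss + l1 * sum(|w|) + l2 * sum(w^2) / 2
local function addWeightPenalty(loss, weights, grads, l1, l2)
  for i, w in ipairs(weights) do
    loss = loss + l1 * math.abs(w) + l2 * w * w / 2
    local sign = (w > 0 and 1) or (w < 0 and -1) or 0
    grads[i] = grads[i] + l1 * sign + l2 * w
  end
  return loss
end

-- Made-up weights, zero data gradient, L1 disabled, L2 coefficient 0.1.
local weights = {2.0, -1.0}
local grads = {0.0, 0.0}
local loss = addWeightPenalty(0.5, weights, grads, 0, 0.1)
print(loss, grads[1], grads[2])
```

Under this assumption, larger coefficients pull the weights of the penalized network towards zero, which weakens that network relative to its opponent.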

Architecture

G's architecture is mostly copied from the blog post by Anders Boesen Lindbo Larsen and Søren Kaae Sønderby. It is basically a full Laplacian pyramid in one network. The network starts with a small linear layer, which roughly generates 8x8 images. That is followed by upsampling layers, which increase the image size to 16x16 and then to 32x32 pixels.

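require 'nn'
require 'cudnn'

local noiseDim = 100
local dimensions = {3, 32, 32}  -- channels, height, width (color mode)

-- Project the noise vector to 128 feature maps of size 8x8.
local model = nn.Sequential()
model:add(nn.Linear(noiseDim, 128*8*8))
model:add(nn.View(128, 8, 8))
model:add(nn.PReLU(nil, nil, true))

-- Upsample to 16x16; the padded 5x5 convolution keeps that size.
model:add(nn.SpatialUpSamplingNearest(2))
model:add(cudnn.SpatialConvolution(128, 256, 5, 5, 1, 1, (5-1)/2, (5-1)/2))
model:add(nn.SpatialBatchNormalization(256))
model:add(nn.PReLU(nil, nil, true))

-- Upsample to 32x32.
model:add(nn.SpatialUpSamplingNearest(2))
model:add(cudnn.SpatialConvolution(256, 128, 5, 5, 1, 1, (5-1)/2, (5-1)/2))
model:add(nn.SpatialBatchNormalization(128))
model:add(nn.PReLU(nil, nil, true))

-- Reduce to the image channels; the sigmoid keeps pixel values in (0, 1).
model:add(cudnn.SpatialConvolution(128, dimensions[1], 3, 3, 1, 1, (3-1)/2, (3-1)/2))
model:add(nn.Sigmoid())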

where noiseDim is 100 and dimensions[1] is 3 (color mode) or 1 (grayscale mode).
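The image sizes in the generator above can be checked with a little arithmetic. The sketch below uses the standard output-size formula for convolutions and assumes that nearest-neighbour upsampling multiplies height and width by its scale factor. The helper names are hypothetical and only one spatial axis is tracked, since height and width behave identically here.

```lua
-- Output size of a convolution along one axis.
local function convSize(size, kernel, stride, pad)
  return math.floor((size + 2 * pad - kernel) / stride) + 1
end

-- Nearest-neighbour upsampling multiplies the size by the scale factor.
local function upsampleSize(size, scale)
  return size * scale
end

local size = 8                               -- after nn.View(128, 8, 8)
size = upsampleSize(size, 2)                 -- 16
size = convSize(size, 5, 1, (5 - 1) / 2)     -- 5x5 conv, pad 2: still 16
size = upsampleSize(size, 2)                 -- 32
size = convSize(size, 5, 1, (5 - 1) / 2)     -- still 32
size = convSize(size, 3, 1, (3 - 1) / 2)     -- 3x3 conv, pad 1: still 32
print(size)
```

A padding of (kernel-1)/2 with stride 1 leaves the size unchanged, so only the two upsampling layers change the resolution.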

D is a standard convolutional neural net.

require 'nn'

local dimensions = {3, 32, 32}  -- channels, height, width (color mode)

-- Four blocks of convolution, PReLU, dropout and 2x2 average pooling.
-- Each pooling halves height and width: 32 -> 16 -> 8 -> 4 -> 2.
local conv = nn.Sequential()
conv:add(nn.SpatialConvolution(dimensions[1], 64, 3, 3, 1, 1, (3-1)/2))
conv:add(nn.PReLU(nil, nil, true))
conv:add(nn.SpatialDropout(0.2))
conv:add(nn.SpatialAveragePooling(2, 2, 2, 2))

conv:add(nn.SpatialConvolution(64, 128, 3, 3, 1, 1, (3-1)/2))
conv:add(nn.PReLU(nil, nil, true))
conv:add(nn.SpatialDropout(0.2))
conv:add(nn.SpatialAveragePooling(2, 2, 2, 2))

conv:add(nn.SpatialConvolution(128, 256, 3, 3, 1, 1, (3-1)/2))
conv:add(nn.PReLU(nil, nil, true))
conv:add(nn.SpatialDropout(0.2))
conv:add(nn.SpatialAveragePooling(2, 2, 2, 2))

conv:add(nn.SpatialConvolution(256, 512, 3, 3, 1, 1, (3-1)/2))
conv:add(nn.PReLU(nil, nil, true))
conv:add(nn.SpatialDropout(0.2))
conv:add(nn.SpatialAveragePooling(2, 2, 2, 2))

-- Flatten the 512 feature maps and classify with fully connected layers.
conv:add(nn.View(512 * 0.25 * 0.25 * 0.25 * 0.25 * dimensions[2] * dimensions[3]))
conv:add(nn.Linear(512 * 0.25 * 0.25 * 0.25 * 0.25 * dimensions[2] * dimensions[3], 512))
conv:add(nn.PReLU(nil, nil, true))
conv:add(nn.Dropout())
conv:add(nn.Linear(512, 512))
conv:add(nn.PReLU(nil, nil, true))
conv:add(nn.Dropout())
-- Single output in (0, 1): D's rating of the input image.
conv:add(nn.Linear(512, 1))
conv:add(nn.Sigmoid())

where dimensions[1] is 3 (color) or 1 (grayscale), dimensions[2] is the image height (32) and dimensions[3] is the image width (also 32).
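The factor 0.25 * 0.25 * 0.25 * 0.25 in the nn.View call above comes from the four pooling layers, each of which halves both height and width. A minimal sketch of that calculation, with a hypothetical helper name:

```lua
-- Number of values entering D's first linear layer for a given image size.
-- Each 3x3 convolution (stride 1, padding 1) keeps height and width,
-- each 2x2 average pooling with stride 2 halves them.
local function flattenedSize(height, width, poolings, channels)
  for _ = 1, poolings do
    height = math.floor(height / 2)
    width = math.floor(width / 2)
  end
  return channels * height * width
end

local n = flattenedSize(32, 32, 4, 512)
print(n)
-- Same value as the expression used in nn.View above:
print(512 * 0.25 * 0.25 * 0.25 * 0.25 * 32 * 32)
```

For 32x32 inputs the last convolutional block therefore outputs 512 feature maps of size 2x2, i.e. 2048 values.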

Training is done with Adam (by default).
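For readers unfamiliar with Adam, the following plain Lua sketch shows a single update step. The default hyperparameters are the ones suggested in the Adam paper and are an assumption here. The settings this repository actually uses are not stated in this README, and the function name adamStep is hypothetical.

```lua
-- One Adam update for a flat list of parameters.
-- state holds the step counter t and the moment estimates m and v.
local function adamStep(params, grads, state, lr, beta1, beta2, eps)
  lr, beta1, beta2, eps = lr or 0.001, beta1 or 0.9, beta2 or 0.999, eps or 1e-8
  state.t = (state.t or 0) + 1
  state.m = state.m or {}
  state.v = state.v or {}
  for i, g in ipairs(grads) do
    state.m[i] = beta1 * (state.m[i] or 0) + (1 - beta1) * g
    state.v[i] = beta2 * (state.v[i] or 0) + (1 - beta2) * g * g
    local mHat = state.m[i] / (1 - beta1 ^ state.t)  -- bias correction
    local vHat = state.v[i] / (1 - beta2 ^ state.t)
    params[i] = params[i] - lr * mHat / (math.sqrt(vHat) + eps)
  end
end

-- Made-up parameters and gradients; state starts empty.
local params, state = {1.0, -2.0}, {}
adamStep(params, {2.0, -0.5}, state)
print(params[1], params[2])
```

On the first step each parameter moves by roughly the learning rate against the sign of its gradient, regardless of the gradient's magnitude.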

Command Line Parameters

The train.lua script has the following parameters:

Other