ConnorJL / WGAN-Tensorflow

Wasserstein GAN implemented in Tensorflow Slim

Implementing weight clipping #1

Open PatrykChrabaszcz opened 7 years ago

PatrykChrabaszcz commented 7 years ago

In TensorFlow I just do this for weight clipping:

    t_vars = tf.trainable_variables()
    critic_vars = [var for var in t_vars if 'crit' in var.name]
    self.clip_critic = []
    for var in critic_vars:
        self.clip_critic.append(tf.assign(var, tf.clip_by_value(var, -0.1, 0.1)))
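
For reference, a minimal self-contained sketch of the same clipping trick in TF1 graph mode. The variable names and toy values below are made up for illustration; only the 'crit' prefix used for filtering matters, and in a real training loop the clip ops would be run right after every critic update.

    import tensorflow as tf

    tf.reset_default_graph()
    # Made-up variables: only the 'crit' prefix matters for the filter below.
    w_crit = tf.get_variable("crit_w1", initializer=tf.constant([[0.5, -0.3], [0.02, -0.9]]))
    w_gen = tf.get_variable("gen_w1", initializer=tf.constant([[0.7]]))  # not clipped

    t_vars = tf.trainable_variables()
    critic_vars = [var for var in t_vars if 'crit' in var.name]
    clip_critic = [tf.assign(var, tf.clip_by_value(var, -0.1, 0.1)) for var in critic_vars]

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(clip_critic)    # in a real loop: run after each critic optimizer step
        print(sess.run(w_crit))  # all entries now lie in [-0.1, 0.1]
        print(sess.run(w_gen))   # generator weights are untouched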

Here is my repo where I try to implement WGAN: https://github.com/PatrykChrabaszcz/WGan

Did you get any good results? This is what I get for MNIST:

[image: res_20]

ConnorJL commented 7 years ago

Wow, that is a much much nicer implementation! Thanks so much! My code now easily runs 20-50x faster. I've added the fix to the repo.

I haven't trained a good model for MNIST, because honestly I think MNIST is too simple to really show whether an image generation technique is good or not. I'm still running my code on a self-collected dataset of ImageNet-like images. We'll see how it goes.

PatrykChrabaszcz commented 7 years ago

I was not able to train WGAN on CelebA resized to 32x32; it gave me worse results than a standard GAN. Experimenting with the hyperparameters would probably help.

[image: res_10]

ConnorJL commented 7 years ago

Very interesting! The paper said WGANs should be able to avoid this kind of mode collapse if I remember correctly, so this is definitely worth investigating. I'm going to pause my high res experiment for a while and run some tests on MNIST, CelebA and CIFAR, if I can find the time. Might take me a day or two to get representative results.

PatrykChrabaszcz commented 7 years ago

I'll run it again with a slightly different architecture and the original hyperparameter settings. The image I posted was from one of my experiments; I think it wasn't run with the original settings.

OK, this is what I got so far, and I think it's still training:

[image: res_10]

ConnorJL commented 7 years ago

Ah, those are some nice results! Did you find any culprit hyperparameter, or was the network just undertrained? I was curious about the report that a simple MLP architecture could lead to good results using WGAN, so I ran it on MNIST; take a look:

[image: test_490000]

I'm pretty impressed by the quality; fully connected neural nets have a bad reputation nowadays, and training only took a few hours on my consumer-grade computer. But for some reason it seems dead set on occasionally producing totally black images, and I'm not quite sure why. I'm probably going to try it on CIFAR next and see what happens.
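
(For context, this is roughly the kind of MLP generator/critic pair I mean, sketched in TF-Slim. The layer sizes follow the ReLU MLP described in the WGAN paper as far as I recall it; the function names and shapes are my own illustration, not code from this repo.)

    import tensorflow as tf
    slim = tf.contrib.slim

    def mlp_generator(z):
        # Four ReLU hidden layers, tanh output reshaped to a 28x28 MNIST image.
        with tf.variable_scope("gen"):
            net = slim.stack(z, slim.fully_connected, [512, 512, 512, 512],
                             activation_fn=tf.nn.relu)
            img = slim.fully_connected(net, 28 * 28, activation_fn=tf.nn.tanh)
            return tf.reshape(img, [-1, 28, 28, 1])

    def mlp_critic(x, reuse=False):
        # Same MLP body; the critic outputs a single unbounded score (no sigmoid).
        with tf.variable_scope("crit", reuse=reuse):
            net = slim.stack(slim.flatten(x), slim.fully_connected, [512, 512, 512, 512],
                             activation_fn=tf.nn.relu)
            return slim.fully_connected(net, 1, activation_fn=None)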

PatrykChrabaszcz commented 7 years ago

I don't remember exactly now, but I think I first ran the network with the originally proposed learning hyperparameters; the network was similar to DCGAN but had 4x fewer features in each layer. It gave me images with something face-like. Then I experimented with different settings and got those strange images that you can see in my second post. Then I went back to the original settings but changed the network structure (adding more kernels, changing batch norm), and now I see something like this:

[image: res_10]

I got much better faces using an Adversarial Autoencoder plus feature matching with a GAN, so to me those look quite bad. Here are samples from AAE+GAN:

[image: aaegan]

Those black images are strange. Did you try to find which part of the latent space produces the black images? Do you "switch batch norm off" during sampling after you train?
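
(For what it's worth, "switching batch norm off" during sampling usually just means building or running the generator with is_training=False, so slim.batch_norm uses its moving averages instead of the statistics of the sampled batch. A rough sketch with made-up layer sizes, assuming TF-Slim:)

    import tensorflow as tf
    slim = tf.contrib.slim

    def generator(z, is_training):
        with tf.variable_scope("gen"):
            net = slim.fully_connected(
                z, 1024, activation_fn=tf.nn.relu,
                normalizer_fn=slim.batch_norm,
                normalizer_params={'is_training': is_training})
            return slim.fully_connected(net, 28 * 28, activation_fn=tf.nn.tanh)

    # Training: build/run with is_training=True (and note slim.batch_norm puts its
    # moving-average updates into tf.GraphKeys.UPDATE_OPS, which must be run as well).
    # Sampling: build with is_training=False so the learned moving mean/variance are used.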

ConnorJL commented 7 years ago

Hmm, yeah, the results aren't bad, but they aren't significantly better than a normal GAN; AAE+GAN does look much nicer. I am beginning to wonder where WGAN has larger benefits. From what I understand (and I am not the greatest expert), it may be useful for stabilizing training in difficult domains, so maybe a test on ImageNet or similar with a larger DCGAN architecture would show its benefits. I was quite surprised it got my tiny MLP model to produce decent enough results, but as said before, MNIST is a very simple dataset. More testing tomorrow.

The MLP model as per the original paper actually doesn't use batch norm, so that isn't it. I might look into it a little more tomorrow.