cdoersch / vae_tutorial

Caffe code to accompany my Tutorial on Variational Autoencoders
MIT License

Understanding input to mu and logsd layers #5

Open nathanin opened 7 years ago

nathanin commented 7 years ago

Hi, thanks for the great tutorial. I'm having trouble understanding the math. What is the reason for passing encode3 to logsd before the nonlinearity is applied? Why not feed encode3neuron to both mu and logsd? I would ask if it's a typo, but the reference prototxt converges fine when I run it.

[screenshot, 2017-06-09]

I have combined the VAE layers with convolution and deconvolution layers, and am having trouble training on MNIST with this new architecture. (I'm using Sigmoid neurons instead of ReLU, if that matters.)

cdoersch commented 7 years ago

You are correct: logsd should be connected to encode3neuron, not encode3. It's a typo. I doubt it makes much difference in behavior (the VAE trains just fine with this minor bug), and unfortunately it may be a while before I have time to verify that fixing it doesn't break anything.
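
For reference, a minimal sketch of the corrected wiring in Caffe prototxt. The layer names follow the thread, but the num_output value and the weight filler are assumptions for illustration, not copied from the repo's prototxt:

```
# Hypothetical corrected snippet: both mu and logsd now read from
# encode3neuron (the post-ReLU activation) rather than the pre-ReLU encode3.
layer {
  name: "mu"
  type: "InnerProduct"
  bottom: "encode3neuron"
  top: "mu"
  inner_product_param {
    num_output: 30                 # latent dimension; assumed, check the repo
    weight_filler { type: "xavier" }
  }
}
layer {
  name: "logsd"
  type: "InnerProduct"
  bottom: "encode3neuron"          # was "encode3" -- the typo discussed above
  top: "logsd"
  inner_product_param {
    num_output: 30
    weight_filler { type: "xavier" }
  }
}
```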

In terms of your difficulties in training, I suspect that the switch from ReLU to sigmoid matters far more. I've heard many people in the vision community attribute the success of AlexNet to the fact that it swapped the sigmoids of earlier networks for ReLUs, because ReLUs do a better job of keeping gradients from vanishing. Making sure that gradients don't vanish or explode is a problem in every deep net, and VAEs are no exception. Check your initialization, and consider using batch norm.
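
If it helps, here is one common Caffe pattern for a conv encoder block: a BatchNorm + Scale pair before each ReLU, with MSRA (He) weight fillers. This is an illustrative sketch only; the layer names and sizes are not from this repo:

```
# Illustrative conv -> BatchNorm -> Scale -> ReLU block for a Caffe encoder.
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 32
    kernel_size: 3
    stride: 2
    weight_filler { type: "msra" }   # He init, suited to ReLUs
  }
}
layer {
  name: "conv1_bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "conv1_scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param { bias_term: true }    # learned scale/shift after normalization
}
layer {
  name: "conv1_relu"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
```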