christianbuck / miniNN

Small and simple neural network implementation for research

Better Initial Weights #3

Open christianbuck opened 11 years ago

christianbuck commented 11 years ago

Initial weights should probably depend on the activation function used in the hidden layer. Also, we switched from the standard sigmoid, 1/(1+exp(-x)), to tanh, but the weights are still initialized using a sigmoid recipe.
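As a side note on why the recipe should depend on the activation: tanh is just a rescaled sigmoid, so hidden activations (and hence a sensible weight scale) live on a different range. A quick check, not from the repo:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3, 3, 7)
# tanh(x) == 2*sigmoid(2x) - 1, so outputs lie in (-1, 1) instead of (0, 1)
print(np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0))  # True
```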

hosang commented 11 years ago

Where did you get the recipe from?

I tried 1/sqrt(fan_in) [1] and it performs better: with batch size 4 the xor works most of the time, with batch size 1 not so much. It was sqrt(6)/sqrt(fan_in+fan_out) before.

I pushed it, so you can try.

[1] http://www.willamette.edu/~gorr/classes/cs449/precond.html
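For reference, a minimal NumPy sketch of the two recipes being compared (not taken from the repo; the layer sizes and RNG seed are only illustrative):

```python
import numpy as np

rng = np.random.RandomState(0)

def init_uniform(fan_in, fan_out, scale):
    """Sample a weight matrix uniformly in [-scale, scale]."""
    return rng.uniform(-scale, scale, size=(fan_in, fan_out))

fan_in, fan_out = 2, 3  # e.g. the xor net: 2 inputs, 3 hidden units

# old recipe: sqrt(6) / sqrt(fan_in + fan_out)
w_old = init_uniform(fan_in, fan_out, np.sqrt(6.0) / np.sqrt(fan_in + fan_out))

# new recipe from [1]: 1 / sqrt(fan_in)
w_new = init_uniform(fan_in, fan_out, 1.0 / np.sqrt(fan_in))
```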

christianbuck commented 11 years ago

I think I got this from Andrew Ng's examples, but I'm not sure. What about the hidden2output weights?

The cuda-convnet people seem to just use very small numbers and drop the dependence on layer size: http://code.google.com/p/cuda-convnet/wiki/LayerParams
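For comparison, a cuda-convnet-style initialization would look roughly like this (zero-mean Gaussian with a small fixed standard deviation, independent of layer size; the 0.01 and the layer sizes are illustrative, not from the repo):

```python
import numpy as np

rng = np.random.RandomState(0)

def init_small_gaussian(fan_in, fan_out, std=0.01):
    """Small zero-mean Gaussian weights, no dependence on layer size."""
    return rng.normal(0.0, std, size=(fan_in, fan_out))

w_hidden2output = init_small_gaussian(3, 1)  # e.g. 3 hidden units -> 1 output
```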