Element-Research / rnn

Recurrent Neural Network library for Torch7's nn
BSD 3-Clause "New" or "Revised" License

Fix implementation of batch normalization #370

Closed. rachtsingh closed this pull request 7 years ago.

rachtsingh commented 7 years ago

Small fixes to the implementation of batch normalization (which I've measured to meaningfully improve its effectiveness):

  1. Initialize the gamma parameter (the nn.BatchNormalization weight parameter, as seen here) to 0.1, as recommended in Cooijmans et al.
  2. Unset the default values of self.eps and self.momentum: those parameters can be passed as nil into nn.BatchNormalization, so unless there is a good reason for rnn-specific initializations, the API should be transparent (see the sketch after this list). Also, the default value of 0.1 for eps is too high; the default inside nn is 1e-5, which is more appropriate for a divide-by-zero fudge factor. In my experiment, setting it to 0.1 made the network converge more slowly.
  3. Remove :noBias() since self.o2g is already a LinearNoBias (this is just for code cleanliness).
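For concreteness, here is a minimal Lua sketch of the two behavioral changes, written against torch/nn directly. The makeBN helper is illustrative, not the actual code in this repository:

```lua
require 'nn'

-- Illustrative helper (not the actual rnn code): builds the batch
-- normalization module applied to a recurrent transform.
local function makeBN(size, eps, momentum)
   -- Fix 2: pass eps and momentum through untouched. When they are nil,
   -- nn.BatchNormalization falls back to its own defaults (eps = 1e-5),
   -- keeping the API transparent instead of overriding eps with 0.1.
   local bn = nn.BatchNormalization(size, eps, momentum)
   -- Fix 1: initialize gamma (bn.weight) to 0.1, as recommended in
   -- Cooijmans et al., instead of nn's default initialization.
   bn.weight:fill(0.1)
   return bn
end

local bn = makeBN(128)   -- eps and momentum omitted by the caller
print(bn.eps)            -- 1e-05, nn's default fudge factor
print(bn.weight[1])      -- 0.1
```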
nicholas-leonard commented 7 years ago

@rachtsingh Great fix. Thanks for taking the time. And sorry for the long wait!