Element-Research / rnn

Recurrent Neural Network library for Torch7's nn
BSD 3-Clause "New" or "Revised" License

Fix implementation of batch normalization #370

Closed. rachtsingh closed this pull request 7 years ago.

rachtsingh commented 7 years ago

Small fixes to the implementation of batch normalization (which I've measured to meaningfully improve its effectiveness):

  1. Initialize the gamma parameter (the nn.BatchNormalization weight parameter, as seen here) to 0.1, as recommended in Cooijmans et al.
  2. Unset the default values of self.eps and self.momentum: those parameters can be passed as nil into nn.BatchNormalization, so unless there is a good reason for rnn-specific initializations, the API should be transparent (see the sketch after this list). Also, the default value of 0.1 for eps is too high; the default inside nn is 1e-5, which is more appropriate for a divide-by-zero fudge factor. In my experiment, setting it to 0.1 made the network converge more slowly.
  3. Remove :noBias() since self.o2g is already a LinearNoBias (this is just for code cleanliness).
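For concreteness, here is a minimal Lua sketch of the two behavioral changes, written against torch/nn directly. The makeBN helper is illustrative, not the actual code in this repository:

```lua
require 'nn'

-- Illustrative helper (not the actual rnn code): builds the batch
-- normalization module applied to a recurrent transform.
local function makeBN(size, eps, momentum)
   -- Fix 2: pass eps and momentum through untouched. When they are nil,
   -- nn.BatchNormalization falls back to its own defaults (eps = 1e-5),
   -- keeping the API transparent instead of overriding eps with 0.1.
   local bn = nn.BatchNormalization(size, eps, momentum)
   -- Fix 1: initialize gamma (bn.weight) to 0.1, as recommended in
   -- Cooijmans et al., instead of nn's default initialization.
   bn.weight:fill(0.1)
   return bn
end

local bn = makeBN(128)   -- eps and momentum omitted by the caller
print(bn.eps)            -- 1e-05, nn's default fudge factor
print(bn.weight[1])      -- 0.1
```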
nicholas-leonard commented 7 years ago

@rachtsingh Great fix. Thanks for taking the time. And sorry for the long wait!