Closed botev closed 9 years ago
It would be nice if the BatchNormalizationLayer, rather than supporting a "single_pass", would support a "collect" mode, where it collects the statistics over minibatches.
I think it does, if you set alpha to e.g. 0.5? This is untested, but it should work.
That would take a geometric (exponentially weighted) average over the minibatches, e.g. for 3 minibatches:
mean = 0.125*m_1 + 0.25*m_2 + 0.5*m_3
whereas what you want in the end is
mean = 0.333 * m_1 + 0.333 * m_2 + 0.333 * m_3
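The difference between the two weightings can be checked numerically. A minimal sketch (plain NumPy, with made-up minibatch means, not the actual layer code):

```python
import numpy as np

# Hypothetical minibatch means; any values illustrate the point.
m = [np.float64(1.0), np.float64(2.0), np.float64(3.0)]

# Running update with a fixed alpha, as the layer applies it:
#   mean_new = (1 - alpha) * mean_old + alpha * batch_mean
alpha = 0.5
ema = 0.0
for batch_mean in m:
    ema = (1 - alpha) * ema + alpha * batch_mean

# Equal-weight average over all minibatches:
equal = sum(m) / len(m)

print(ema)    # 0.125*1 + 0.25*2 + 0.5*3 = 2.125
print(equal)  # (1 + 2 + 3) / 3 = 2.0
```

With a fixed alpha, the last minibatch dominates; the equal-weight average treats all minibatches the same.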
Or am I misunderstanding what the point of the "single_pass" is?
Single pass assumes that you pass the entire dataset through the network in a single batch. That way you collect the correct statistics for all batch normalization layers.
It's probably correct that alpha != 'single_pass' uses a geometric average. It's copied from @f0k's implementation.
Aha, I get it. Well then, what I suggest is just a convenience for collecting the statistics after training is done! Thanks for the clarification.
It would be nice if the BatchNormalizationLayer, rather than supporting a "single_pass", would support a "collect" mode, where it collects the statistics over minibatches. If the dataset is big enough, the "single_pass" could fail. An alternative would be something like this:
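The snippet itself is not in the thread, but a "collect" mode might look roughly like the following sketch (plain NumPy rather than Theano; the class and method names are hypothetical, not the Lasagne API). The trick is to shrink alpha as 1/n, which turns the exponential update into an equal-weight running average:

```python
import numpy as np

# Hypothetical sketch of a "collect" mode: accumulate an equal-weight
# running mean/variance over minibatches, instead of the exponentially
# weighted average that a fixed alpha produces.
class StatsCollector:
    def __init__(self, shape):
        self.n = 0                    # minibatches seen so far
        self.mean = np.zeros(shape)   # running mean of batch means
        self.var = np.zeros(shape)    # running mean of batch variances

    def update(self, batch):
        # With alpha = 1/n, the usual update
        #   new = (1 - alpha) * old + alpha * x
        # reduces to the arithmetic mean of all x seen so far.
        self.n += 1
        alpha = 1.0 / self.n
        self.mean = (1 - alpha) * self.mean + alpha * batch.mean(axis=0)
        self.var = (1 - alpha) * self.var + alpha * batch.var(axis=0)

# Three constant minibatches with means 1, 2 and 3: the collected mean
# is their arithmetic average, 2.0, not the alpha=0.5 weighted 2.125.
collector = StatsCollector(shape=(3,))
for v in (1.0, 2.0, 3.0):
    collector.update(np.full((8, 3), v))
print(collector.mean)  # [2. 2. 2.]
```

Because only a running mean is kept, this never needs the whole dataset in memory at once, which avoids the failure mode of "single_pass" on large datasets.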