The Parmesan batch normalization is split into a normalization layer and a scale layer, which is needed for ladder networks.
It also supports a "collect" keyword, which assumes a single large batch over which the statistics are collected; Lasagne instead collects the statistics as a moving average.
In other cases you are probably better off with Lasagne's implementation.
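A minimal sketch of the difference for context. The Lasagne calls (`batch_norm`, `DenseLayer`, `InputLayer`) are the real API; the Parmesan layer names (`NormalizeLayer`, `ScaleAndShiftLayer`) and the split shown in the comments are my reading of its design and may not match the actual signatures:

```python
import lasagne
from lasagne.layers import InputLayer, DenseLayer, batch_norm

# Lasagne: a single fused batch-norm layer; running mean/std are tracked as
# moving averages and used automatically at inference (deterministic=True).
l_in = InputLayer((None, 784))
l_hid = batch_norm(DenseLayer(l_in, num_units=500))

# Parmesan (sketch, names assumed): normalization and the learned scale/shift
# are separate layers, so a ladder network can tap the normalized activations
# before gamma/beta are applied.
# from parmesan.layers import NormalizeLayer, ScaleAndShiftLayer  # assumed names
# l_dense = DenseLayer(l_in, num_units=500, nonlinearity=None)
# l_norm  = NormalizeLayer(l_dense)       # (x - mean) / std only
# l_scale = ScaleAndShiftLayer(l_norm)    # gamma * x + beta, applied separately
```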
Ok, thanks for the clarification. Sounds reasonable.
Now that Lasagne includes support for batch normalization, I was wondering if it could replace Parmesan's BN. As far as I understand, both were derived from f0k's initial BN gist, but I haven't looked in detail at the differences. Did you make any significant modifications?