lruthotto closed this 6 years ago
Could you submit this to the dev branch?
@lruthotto not sure if you can see my other comments, but what exactly do we need to store when running batch norm? In the backward-pass functions we run batchNormNN with doDerivatives=true, which uses a lot of memory because it stores three copies of Y at each batch norm (one at the start, one after the norm layer, and one after the affine scaling layer). This makes the memory footprint grow a lot.
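
To make the question concrete, here is a minimal sketch of a textbook batch norm for a single feature, in plain Python. It is not the batchNormNN code from this repository: the function names, the scalar `gamma`/`beta`, and the `eps` default are all assumptions for illustration. In this formulation the backward pass only needs the normalized values, the inverse standard deviation, and the scale, so a single array per batch norm is cached instead of three. Whether the same holds for batchNormNN with its separate norm and affine layers is something I have not verified.

```python
import math

def batch_norm_forward(y, gamma, beta, eps=1e-5):
    """Normalize a batch of scalars, then apply the affine map gamma * y_hat + beta."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n   # biased batch variance
    inv_std = 1.0 / math.sqrt(var + eps)
    y_hat = [(v - mean) * inv_std for v in y]
    out = [gamma * v + beta for v in y_hat]
    # Only one array (y_hat) plus two scalars are kept for the backward pass.
    cache = (y_hat, inv_std, gamma)
    return out, cache

def batch_norm_backward(dout, cache):
    """Return gradients w.r.t. the input y, gamma, and beta."""
    y_hat, inv_std, gamma = cache
    n = len(dout)
    dgamma = sum(d * v for d, v in zip(dout, y_hat))
    dbeta = sum(dout)
    dy_hat = [d * gamma for d in dout]
    s1 = sum(dy_hat)
    s2 = sum(d * v for d, v in zip(dy_hat, y_hat))
    # Standard closed form: accounts for the dependence of mean and variance on y.
    dy = [inv_std / n * (n * d - s1 - v * s2) for d, v in zip(dy_hat, y_hat)]
    return dy, dgamma, dbeta
```

In this sketch neither the raw input nor the post-affine output is needed in the backward pass, since the output can be recomputed from `y_hat`, `gamma`, and `beta` if required.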