giotto-ai / giotto-deep

Deep learning made topological.

Accumulate gradients is not compatible with BatchNorm #75

Closed by raphaelreinauer 2 years ago

raphaelreinauer commented 2 years ago

When I set n_accumulated_grads to a value greater than 1 on a model with batch norm layers, the batch norm statistics are computed over each micro-batch separately rather than over the whole batch. This could hurt training stability and validation results.
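To make the mismatch concrete, here is a minimal sketch in plain Python (no PyTorch or giotto-deep involved). It assumes a single feature, a hypothetical batch of 4 activations, and 2 accumulation steps; the helper `batch_norm` is illustrative and only mimics the training-mode normalization of a batch norm layer without the learnable scale and shift.

```python
from statistics import fmean, pvariance


def batch_norm(values, eps=1e-5):
    """Normalize scalars with their own mean and biased variance,
    as a batch norm layer does in training mode for one feature."""
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


# Hypothetical activations of one feature for a full batch of 4 samples.
full_batch = [1.0, 2.0, 10.0, 11.0]

# Statistics taken over the whole batch at once.
whole = batch_norm(full_batch)

# Statistics taken per micro-batch of size 2 (two accumulation steps).
micro = batch_norm(full_batch[:2]) + batch_norm(full_batch[2:])

print("whole batch:  ", [round(x, 3) for x in whole])
print("micro-batches:", [round(x, 3) for x in micro])
```

With per-micro-batch statistics, the samples 1.0 and 10.0 end up with the same normalized value, while whole-batch statistics keep them far apart, so the two setups feed different inputs to the following layers.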

I think batch norm layers need special treatment so that the mean and variance are ultimately computed over the whole batch. This seems like a very hard problem, and I don't have a solution. It might be worth looking at how the PyTorch Lightning library handles it.
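One ingredient of such special treatment can be sketched: the whole-batch mean and variance can be recovered exactly from per-micro-batch statistics. The snippet below is only an illustration in plain Python, and `combine_stats` is a hypothetical helper, not part of giotto-deep or PyTorch Lightning. It does not resolve the hard part, since earlier micro-batches have already been normalized in their forward pass by the time the combined statistics are known.

```python
def combine_stats(stats):
    """Pool per-micro-batch (count, mean, biased variance) triples
    into the mean and biased variance of the whole batch."""
    total = sum(n for n, _, _ in stats)
    mean = sum(n * m for n, m, _ in stats) / total
    # Within-group variance plus squared offset of each group mean.
    var = sum(n * (v + (m - mean) ** 2) for n, m, v in stats) / total
    return mean, var


# Hypothetical statistics of two micro-batches, [1, 2] and [10, 11]:
# each has 2 samples and biased variance 0.25.
micro_stats = [(2, 1.5, 0.25), (2, 10.5, 0.25)]

mean, var = combine_stats(micro_stats)
print(mean, var)  # same as the mean and variance of [1, 2, 10, 11]
```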

raphaelreinauer commented 2 years ago

This is no longer relevant.