danielenricocahall / elephas

Distributed Deep learning with Keras & Spark
MIT License

Correct Batch Normalization Implementation #18

Closed danielenricocahall closed 1 year ago

danielenricocahall commented 1 year ago

Overview _Elephas calculates the weight deltas by simple subtraction. This strategy works for Conv2D and Dense layers. However, modern neural networks commonly contain batch normalization layers, for which the subtraction strategy fails. As a result, Elephas cannot be used for distributed training of popular networks such as Google Inception and YOLO.

Can this be solved?_

(Copied from https://github.com/maxpumperla/elephas/issues/57)
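For context, here is a minimal sketch (not Elephas's actual code; the toy model and variable names are assumptions) of the subtraction-based delta strategy the quoted issue describes, and why batch normalization layers behave differently:

```python
import numpy as np
from tensorflow import keras

# Toy model with a BatchNormalization layer between Dense layers.
model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(4),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(1),
])
model.compile(optimizer="sgd", loss="mse")

before = [w.copy() for w in model.get_weights()]
x = np.random.rand(32, 8).astype("float32")
y = np.random.rand(32, 1).astype("float32")
model.fit(x, y, epochs=1, verbose=0)
after = model.get_weights()

# "Delta" as plain subtraction. For Dense kernels/biases this captures the
# accumulated gradient updates, but BatchNormalization also changes its
# non-trainable moving_mean / moving_variance via an exponential moving
# average, so those deltas are not gradient-like quantities and may not
# aggregate meaningfully when summed or averaged across workers.
deltas = [after_w - before_w for after_w, before_w in zip(after, before)]
for var, delta in zip(model.weights, deltas):
    print(var.name, float(np.abs(delta).mean()))
```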

Possible Approaches We need to investigate what the "correct" implementation of batch normalization would be in a distributed setting.
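One candidate (a hedged sketch of an idea, not what Elephas currently does; `aggregate_bn_stats` is a hypothetical helper) would be to combine the workers' batch-norm moving statistics explicitly rather than treating them like gradient deltas, in the spirit of synchronized batch normalization:

```python
import numpy as np

def aggregate_bn_stats(worker_means, worker_variances):
    """Combine per-worker moving_mean / moving_variance arrays into global
    statistics, assuming equal shapes and roughly equal sample counts per worker.
    """
    means = np.stack(worker_means)          # (n_workers, n_features)
    variances = np.stack(worker_variances)  # (n_workers, n_features)
    global_mean = means.mean(axis=0)
    # Law of total variance: mean of within-worker variances plus the
    # variance of the per-worker means.
    global_variance = variances.mean(axis=0) + means.var(axis=0)
    return global_mean, global_variance
```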

Alternative Options The current approach may already be correct, or may work "well enough".


danielenricocahall commented 1 year ago

After thinking about this for a bit, I actually believe Elephas is handling distributed batch normalization correctly. I don't know of an alternative implementation that would make more sense given the distributed paradigm we're using, so I am closing this issue out. I invite anyone to reopen the issue (or open a PR) if I am wrong about this or if there's a concrete proposal.