danielenricocahall / elephas

Distributed Deep learning with Keras & Spark
MIT License

Correct Batch Normalization Implementation #18

Closed danielenricocahall closed 1 year ago

danielenricocahall commented 1 year ago

Overview _Elephas calculates the weight deltas by simple subtraction. This strategy works for Conv2D and Dense layers. However, modern neural networks commonly contain batch normalization layers, for which the subtraction strategy fails. As a result, Elephas cannot be used for distributed training of popular networks such as Google Inception and YOLO.

Can this be solved?_

(Copied from https://github.com/maxpumperla/elephas/issues/57)
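For context, here is a minimal sketch (not Elephas's actual code; the toy model and variable names are assumptions) of the subtraction-based delta strategy the quoted issue describes, and why batch normalization layers behave differently:

```python
import numpy as np
from tensorflow import keras

# Toy model with a BatchNormalization layer between Dense layers.
model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(4),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(1),
])
model.compile(optimizer="sgd", loss="mse")

before = [w.copy() for w in model.get_weights()]
x = np.random.rand(32, 8).astype("float32")
y = np.random.rand(32, 1).astype("float32")
model.fit(x, y, epochs=1, verbose=0)
after = model.get_weights()

# "Delta" as plain subtraction. For Dense kernels/biases this captures the
# accumulated gradient updates, but BatchNormalization also changes its
# non-trainable moving_mean / moving_variance via an exponential moving
# average, so those deltas are not gradient-like quantities and may not
# aggregate meaningfully when summed or averaged across workers.
deltas = [after_w - before_w for after_w, before_w in zip(after, before)]
for var, delta in zip(model.weights, deltas):
    print(var.name, float(np.abs(delta).mean()))
```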

Possible Approaches We need to investigate what the "correct" implementation of batch normalization would be in a distributed setting.
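One candidate (a hedged sketch of an idea, not what Elephas currently does; `aggregate_bn_stats` is a hypothetical helper) would be to combine the workers' batch-norm moving statistics explicitly rather than treating them like gradient deltas, in the spirit of synchronized batch normalization:

```python
import numpy as np

def aggregate_bn_stats(worker_means, worker_variances):
    """Combine per-worker moving_mean / moving_variance arrays into global
    statistics, assuming equal shapes and roughly equal sample counts per worker.
    """
    means = np.stack(worker_means)          # (n_workers, n_features)
    variances = np.stack(worker_variances)  # (n_workers, n_features)
    global_mean = means.mean(axis=0)
    # Law of total variance: mean of within-worker variances plus the
    # variance of the per-worker means.
    global_variance = variances.mean(axis=0) + means.var(axis=0)
    return global_mean, global_variance
```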

Alternative Options The current approach may already be correct, or may work "well enough".


danielenricocahall commented 1 year ago

After thinking about this for a bit, I actually believe Elephas is handling distributed batch normalization correctly. I don't know of an alternative implementation that would make more sense given the distributed paradigm we're using, so I am closing this issue out. I invite anyone to reopen the issue (or open a PR) if I am wrong about this or if there's a concrete proposal.