PPPLDeepLearning / plasma-python

PPPL deep learning disruption prediction package
http://tigress-web.princeton.edu/~alexeys/docs-web/html/

Do not broadcast weights after all-reduce #34

Closed ASvyatkovskiy closed 5 years ago

ASvyatkovskiy commented 5 years ago

The data-parallel training algorithm implemented here uses all_reduce to compute the global weight updates (by summing and averaging the per-rank updates). As a result, the global updates are available on every rank, not only on the root rank (which would be the case with reduce). There is therefore no need to broadcast the global weights after each update iteration; we only need to broadcast the initial weights to all workers to ensure an identical starting point for training.
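
A minimal sketch of the idea using mpi4py and NumPy. The function names (`sync_initial_weights`, `allreduce_average`) are illustrative, not the package's actual API, and the weight lists stand in for whatever per-layer arrays the trainer holds:

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD


def sync_initial_weights(weights):
    """Broadcast the root rank's weights once, before training starts,
    so every worker begins from the same point."""
    for w in weights:
        comm.Bcast(w, root=0)  # in-place broadcast of each NumPy array
    return weights


def allreduce_average(deltas):
    """Sum the per-rank weight updates with Allreduce and divide by the
    number of ranks. The averaged result is left on every rank, so no
    follow-up broadcast is needed (unlike Reduce, which would leave it
    only on the root)."""
    size = comm.Get_size()
    averaged = []
    for d in deltas:
        total = np.empty_like(d)
        comm.Allreduce(d, total, op=MPI.SUM)
        averaged.append(total / size)
    return averaged
```

With this pattern, the only broadcast in the whole run is the one in `sync_initial_weights`; every subsequent iteration relies on the allreduce alone to keep the replicas consistent.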