Can sparse all-reduce stay efficient with a large number of GPU workers?

apache / singa

a distributed deep learning platform

Apache License 2.0


Can sparse all-reduce stay efficient with a large number of GPU workers? #1140

Open Eiji911 opened 6 months ago

Eiji911 commented 6 months ago

In my opinion, when a GPU cluster scales up to several hundred workers, even high sparsification ratios still generate significant communication overhead, which can be even worse than DenseAllReduce.
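
To make the concern concrete, here is a rough back-of-the-envelope sketch in Python. It is a bandwidth-only model and not a description of how SINGA implements anything. The assumptions are: the dense path is a ring all-reduce, the sparse path is an all-gather of each worker's top-k (index, value) pairs, values and indices are 4 bytes each, and latency and compression compute are ignored. All function names and the gradient size are hypothetical, chosen only for illustration.

```python
# Bandwidth-only cost model (an assumption, not measured data).
# Dense path: ring all-reduce. Sparse path: all-gather of top-k (index, value) pairs.

def dense_ring_allreduce_bytes(num_elements, num_workers, value_bytes=4):
    """Approximate bytes each worker sends in a ring all-reduce."""
    return 2 * (num_workers - 1) / num_workers * num_elements * value_bytes


def sparse_allgather_bytes(k, num_workers, value_bytes=4, index_bytes=4):
    """Approximate bytes each worker receives when every worker
    all-gathers its k selected (index, value) pairs."""
    return (num_workers - 1) * k * (value_bytes + index_bytes)


def break_even_workers(num_elements, k, value_bytes=4, index_bytes=4):
    """Worker count at which the two modeled costs are equal."""
    return 2 * value_bytes * num_elements / (k * (value_bytes + index_bytes))


if __name__ == "__main__":
    num_elements = 25_000_000  # hypothetical gradient size
    k = num_elements // 100    # keep 1% of entries (99% sparsification)
    for workers in (16, 64, 128, 256, 512):
        dense = dense_ring_allreduce_bytes(num_elements, workers)
        sparse = sparse_allgather_bytes(k, workers)
        print(f"{workers:4d} workers: dense={dense / 1e6:8.1f} MB, "
              f"sparse={sparse / 1e6:8.1f} MB")
    print("break-even workers:", break_even_workers(num_elements, k))
```

Under these assumptions the sparse traffic grows linearly with the number of workers while the dense ring traffic stays nearly constant, so the sparse path stops paying off once the worker count exceeds roughly the inverse of the kept fraction. A real system may behave differently, for example if it uses hierarchical aggregation or re-sparsifies between steps.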