apache / singa

a distributed deep learning platform
Apache License 2.0

can sparse all-reduce keep its efficiency with a large number of GPU workers? #1140

Open Eiji911 opened 6 months ago

Eiji911 commented 6 months ago

In my opinion, when the GPU cluster is scaled up to several hundred workers, even high sparsification ratios still generate significant communication overhead, which can end up worse than DenseAllReduce.
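To make the concern concrete, here is a back-of-envelope sketch (my own assumptions, not SINGA code): it compares the per-worker communication volume of a ring dense all-reduce against an allgather-style sparse all-reduce where each worker broadcasts its (index, value) pairs to everyone else. The model size, density ratio, and function names are hypothetical, purely for illustration.

```python
# Rough per-worker communication volume comparison (assumed allgather-based
# sparse exchange, not SINGA's actual implementation).

def dense_allreduce_volume(grad_size, num_workers):
    """Ring all-reduce: each worker transfers ~2 * (N-1)/N * G elements,
    roughly constant in N."""
    return 2.0 * (num_workers - 1) / num_workers * grad_size

def sparse_allgather_volume(grad_size, num_workers, density):
    """Allgather of sparse gradients: every worker receives the other (N-1)
    workers' payloads; each payload is density * G (index, value) pairs,
    so the volume grows linearly with N."""
    payload = density * grad_size * 2  # indices + values
    return (num_workers - 1) * payload

if __name__ == "__main__":
    G = 25_000_000   # assumed ~25M-parameter model
    density = 0.01   # assumed top-1% sparsification
    for n in (8, 64, 256, 512):
        dense = dense_allreduce_volume(G, n)
        sparse = sparse_allgather_volume(G, n, density)
        print(f"N={n:4d}  dense={dense/1e6:8.1f}M elems  sparse={sparse/1e6:8.1f}M elems")
```

Under these assumptions the sparse exchange wins at small worker counts, but its volume crosses the roughly constant dense all-reduce cost once N exceeds about 1/(2*density) workers (around 50 workers for 1% density), which is the scaling worry raised above.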