Open Eiji911 opened 6 months ago
in my opinion, when the gpu cluster scaled up to several hundred workers, high sparsification ratios still generate significant communication overheads, which even worst than DenseAllReduce.
in my opinion, when the gpu cluster scaled up to several hundred workers, high sparsification ratios still generate significant communication overheads, which even worst than DenseAllReduce.