bytedance / byteps

A high performance and generic framework for distributed DNN training

Is it right to do allreduce immediately for non-zero ranks in bytescheduler? #422

Closed sywang0111 closed 2 years ago

sywang0111 commented 2 years ago

https://github.com/bytedance/byteps/blob/2749848707a414e7dee24a16852575340b6ddd47/bytescheduler/bytescheduler/common/bytecore.py#L192 I see in your code that rank 0 checks whether the tensor is ready for allreduce, while the other ranks do allreduce immediately, so rank 0 can control the other ranks' gradient communication. But suppose there are some stragglers among the non-zero ranks: allreduce would occur as soon as the tensor is ready on rank 0, even though the tensor is not yet ready on those stragglers. Could you tell me whether my understanding is wrong, or whether this might be a bug? @pengyanghua Thank you.

pengyanghua commented 2 years ago

@sywang0111 Hi, all-reduce starts only once all ranks are ready. Non-zero ranks check their readiness here and rank 0 checks the readiness here. We only do scheduling on rank 0 since all-reduce is synchronized.

See this issue https://github.com/bytedance/byteps/issues/351 for more information.
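
To illustrate why a straggler cannot be skipped, here is a minimal sketch. It is not ByteScheduler code: `simulate_allreduce`, `ready_delays`, and `values` are hypothetical names, and a `threading.Barrier` stands in for the synchronization that a real all-reduce provides. Each simulated rank posts its all-reduce as soon as its own tensor is ready, yet no rank completes until the slowest rank has posted too.

```python
import threading
import time


def simulate_allreduce(ready_delays, values):
    """Simulate one synchronous all-reduce (sum) across several ranks.

    ready_delays[r] is how long rank r takes to produce its tensor;
    values[r] is that rank's contribution. Both are illustrative inputs.
    """
    world_size = len(ready_delays)
    barrier = threading.Barrier(world_size)  # stand-in for collective synchronization
    lock = threading.Lock()
    events = []                 # ordered log of (event_name, rank)
    total = [0]                 # shared accumulator for the reduced value
    results = [None] * world_size

    def worker(rank):
        time.sleep(ready_delays[rank])       # pretend backward pass for this tensor
        with lock:
            events.append(("ready", rank))
            total[0] += values[rank]
        # Assumption: rank 0 would consult its scheduling queue at this point;
        # the other ranks simply post the operation right away.
        barrier.wait()                       # blocks until every rank has posted
        with lock:
            results[rank] = total[0]
            events.append(("done", rank))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events, results


if __name__ == "__main__":
    # Rank 2 is the straggler: its tensor becomes ready last.
    events, results = simulate_allreduce([0.0, 0.0, 0.05], [1, 2, 3])
    print(events)
    print(results)
```

In this sketch, every `("done", rank)` event is logged after the straggler's `("ready", 2)` event, which mirrors the explanation above: posting the operation early on a fast rank only makes that rank wait, it does not start the reduction without the slow ranks.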

sywang0111 commented 2 years ago

Understood. Thank you for your reply.