Rivendile opened this issue 3 years ago.
The three parameters are enough. BYTEPS_SERVER_ENABLE_SCHEDULE applies to the server process, although it is enabled by default. The other two are similar to those used in ByteScheduler.
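Here is a minimal sketch that collects the three settings, assuming the job is started from Python and the variables are passed through the process environment. The helper name build_scheduling_env is hypothetical. The credit value 4 is the one tried later in this thread, and the partition size is an illustrative placeholder, not a tuned recommendation.

```python
import os
from typing import Dict, Mapping, Optional


def build_scheduling_env(
    credit: int = 4,
    partition_bytes: int = 4_096_000,
    server_schedule: bool = True,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: return a copy of `base` (default: os.environ)
    with the three scheduling-related BytePS variables set.
    The default values here are illustrative only."""
    env = dict(os.environ if base is None else base)
    # Server-side scheduling; only meaningful for the server process.
    env["BYTEPS_SERVER_ENABLE_SCHEDULE"] = "1" if server_schedule else "0"
    # Worker-side scheduling credit.
    env["BYTEPS_SCHEDULING_CREDIT"] = str(credit)
    # Size in bytes of the pieces that large tensors are split into.
    env["BYTEPS_PARTITION_BYTES"] = str(partition_bytes)
    return env
```

The returned dictionary can be handed to whatever launches the worker and server processes, for example as the env argument of subprocess.run.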
What's the difference between server-side scheduling and ByteScheduler?
ByteScheduler does not support scheduling at the server side.
Thanks for your timely reply. Does BytePS have scheduling at the worker side by default? Is there any parameter to control this feature?
Yes, BYTEPS_SCHEDULING_CREDIT controls the scheduling at the worker side.
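One way to think about a scheduling credit is as flow control on the amount of data in flight. The sketch below is a hypothetical toy (CreditGate is not a BytePS class). It assumes credit is counted in the same units as task size, and it uses plain arrival order so that only the credit mechanism is shown. It does not cover how BytePS itself sizes or refills the credit.

```python
from collections import deque
from typing import Deque, List, Tuple


class CreditGate:
    """Hypothetical model of credit-based flow control on a worker.

    A task (name, size) may start only if its size fits in the remaining
    credit; finishing a task returns its size to the pool. Tasks are
    considered in arrival order, isolating credit from priority ordering.
    """

    def __init__(self, credit: int) -> None:
        self.credit = credit
        self.pending: Deque[Tuple[str, int]] = deque()
        self.in_flight: List[Tuple[str, int]] = []

    def submit(self, name: str, size: int) -> None:
        """Queue a ready task for sending."""
        self.pending.append((name, size))

    def dispatch(self) -> List[str]:
        """Start pending tasks, in order, while the credit allows."""
        started = []
        while self.pending and self.pending[0][1] <= self.credit:
            name, size = self.pending.popleft()
            self.credit -= size  # reserve credit for the task in flight
            self.in_flight.append((name, size))
            started.append(name)
        return started

    def finish(self, name: str) -> None:
        """Mark an in-flight task as done and give its credit back."""
        for i, (n, size) in enumerate(self.in_flight):
            if n == name:
                del self.in_flight[i]
                self.credit += size
                return
        raise KeyError(name)
```

With a credit of 4 and three tasks of size 2, the first dispatch starts two tasks. The third starts only after one of them finishes and returns its credit.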
Thanks a lot. Is the priority used for worker-side scheduling the index of the layers? What is the priority used for server-side scheduling?
For both sides, the priority is determined by the tensor index.
If you are interested in how it works, see: https://github.com/bytedance/byteps/blob/master/byteps/server/server.cc#L473
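Ordering ready tensors by index can be sketched with a binary heap. This is a hypothetical illustration (IndexPriorityQueue is not part of BytePS). It assumes that a smaller tensor index corresponds to an earlier layer and that earlier layers should be sent first, which matches the behavior discussed in this thread.

```python
import heapq
from typing import List, Tuple


class IndexPriorityQueue:
    """Hypothetical ready queue ordered by tensor index.

    Assumption: a smaller index means an earlier layer, and earlier layers
    are communicated first because the next forward pass needs them first.
    """

    def __init__(self) -> None:
        self._heap: List[Tuple[int, str]] = []

    def push(self, index: int, name: str) -> None:
        """Add a ready tensor; heapq keeps the smallest index at the top."""
        heapq.heappush(self._heap, (index, name))

    def pop(self) -> str:
        """Remove and return the ready tensor with the smallest index."""
        return heapq.heappop(self._heap)[1]

    def __len__(self) -> int:
        return len(self._heap)
```

Backpropagation may make gradient 28 ready before gradient 3, but popping from the heap still returns the lowest index first.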
Hi, @ymjiang, I tried setting these parameters to control the scheduling behavior, and it worked when using BytePS with MXNet. However, when I set byteps_server_enable_schedule=1 and byteps_scheduling_credit=4, the timeline looks like this:
[Timeline screenshot]
It seems that there is no scheduling at all. Any suggestions?
Actually, I am not sure how you reached this conclusion from this figure. Enabling the scheduling only means that earlier layers are preferentially selected for communication; it does not guarantee that they always go first.
I'm a little confused. First, in MXNet, when I use these parameters, the earlier layers are selected for communication first, and later layers that have not yet been sent when an earlier layer becomes ready are delayed. As shown above, in MXNet, the gradient tensor 28 waits for the earlier layers. But in the figure at https://github.com/bytedance/byteps/issues/348#issuecomment-755952360, the tensors of features 28, 26, 24, and 17 show a similar pattern: each of them has pull operations that finish either early or late. On the other hand, don't the credit size and the priority queue used in https://github.com/bytedance/byteps/blob/249006c9105d7b4fd09962eb133c3e76de1c8656/byteps/common/scheduled_queue.cc guarantee this preference, at least within the range of the credit size, given that the GetTask function picks the ready tensor with the highest priority?
I think there is a problem with the for loop in the GetTask function in https://github.com/bytedance/byteps/blob/249006c9105d7b4fd09962eb133c3e76de1c8656/byteps/common/scheduled_queue.cc. For example, when a large tensor with high priority is not chosen because of the credit limit, some smaller tensors with lower priority may be chosen instead. Thus the priority is not guaranteed. So why doesn't BytePS use a true priority queue, like the one used in ByteScheduler (https://github.com/bytedance/byteps/blob/33fe89f5a6a691ec562ad2b0167f0192fd8ced7d/bytescheduler/bytescheduler/common/bytecore.py#L213), to get rid of this problem?
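The priority inversion described here can be reproduced with a toy model. Both functions below are hypothetical and are not taken from the BytePS or ByteScheduler sources. pick_first_fit scans ready tasks in priority order and takes the first one that fits the remaining credit. pick_strict only considers the highest-priority task and sends nothing if it does not fit. Tasks are (index, size) pairs, where a lower index means higher priority.

```python
from typing import List, Optional, Tuple

# (tensor index, size); a lower index means a higher priority
Task = Tuple[int, int]


def pick_first_fit(ready: List[Task], credit: int) -> Optional[Task]:
    """Scan tasks in priority order and return the first one that fits the
    remaining credit, skipping any higher-priority task that does not fit.
    This can let a small low-priority task overtake a large one."""
    for task in sorted(ready):
        if task[1] <= credit:
            return task
    return None


def pick_strict(ready: List[Task], credit: int) -> Optional[Task]:
    """Only consider the highest-priority task. If it does not fit, send
    nothing and wait for credit to come back, so no inversion occurs."""
    if not ready:
        return None
    head = min(ready)
    return head if head[1] <= credit else None
```

With ready tasks [(0, 8), (5, 1)] and 4 units of credit, first-fit sends the low-priority task (5, 1), whereas strict ordering waits until 8 units are free. Strict ordering avoids the inversion, but it can leave the link idle while it waits for credit to be returned.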
We partition large tensors into equal-sized pieces to avoid this. (Small tensors do not matter much, according to our tests.)
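Here is a sketch of splitting a tensor into fixed-size pieces. It assumes pieces of at most BYTEPS_PARTITION_BYTES bytes, with the remainder in the last piece. The exact splitting and alignment rules BytePS uses may differ, and partition is a hypothetical helper name.

```python
from typing import List


def partition(size: int, partition_bytes: int) -> List[int]:
    """Split a tensor of `size` bytes into consecutive pieces of at most
    `partition_bytes` bytes; the last piece holds the remainder."""
    if partition_bytes <= 0:
        raise ValueError("partition_bytes must be positive")
    return [
        min(partition_bytes, size - offset)
        for offset in range(0, size, partition_bytes)
    ]
```

Each piece needs only one piece's worth of credit. As a result, a large high-priority tensor no longer has to wait for the credit to cover its entire size before any part of it can be sent.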
Thanks for your explanation! Could you tell me what else can prevent the priority from being guaranteed?
Hi, since worker-side scheduling is the default behavior, could you explain the differences between import byteps.torch as bps and import byteps.torch.cross_barrier as bps?
Hello, I would like to run communication scheduling with BytePS. Which parameters should I set? Is setting BYTEPS_SERVER_ENABLE_SCHEDULE=1, BYTEPS_SCHEDULING_CREDIT, and BYTEPS_PARTITION_BYTES enough? What's the difference between server-side scheduling and ByteScheduler? Any help would be appreciated. Thanks a lot.