bytedance / byteps

A high performance and generic framework for distributed DNN training

Does BytePS already include the Bytedance Scheduler, or do we need to use them separately? #367

Open nishantagrawalgit opened 3 years ago

nishantagrawalgit commented 3 years ago

I wanted to check whether "byteps" and the "bytedance scheduler" are separate software modules, or whether the scheduler is automatically included when we use byteps.

The reason for this query is that when we run byteps, we also need to run a scheduler, so I wanted to check whether all the scheduler benefits will apply to large-scale AI training, or whether we need to launch the "bytedance scheduler" separately.

Is there any difference between the "bytedance scheduler" and the scheduler we run with byteps?

ymjiang commented 3 years ago

The short answer is that they are separate software modules.

To use ByteScheduler, you need to check out this branch: https://github.com/bytedance/byteps/tree/bytescheduler/bytescheduler.

However, BytePS incorporates the basic ideas of ByteScheduler (prioritized scheduling + tensor partitioning). You can manually enable/disable them if you want to measure their benefits.

nishantagrawalgit commented 3 years ago

thanks a lot @ymjiang for your response.

If I understood correctly, can I say that with BytePS alone I get the basic benefits of prioritized scheduling + tensor partitioning? I am also curious what I would be missing in scheduling if I don't use ByteScheduler with BytePS.

ymjiang commented 3 years ago

> Can I say that with BytePS alone I get the basic benefits of prioritized scheduling + tensor partitioning?

Yes. You can find all of that in BytePS.

> I am also curious what I would be missing in scheduling if I don't use ByteScheduler with BytePS.

ByteScheduler works on top of PS or all-reduce, while BytePS is a new architecture that goes beyond both. If you just want to measure the benefit of scheduling, then using ByteScheduler is fine.

lucasleesw commented 3 years ago

> However, BytePS incorporates the basic idea of ByteScheduler (prioritized scheduling + tensor partition). You can manually enable/disable them if you want to check the benefits.

Hi, could you share how to enable/disable prioritized scheduling in the BytePS PyTorch implementation?

ymjiang commented 3 years ago

> Hi, could you share how to enable/disable prioritized scheduling in the BytePS PyTorch implementation?

@lucasleesw By default it is disabled. You can enable it by exporting BYTEPS_SCHEDULING_CREDIT to a small value (e.g., 4).
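A minimal sketch of toggling this from within a launch script (the variable name `BYTEPS_SCHEDULING_CREDIT` comes from the answer above; setting it via `os.environ` before BytePS initializes is an assumption about how you launch your job):

```python
import os

# Enable prioritized scheduling by exporting a small credit value
# (e.g. 4, as suggested above) before BytePS is initialized:
os.environ["BYTEPS_SCHEDULING_CREDIT"] = "4"

# To return to the default (disabled), remove the variable again:
# os.environ.pop("BYTEPS_SCHEDULING_CREDIT", None)
```

Equivalently, `export BYTEPS_SCHEDULING_CREDIT=4` in the shell before launching the worker achieves the same effect.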

lucasleesw commented 3 years ago

@ymjiang

> @lucasleesw By default it is disabled. You can enable it by exporting BYTEPS_SCHEDULING_CREDIT to a small value (e.g., 4).

Thank you, it helps a lot!
Could you also share the differences between `import byteps.torch as bps` and `import byteps.torch.cross_barrier as bps`? Is there anything related to prioritized scheduling?

ymjiang commented 3 years ago

As its name suggests, the cross_barrier implementation supports crossing PyTorch's global optimizer.step() barrier. It may improve the bps baseline by a small margin, depending on your model.
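The two entry points differ only in the import line; a minimal sketch of selecting between them (the helper `bps_entry_point` is hypothetical, not part of the BytePS API):

```python
# Both entry points expose the same wrapper API for PyTorch;
# the difference is when gradient communication may happen:
#
#   import byteps.torch as bps                # standard: push/pull completes
#                                             # before optimizer.step(), which
#                                             # acts as a per-iteration barrier
#   import byteps.torch.cross_barrier as bps  # scheduling may overlap early
#                                             # layers' communication with the
#                                             # next iteration's forward pass

def bps_entry_point(cross_barrier: bool) -> str:
    """Hypothetical helper: pick which BytePS PyTorch module to import."""
    return "byteps.torch.cross_barrier" if cross_barrier else "byteps.torch"
```

In practice you would simply change the import at the top of your training script; the rest of the code stays the same.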