BaguaSys / bagua

Bagua Speeds up PyTorch
https://tutorials-8ro.pages.dev/
MIT License

Hi there, does Bagua support model parallelism across different nodes? If so, would you mind providing a simple example in the examples folder? Thank you. #136

Closed: elricwan closed this issue 3 years ago

elricwan commented 3 years ago


NOBLES5E commented 3 years ago

Currently, Bagua by itself does not support model parallelism. It is on our roadmap, though.

At the moment, you can use PyTorch's RPC framework (`torch.distributed.rpc`) for distributed model parallelism. Also, if your model uses both data parallelism and model parallelism, you can still use Bagua to accelerate the data-parallel part.
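For readers looking for the simple example requested above, here is a minimal sketch of RPC-based model parallelism using plain PyTorch, not Bagua. It assumes a recent PyTorch that provides `torch.distributed.rpc`, `torch.distributed.autograd`, `DistributedOptimizer`, and the `RRef.rpc_sync()` proxy. Three local CPU processes stand in for nodes: `worker0` drives training, while `worker1` and `worker2` each hold one shard of the model. The names `Stage`, `TwoStageModel`, `train_model_parallel`, the worker names, and the layer sizes are all hypothetical, chosen for illustration only.

```python
import os
import socket

import torch
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc
import torch.multiprocessing as mp
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
from torch.distributed.optim import DistributedOptimizer
from torch.distributed.rpc import RRef


class Stage(nn.Module):
    """Hypothetical model shard; each instance lives on a single RPC worker."""

    def __init__(self, in_features, out_features, relu):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.relu = relu

    def forward(self, x):
        y = self.linear(x)
        return torch.relu(y) if self.relu else y

    def parameter_rrefs(self):
        # Wrap local parameters in RRefs so a DistributedOptimizer can update them remotely.
        return [RRef(p) for p in self.parameters()]


class TwoStageModel:
    """Hypothetical driver-side handle: stage 1 lives on worker1, stage 2 on worker2."""

    def __init__(self):
        self.stage1 = rpc.remote("worker1", Stage, args=(16, 32, True))
        self.stage2 = rpc.remote("worker2", Stage, args=(32, 4, False))

    def forward(self, x):
        # Activations travel worker0 -> worker1 -> worker0 -> worker2 -> worker0.
        hidden = self.stage1.rpc_sync().forward(x)
        return self.stage2.rpc_sync().forward(hidden)

    def parameter_rrefs(self):
        return self.stage1.rpc_sync().parameter_rrefs() + self.stage2.rpc_sync().parameter_rrefs()


def run(rank, world_size, port, result_queue):
    os.environ["MASTER_ADDR"] = "localhost"  # assumption: all processes on one machine
    os.environ["MASTER_PORT"] = str(port)
    rpc.init_rpc(f"worker{rank}", rank=rank, world_size=world_size)
    if rank == 0:  # worker0 drives training; worker1/worker2 only host model shards
        torch.manual_seed(0)
        model = TwoStageModel()
        opt = DistributedOptimizer(optim.SGD, model.parameter_rrefs(), lr=0.05)
        x, target = torch.randn(8, 16), torch.randn(8, 4)
        losses = []
        for _ in range(5):
            # Each iteration gets its own distributed autograd context.
            with dist_autograd.context() as context_id:
                out = model.forward(x)
                loss = F.mse_loss(out, target)
                dist_autograd.backward(context_id, [loss])  # gradients flow back over RPC
                opt.step(context_id)  # each shard's owner updates its own parameters
            losses.append(loss.item())
        result_queue.put((losses, tuple(out.shape)))
    rpc.shutdown()  # graceful: blocks until every worker has finished


def train_model_parallel(world_size=3):
    with socket.socket() as s:  # pick a free local TCP port for the RPC rendezvous
        s.bind(("localhost", 0))
        port = s.getsockname()[1]
    result_queue = mp.get_context("spawn").SimpleQueue()
    mp.spawn(run, args=(world_size, port, result_queue), nprocs=world_size, join=True)
    return result_queue.get()


if __name__ == "__main__":
    losses, out_shape = train_model_parallel()
    print(out_shape, losses)
```

On a real multi-node cluster, you would typically drop `mp.spawn`, start one process per node with its own `rank`, and point `MASTER_ADDR`/`MASTER_PORT` at the rank-0 host. This sketch covers only the model-parallel part. Wiring Bagua into the data-parallel part of a hybrid setup is not shown here.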