Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Add BMUF Multi-GPU Training Method #13178

Open JusperLee opened 2 years ago

JusperLee commented 2 years ago

🚀 Feature

Motivation

Implement incremental block distributed data parallelism, similar to the BMUF method described in https://ieeexplore.ieee.org/document/7472805. This can mitigate the performance loss caused by multi-GPU training.
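For context, the core update in the cited paper works roughly as follows: each of N workers trains locally for a block of steps, the worker models are averaged into W̄(t), and the block-level update G(t) = W̄(t) − W(t−1) is filtered with a block momentum η and a block learning rate ζ:

Δ(t) = η · Δ(t−1) + ζ · G(t)
W(t) = W(t−1) + Δ(t)

With Nesterov-style block momentum, workers start the next block from W(t) + η · Δ(t) instead of W(t).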

Pitch

A reference implementation is available in Fairseq (the `FairseqBMUF` optimizer in `fairseq/optim/bmuf.py`).
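For illustration, here is a minimal sketch of what the block-synchronization step could look like in PyTorch, assuming `torch.distributed` is already initialized. The function name `bmuf_sync` and the buffer handling are illustrative, not Fairseq's actual API:

```python
import torch
import torch.distributed as dist

def bmuf_sync(model, global_params, momentum_buf,
              block_lr=1.0, block_momentum=0.875, nesterov=True):
    """One BMUF block-synchronization step (illustrative sketch).

    global_params: per-parameter tensors holding the global model W(t-1)
    momentum_buf:  per-parameter tensors holding the filtered update Delta(t-1)
    """
    world_size = dist.get_world_size()
    with torch.no_grad():
        for p, w_global, delta in zip(model.parameters(),
                                      global_params, momentum_buf):
            # Average the locally trained parameters across workers: W_bar(t)
            dist.all_reduce(p.data, op=dist.ReduceOp.SUM)
            p.data.div_(world_size)
            # Block-level "gradient": G(t) = W_bar(t) - W(t-1)
            g = p.data - w_global
            # Blockwise model-update filtering:
            # Delta(t) = eta * Delta(t-1) + zeta * G(t)
            delta.mul_(block_momentum).add_(g, alpha=block_lr)
            # New global model: W(t) = W(t-1) + Delta(t)
            w_global.add_(delta)
            # Model each worker resumes local training from.
            p.data.copy_(w_global)
            if nesterov:
                # Nesterov block momentum: look ahead by eta * Delta(t)
                p.data.add_(delta, alpha=block_momentum)
```

Each worker would call this only once per block of local optimizer steps; between calls, training proceeds with no gradient communication, which is where the bandwidth savings over step-wise all-reduce come from.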

Additional context


If you enjoy Lightning, check out our other projects! ⚡

cc @borda @awaelchli @rohitgr7 @akihironitta

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it hasn't had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions - PyTorch Lightning Team!