Lightning-AI / pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Remove custom `AllGatherGrad` implementation in favor of `torch.distributed`'s #10445

Closed · ananthsub closed this 1 year ago

ananthsub commented 3 years ago

Proposed refactoring or deprecation

Motivation

Lightning has a utility defined for all gather with gradients here: https://github.com/PyTorchLightning/pytorch-lightning/blob/d515bcac969c2a485ada673e302bfac51f142331/pytorch_lightning/utilities/distributed.py#L200-L222

However, this is already available in torch distributed: https://github.com/pytorch/pytorch/blob/6b44e75f6bccca7acc8ec31a635f1175c265ac54/torch/distributed/nn/functional.py#L82-L94

So there's no need to redefine this in Lightning.
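For reference, a minimal sketch of how the differentiable collective in `torch.distributed.nn.functional` can be used (the tensor shapes and the placeholder loss are illustrative only, not from the issue):

```python
import torch
from torch.distributed.nn.functional import all_gather  # differentiable all_gather

# Inside a training step on an already-initialized process group:
local_features = torch.randn(8, 128, requires_grad=True)

# Returns one tensor per rank. Unlike torch.distributed.all_gather, this call
# is an autograd Function, so gradients flow back to `local_features`.
gathered = all_gather(local_features)       # tuple of world_size tensors
all_features = torch.cat(gathered, dim=0)   # (world_size * 8, 128)

loss = all_features.pow(2).mean()           # placeholder loss
loss.backward()                             # local_features.grad is populated
```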

Pitch

Remove the custom `AllGatherGrad` implementation and call `torch.distributed`'s functional API instead.
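As a rough illustration of the pitch, the Lightning utility could delegate to the torch primitive along these lines (a sketch only; it assumes the stacked `(world_size, *tensor.shape)` output layout of `AllGatherGrad.apply`, which should be confirmed against the linked source):

```python
import torch
import torch.distributed as dist
from torch.distributed.nn.functional import all_gather

def all_gather_grad(tensor: torch.Tensor, group=None) -> torch.Tensor:
    """Differentiable all-gather that mirrors AllGatherGrad's stacked output."""
    group = group if group is not None else dist.group.WORLD
    gathered = all_gather(tensor, group=group)  # tuple of tensors, one per rank
    # Stack to (world_size, *tensor.shape), assuming that matches what
    # AllGatherGrad.apply currently returns.
    return torch.stack(gathered)
```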

Additional context


If you enjoy Lightning, check out our other projects! ⚡

cc @borda @awaelchli @rohitgr7 @akihironitta @justusschock @tchaton

puhuk commented 3 years ago

Let me take this :)

tchaton commented 2 years ago

Any updates on this issue?

carmocca commented 2 years ago

Blocked by https://github.com/pytorch/pytorch/issues/73515

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!