Proposed refactor
Flatten the Strategy inheritance: Part of #10416
Motivation
Reduce coupling between strategies, reduce unintentional overrides/inheritance, and avoid silent failures, e.g.:
DeepSpeed inherits configure_ddp() from DDP, which is unnecessary and error-prone.
DDP and FSDP have totally different distributed behavior, but with the current inheritance, adding a method to the DDP strategy automatically enables it for FSDP as well (see the sketch below).
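A rough illustration of the failure mode, with simplified class and method names rather than the actual Lightning classes:

```python
# Simplified sketch of the problem (not the real Lightning classes):
# any method added to the DDP strategy is silently inherited by FSDP.
class DDPStrategy:
    def configure_ddp(self):
        # DDP-specific model wrapping
        print("wrapping model with DistributedDataParallel")


class FSDPStrategy(DDPStrategy):
    # No override here, so FSDP exposes configure_ddp() even though
    # the DDP-specific wrapping is wrong for FSDP.
    pass


FSDPStrategy().configure_ddp()  # runs DDP-only logic on an FSDP strategy
```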
Pitch
Task list (a sketch of the resulting hierarchy follows the list):
Make FSDP inherit from ParallelStrategy instead of DDP
Make DeepSpeed inherit from ParallelStrategy instead of DDP
[RFC] Make DDPShard inherit from ParallelStrategy instead of DDP
[RFC] Make DDPShardSpawn inherit from ParallelStrategy instead of DDPSpawn
[RFC] Make TPUSpawn inherit from ParallelStrategy instead of DDPSpawn
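A minimal sketch of the flattened hierarchy the tasks above describe, with hypothetical, simplified names rather than the exact Lightning API:

```python
# Hypothetical sketch of the proposed flattening: each strategy inherits the
# shared ParallelStrategy base directly instead of going through DDP/DDPSpawn.
class ParallelStrategy:
    def setup_environment(self):
        print("common multi-device setup")


class DDPStrategy(ParallelStrategy):
    def configure_ddp(self):
        print("DDP-specific model wrapping")


class FSDPStrategy(ParallelStrategy):
    def wrap_model(self):
        print("FSDP-specific model wrapping")


# DDP-only hooks no longer leak into FSDP.
assert not hasattr(FSDPStrategy(), "configure_ddp")
```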
Additional context
The downside of this task will be some code duplication.
Align DDP/DDPSpawn process group creation #11643 and Collective refactor #9414 will reduce the duplication.
If you enjoy Lightning, check out our other projects! ⚡
Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.
cc @justusschock @awaelchli @akihironitta @rohitgr7