Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Move strategy-specific dataloader logic to the strategies #11756

Open · ananthsub opened 2 years ago

ananthsub commented 2 years ago

Proposed refactor

Motivation

Strategies today influence dataloading, especially in distributed training, so it makes sense for the strategy to handle this logic directly.

This would reduce and simplify interactions elsewhere in the Trainer, in particular the chain of strategy state -> trainer properties -> data connector logic.

It would also remove the hacky patching that the IPU strategy does right now 😧 https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/strategies/ipu.py#L125-L130

Pitch

The Strategy interface already offers a process_dataloader method: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/strategies/strategy.py#L369-L375
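
For reference, the base hook is currently just a pass-through (paraphrased from the linked lines, not verbatim):

```python
from torch.utils.data import DataLoader


class Strategy:
    def process_dataloader(self, dataloader: DataLoader) -> DataLoader:
        # The default implementation returns the dataloader unchanged; a strategy
        # can override this to wrap or replace it (as the IPU strategy does).
        return dataloader
```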

However, there's a ton of strategy-specific logic written in the trainer's data connector:

- warnings specific to DDP spawn: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/connectors/data_connector.py#L278-L312
- a generic multiprocessing warning: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/connectors/data_connector.py#L314-L322
- the is_distributed flag, which is primarily used here within the trainer: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/connectors/data_connector.py#L324-L330
- distributed_sampler_kwargs, which is strategy-specific: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/trainer.py#L2126-L2129
- an IPU-specific check: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/connectors/data_connector.py#L364

In my opinion, we could simplify this by moving the relevant logic into strategy.process_dataloader instead. Common logic can still be abstracted out into utility functions shared across the different strategy classes.
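
As a rough sketch of that direction for a DDP-style strategy (replace_sampler, world_size, global_rank and _update_dataloader below are illustrative stand-ins, not existing Lightning APIs):

```python
from torch.utils.data import DataLoader, DistributedSampler, RandomSampler


def _update_dataloader(dataloader: DataLoader, sampler) -> DataLoader:
    # Stand-in for a shared utility that rebuilds the DataLoader with a new
    # sampler while preserving its other arguments (batch size, workers, ...).
    return DataLoader(
        dataloader.dataset,
        batch_size=dataloader.batch_size,
        sampler=sampler,
        num_workers=dataloader.num_workers,
        collate_fn=dataloader.collate_fn,
        pin_memory=dataloader.pin_memory,
        drop_last=dataloader.drop_last,
    )


class DDPStrategySketch:
    world_size: int = 1
    global_rank: int = 0
    replace_sampler: bool = True

    def process_dataloader(self, dataloader: DataLoader) -> DataLoader:
        # Skip loaders the user has already configured for distributed use,
        # or when sampler replacement has been turned off.
        if not self.replace_sampler or isinstance(dataloader.sampler, DistributedSampler):
            return dataloader
        sampler = DistributedSampler(
            dataloader.dataset,
            num_replicas=self.world_size,
            rank=self.global_rank,
            shuffle=isinstance(dataloader.sampler, RandomSampler),
        )
        return _update_dataloader(dataloader, sampler)
```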

We could either extend the single generic method with an argument indicating which stage the dataloader is for:

```python
def process_dataloader(self, dataloader, use: str) -> Union[DataLoader, Iterable]:
    ...
```

or offer multiple APIs that map to the DataLoader hooks: https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/core/hooks.py#L406

```python
def process_train_dataloader(self, dataloader): ...
def process_val_dataloader(self, dataloader): ...
def process_test_dataloader(self, dataloader): ...
def process_predict_dataloader(self, dataloader): ...
```
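
One way the two shapes could coexist (a sketch, not something the issue settles on): let the stage-specific hooks delegate to the generic method by default, so a strategy only overrides whichever level of granularity it cares about.

```python
from typing import Iterable, Union

from torch.utils.data import DataLoader


class Strategy:
    # Single generic hook: `use` tells the strategy which stage the loader feeds.
    def process_dataloader(self, dataloader: DataLoader, use: str) -> Union[DataLoader, Iterable]:
        return dataloader

    # Stage-specific hooks fall back to the generic one.
    def process_train_dataloader(self, dataloader: DataLoader) -> Union[DataLoader, Iterable]:
        return self.process_dataloader(dataloader, use="train")

    def process_val_dataloader(self, dataloader: DataLoader) -> Union[DataLoader, Iterable]:
        return self.process_dataloader(dataloader, use="val")

    def process_test_dataloader(self, dataloader: DataLoader) -> Union[DataLoader, Iterable]:
        return self.process_dataloader(dataloader, use="test")

    def process_predict_dataloader(self, dataloader: DataLoader) -> Union[DataLoader, Iterable]:
        return self.process_dataloader(dataloader, use="predict")
```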

Over time, the Trainer flag replace_sampler_ddp would also make much more sense as an argument on the specific distributed strategy constructors than on the Trainer.
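
As a rough illustration of that direction (the constructor argument name below is hypothetical, not an agreed API):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DDPStrategy

# Today: a global Trainer flag, even though only distributed strategies use it.
trainer = Trainer(strategy="ddp", replace_sampler_ddp=False)

# Proposed direction: the knob lives on the strategy that actually needs it.
# ("replace_sampler" is a hypothetical constructor argument, not an existing API.)
trainer = Trainer(strategy=DDPStrategy(replace_sampler=False))
```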

Additional context



cc @justusschock @awaelchli @akihironitta @rohitgr7 @ninginthecloud @tchaton @borda

awaelchli commented 2 years ago

This is a great concept and I feel we should definitely explore it. However, I see some challenges in putting everything into this single method, because the logic is spread out and interleaved with other, non-strategy-related functions. To get to a proof of concept, one could attempt an intermediate refactor that first pulls the strategy-related code together and merges it into a single method inside the connector, to find out where the blockers are.

It would also be great if we could make it free of trainer references so that the hook can be used in Lite as well. In this sense, I would first explore the generic def process_dataloader(self, dataloader, use: str) -> Union[DataLoader, Iterable] rather than the prefixed versions.
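
A minimal sketch of why that helps: if the hook only needs the dataloader and the stage name, both the Trainer's data connector and Lite can call it in exactly the same way (prepare_dataloader below is an illustrative helper, not an existing function):

```python
from typing import Iterable, Union

from torch.utils.data import DataLoader


def prepare_dataloader(strategy, dataloader: DataLoader, use: str) -> Union[DataLoader, Iterable]:
    # Works identically whether it is called from the Trainer's data connector
    # or from Lite's dataloader setup, since no trainer state is required.
    return strategy.process_dataloader(dataloader, use=use)
```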