awaelchli opened this issue 2 years ago
@ananthsub feel free to edit this issue/pitch
@awaelchli Would this list every single CPU core?
I'm not sure what should be done for the CPU case, but since we cannot select which CPU core a program runs on, we probably shouldn't try to represent it in the list this way. For DDP on CPU, we could maybe just use [0] * num_devices.
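A tiny sketch of that idea (parse_cpu_devices and num_devices are illustrative names, not the actual Lightning API):

```python
# Sketch only (not the actual Lightning API): how parsed device indices
# might be represented for DDP on CPU, where individual cores cannot be
# selected, so every process maps to the same placeholder index 0.
def parse_cpu_devices(num_devices: int) -> list:
    return [0] * num_devices


print(parse_cpu_devices(4))  # [0, 0, 0, 0]
```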
Proposed refactoring or deprecation
Introduce a DeviceSelection dataclass that holds the selected devices in a standardized format. Idea by @ananthsub.
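A minimal sketch of what such a dataclass could look like (the field names here are assumptions for illustration, not a settled API):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DeviceSelection:
    """Hypothetical standardized container for the selected devices."""

    # which accelerator the indices refer to, e.g. "cpu", "gpu", "tpu", "ipu"
    accelerator: str = "cpu"
    # parsed zero-based device indices, e.g. [0, 1] for the first two GPUs
    indices: List[int] = field(default_factory=list)
    # total number of devices/processes requested
    num_devices: Optional[int] = None
```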
Motivation
We have a _parse_devices function used in the Trainer and Lite that returns a tuple of parsed device indices:
https://github.com/PyTorchLightning/pytorch-lightning/blob/9237106451f97393b17009a0ca571b6ff5ba5484/pytorch_lightning/trainer/trainer.py#L1459-L1470
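For context, this is roughly the kind of normalization such parsing performs (illustrative only; parse_gpu_flag is not the real implementation):

```python
# Illustrative only: turn a user-facing `gpus` flag into a list of indices,
# similar in spirit to what _parse_devices does for each accelerator flag.
def parse_gpu_flag(gpus) -> list:
    if gpus is None:
        return []
    if isinstance(gpus, int):   # e.g. gpus=2 -> the first two devices
        return list(range(gpus))
    if isinstance(gpus, str):   # e.g. gpus="0,2" -> [0, 2]
        return [int(i) for i in gpus.split(",") if i.strip()]
    return list(gpus)           # assume it is already a list of indices


print(parse_gpu_flag("0,2"))  # [0, 2]
```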
From @ananthsub in https://github.com/PyTorchLightning/pytorch-lightning/pull/10230#discussion_r738806860
Pitch
The AcceleratorConnector now gets the DeviceSelection instance as input instead of a growing list of arguments. It currently takes: devices, gpus, gpu_ids, tpu_cores, ipus, num_processes.
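A rough before/after sketch of the constructor signature (argument names taken from the list above; DeviceSelection is the hypothetical dataclass sketched earlier):

```python
# Current shape (simplified): device selection is spread across many arguments.
class AcceleratorConnector:
    def __init__(self, devices=None, gpus=None, gpu_ids=None,
                 tpu_cores=None, ipus=None, num_processes=None):
        ...


# Proposed shape (sketch): a single standardized object carries the selection.
class AcceleratorConnectorWithSelection:
    def __init__(self, device_selection: "DeviceSelection"):
        self.device_selection = device_selection
```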
Additional context
Alternative to #10231
cc @justusschock @awaelchli @akihironitta @rohitgr7 @tchaton @borda