Open MaximilienLC opened 4 months ago
@MaximilienLC Do you mean this?
trainer = Trainer(
accelerator="cpu",
devices=[1, 3],
)
TypeError: `devices` selected with `CPUAccelerator` should be an int > 0.
This is correct. It does not make sense to select device indices on a CPU. Device indices are meant for accelerators like CUDA GPUs or TPUs.
If you select devices=1
(int) on CPU, PyTorch will already use all cores and threads when appropriate. And devices > 1 just simulates DDP on CPU but it does not give you any benefits in terms of speed, it's only meant for debugging DDP.
If there is documentation that contradicts this, please point me to it so we can update it. Thanks!
I guess this section could add that info for CPU.
So you mean devices=1
on CPU equates to using all CPUs?
So you mean devices=1 on CPU equates to using all CPUs?
Yes PyTorch will use all CPUs if it can parallelize the operation accordingly.
I agree that this should not throw an exception assuming the current documentation is correct. For example, the device could come from an environment variable
trainer = Trainer(
accelerator=os.environ.get("DEVICE", "cpu"),
devices=-1, # use all devices
)
So when deploying the code to different machines, it would break for CPU. Maybe a warning would be more adequate to notify the user that something is not optimally configured.
Would devices="auto"
also use all devices?
Bug description
https://github.com/Lightning-AI/pytorch-lightning/blob/3740546899aedad77c80db6b57f194e68c455e28/src/lightning/fabric/accelerators/cpu.py#L75
cpu_cores
being a list of integers will always raise an exception, which shouldn't according to the Trainer documentation/this function signatureWhat version are you seeing the problem on?
master
How to reproduce the bug
No response
Error messages and logs
Environment
Current environment
``` #- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow): #- PyTorch Lightning Version (e.g., 1.5.0): #- Lightning App Version (e.g., 0.5.2): #- PyTorch Version (e.g., 2.0): #- Python version (e.g., 3.9): #- OS (e.g., Linux): #- CUDA/cuDNN version: #- GPU models and configuration: #- How you installed Lightning(`conda`, `pip`, source): #- Running environment of LightningApp (e.g. local, cloud): ```More info
No response