This line raises an error if not all visible devices are used for training, for example `devices = [0]` on a machine with 4 GPUs. Flexibility in choosing the number of devices would be helpful for debugging, especially since running `ddp` across multiple GPUs with breakpoints often behaves erratically.
https://github.com/Lightning-AI/litdata/blob/cc50222aeeaf725af67fa9a443ed6456edf091f9/src/litdata/utilities/env.py#L53
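For context, the behaviour I'm hitting is roughly equivalent to the sketch below. This is illustrative only: `check_devices` and its error message are made up for this issue, not litdata's actual code, but it shows the all-or-nothing check that blocks single-GPU debugging runs.

```python
# Illustrative sketch of an all-or-nothing device check (hypothetical
# names; not litdata's actual implementation).
def check_devices(num_visible_gpus: int, devices: list[int]) -> None:
    """Raise if the requested device list does not cover every visible GPU."""
    if len(devices) != num_visible_gpus:
        raise RuntimeError(
            f"Expected all {num_visible_gpus} GPUs to be used for training, "
            f"but got devices={devices}."
        )

# A debugging run on one GPU of a 4-GPU machine trips the check:
try:
    check_devices(4, [0])
except RuntimeError as err:
    print(err)
```

Relaxing this to a subset check (e.g. every requested device index is valid) would keep the safety net while allowing `devices = [0]` for debugging.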
Could we get this fixed? Thanks!
cc: @tchaton