Lightning-AI / litdata

Streamline data pipelines for AI. Process datasets across 1000s of machines, and optimize data for blazing fast model training.
Apache License 2.0

Set number of devices #152

Closed yhl48 closed 3 weeks ago

yhl48 commented 4 weeks ago

The line linked below raises an error when not all devices are used for training, for example devices = [0] on a machine with 4 GPUs.

Flexibility in choosing the number of devices would help with debugging, especially since using DDP with multiple GPUs and breakpoints often gives erratic behaviour.

https://github.com/Lightning-AI/litdata/blob/cc50222aeeaf725af67fa9a443ed6456edf091f9/src/litdata/utilities/env.py#L53
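For illustration, here is a minimal sketch of how a check like this can fail. It assumes the linked line is a divisibility test between the distributed world size and the number of visible GPUs. The function names `infer_num_nodes_strict` and `infer_num_nodes_relaxed` are hypothetical and are not litdata's API, and the relaxed variant is only one possible way to loosen the check, not necessarily the fix that was merged.

```python
def infer_num_nodes_strict(world_size: int, visible_gpus: int) -> int:
    """Hypothetical stand-in for the linked check (assumed behaviour, not litdata's code)."""
    # Assumption: the check requires world_size to be a multiple of the GPU count.
    if visible_gpus > 0 and world_size % visible_gpus != 0:
        raise RuntimeError("The world size should be divisible by the number of GPUs.")
    if visible_gpus > 0 and world_size // visible_gpus >= 1:
        return world_size // visible_gpus
    return 1


def infer_num_nodes_relaxed(world_size: int, visible_gpus: int) -> int:
    """Hypothetical relaxation: allow a single node to use only a subset of its GPUs."""
    # Fewer processes than GPUs (e.g. devices=[0] on a 4-GPU machine): treat as one node.
    if visible_gpus <= 0 or world_size <= visible_gpus:
        return 1
    # More processes than GPUs: still require an even split across nodes.
    if world_size % visible_gpus != 0:
        raise RuntimeError("The world size should be divisible by the number of GPUs.")
    return world_size // visible_gpus


if __name__ == "__main__":
    # devices=[0] on a 4-GPU machine: one process, four visible GPUs.
    try:
        infer_num_nodes_strict(world_size=1, visible_gpus=4)
    except RuntimeError as err:
        print(f"strict check fails: {err}")
    print("relaxed check:", infer_num_nodes_relaxed(world_size=1, visible_gpus=4))
```

Until such a check is relaxed, a possible workaround is to hide the unused GPUs from the process, for example by setting `CUDA_VISIBLE_DEVICES=0` before launching, so that the visible GPU count matches the number of devices in use.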

Could we get this fixed? Thanks!

cc: @tchaton

tchaton commented 4 weeks ago

Hey @yhl48, do you want to submit a fix?

yhl48 commented 4 weeks ago

@tchaton Happy to do that :) Will send in a PR soon.

tchaton commented 4 weeks ago

@yhl48, you rock! Thanks.