Lightning-AI / litdata

Transform datasets at scale. Optimize datasets for fast AI model training.
Apache License 2.0

TPU support #79

Open miguelalba96 opened 3 months ago

miguelalba96 commented 3 months ago

🚀 Feature

TPU support

Motivation

Does litdata support TPU environments, specifically when using Lightning Fabric?

Additional context

I have more than 16M image-text pairs that I am writing in the mosaic-ml streaming format to train contrastive models. I am using Lightning Fabric to train with DDP on GCP, and I want to move to TPU training. The mosaic-ml streaming dataset doesn't support TPUs (afaik), which brings me to these questions:

github-actions[bot] commented 3 months ago

Hi! Thanks for your contribution! Great first issue!

tchaton commented 3 months ago

Hey @miguelalba96,

I haven't tried with TPU. Maybe @carmocca would know more.

carmocca commented 3 months ago

litdata is meant to be used with a regular DataLoader, so there's nothing specific to do on a TPU machine. If you use Fabric or PyTorch Lightning, it will take care of enabling the DistributedSampler and running any required XLA steps, but these are common to all TPU runs, not just those using litdata.

dasoto commented 2 months ago

I would recommend setting these env variables: DATA_OPTIMIZER_GLOBAL_RANK, DATA_OPTIMIZER_NUM_WORKERS, DATA_OPTIMIZER_NUM_NODES

Otherwise the StreamingDataLoader will not be aware of the distribution.
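A minimal sketch of how those variables could be set from Python before the dataset and dataloader are created. The three variable names come from the comment above. The helper name export_distribution_env, its parameters, and the reading of each variable (global rank of this process, total worker count, node count) are assumptions inferred from the names, not confirmed litdata behavior on TPU.

```python
import os


def export_distribution_env(global_rank: int, num_workers: int, num_nodes: int) -> dict:
    """Hypothetical helper: export the three variables named in this thread.

    The meaning assigned to each variable is assumed from its name.
    """
    if global_rank < 0:
        raise ValueError("global_rank must be >= 0")
    if num_workers < 1:
        raise ValueError("num_workers must be >= 1")
    if num_nodes < 1:
        raise ValueError("num_nodes must be >= 1")

    # Environment variables are always strings, so convert explicitly.
    values = {
        "DATA_OPTIMIZER_GLOBAL_RANK": str(global_rank),
        "DATA_OPTIMIZER_NUM_WORKERS": str(num_workers),
        "DATA_OPTIMIZER_NUM_NODES": str(num_nodes),
    }
    os.environ.update(values)
    return values
```

Each process would call this once with its own values, for example export_distribution_env(global_rank=3, num_workers=8, num_nodes=2), before constructing the dataloader. The numbers here are illustrative only; the right values depend on how your TPU job is launched.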

tchaton commented 2 months ago

Yes, as @dasoto mentioned, I didn't add wiring for TPU env detection. Feel free to contribute support for it if you try litdata on TPUs.