Lightning-AI / litdata

Streamline data pipelines for AI. Process datasets across 1000s of machines, and optimize data for blazing fast model training.
Apache License 2.0
249 stars 24 forks

When providing a local path to the optimize method, make it work in a distributed setting for Jobs #193

Open tchaton opened 3 days ago

tchaton commented 3 days ago

🚀 Feature

Motivation

Right now, it is possible to do this in a Lightning Studio:

optimize(
    output_dir="./optimized_data"
)

However, when running this code in a multi-machine job, it won't work properly.

Instead, we should convert the output_dir to an S3 path: the node 0 artifacts path joined with the user-provided output_dir.
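A rough sketch of what that conversion could look like. This is not litdata's actual code: `resolve_output_dir` and its `node0_artifacts_uri` argument are hypothetical names used for illustration, and how the node 0 artifacts path is discovered inside a job is left open here.

```python
import posixpath
from typing import Optional


def resolve_output_dir(output_dir: str, node0_artifacts_uri: Optional[str]) -> str:
    """Map a local output_dir onto the node 0 artifacts path (hypothetical helper).

    node0_artifacts_uri is assumed to be an s3:// prefix for node 0's artifacts,
    or None when the code is not running in a multi-machine job.
    """
    # Already-remote paths, or runs outside a job, are left untouched.
    if "://" in output_dir or node0_artifacts_uri is None:
        return output_dir

    # "./optimized_data" -> "optimized_data"; absolute paths lose their leading "/".
    relative = posixpath.normpath(output_dir).lstrip("/")
    if relative.startswith(".."):
        raise ValueError(f"output_dir escapes the working directory: {output_dir!r}")

    base = node0_artifacts_uri.rstrip("/")
    # normpath(".") is ".", which means "the artifacts root itself".
    if relative in ("", "."):
        return base
    return f"{base}/{relative}"


# Example with a made-up artifacts prefix:
resolved = resolve_output_dir(
    "./optimized_data", "s3://my-bucket/jobs/my-job/node-0/artifacts"
)
```

With something like this, every node would resolve the same user-provided `output_dir` to the same S3 location, while the Studio case (no artifacts path) keeps the current local behaviour.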

Pitch

Alternatives

Additional context