Lightning-AI / litdata

Streamline data pipelines for AI. Process datasets across 1000s of machines, and optimize data for blazing fast model training.
Apache License 2.0
249 stars 24 forks source link

Add support for parquet files for storing the chunks #191

Open tchaton opened 3 days ago

tchaton commented 3 days ago

This would enable users to avoid converting their dataset if they already have their dataset as parquet folders. We would need to run an indexation but this isn't too painful.