Lightning-AI / litdata

Transform datasets at scale. Optimize datasets for fast AI model training.
Apache License 2.0
375 stars, 43 forks

Add Support for Custom S3 Configuration in s5cmd #392

Closed: csy1204 closed this issue 1 month ago

csy1204 commented 1 month ago

🚀 Feature

Currently, custom S3 configurations are only supported in the S3 client. I would like to request that s5cmd also support custom S3 configurations to provide consistency across tools and enhance its flexibility for various use cases.

Motivation

https://github.com/Lightning-AI/litdata/blob/b9aa903bd9c98cd96ee989394fdaa1a38f8036f0/src/litdata/streaming/downloader.py#L52-L56

Pitch

Alternatives

Additional context

236

bhimrazy commented 1 month ago

Hi @csy1204, thank you for submitting this feature request! Currently, it seems that s5cmd primarily supports manual configuration through environment variables.

I'm considering whether it would be better to let users handle the configuration of these variables themselves or to set up the environment just before executing the s5cmd command. Do you have any suggestions or insights on this?

cc: @tchaton @deependujha

tchaton commented 1 month ago

Yes, I think we should pipe the env variables into the s5cmd command execution, as you said @bhimrazy.
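
A rough sketch of what piping the env variables could look like, for discussion only. It assumes the custom configuration arrives as a dict with boto3-style keys, and that s5cmd picks up the standard AWS credential variables plus `S3_ENDPOINT_URL`. The names `build_s5cmd_env`, `download_with_s5cmd`, and `_OPTION_TO_ENV` are hypothetical and are not part of litdata.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional

# Assumed mapping from boto3-style option names to the environment
# variables s5cmd is expected to read. Extend it if more options are needed.
_OPTION_TO_ENV = {
    "aws_access_key_id": "AWS_ACCESS_KEY_ID",
    "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
    "aws_session_token": "AWS_SESSION_TOKEN",
    "endpoint_url": "S3_ENDPOINT_URL",
}


def build_s5cmd_env(
    storage_options: Mapping[str, object],
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with the custom S3 settings applied."""
    # Copy so the parent process environment is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    for option, variable in _OPTION_TO_ENV.items():
        value = storage_options.get(option)
        if value is not None:
            env[variable] = str(value)
    return env


def download_with_s5cmd(
    remote_path: str,
    local_path: str,
    storage_options: Mapping[str, object],
) -> None:
    """Run `s5cmd cp` with the custom settings visible only to the child process."""
    subprocess.run(
        ["s5cmd", "cp", remote_path, local_path],
        env=build_s5cmd_env(storage_options),
        check=True,  # raise CalledProcessError if s5cmd exits non-zero
    )
```

Passing `env=` to the subprocess keeps the overrides scoped to the s5cmd call, so users who already export these variables themselves are unaffected when no custom options are given.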