NVIDIA / aistore

AIStore: scalable storage for AI applications
https://aistore.nvidia.com
MIT License

WebDataset may not be needed starting PT 1.12 #106

Closed: elgalu closed this issue 2 years ago

elgalu commented 2 years ago

You may want to update this section: PyTorch >= 1.12 can load data directly from an S3-compatible store using the AWS SDK for C++ (https://github.com/aws/aws-sdk-cpp).

https://github.com/NVIDIA/aistore/blob/cb8798307c906d730b23e3437a63e65a9b5da570/README.md?plain=1#L60-L62
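The following is a minimal sketch of S3 loading in the PyTorch ecosystem. It assumes that loading goes through TorchData's `S3FileLister` and `S3FileLoader` DataPipes, and that your `torchdata` build includes the aws-sdk-cpp extension. The function name `iter_s3_objects` and the bucket/folder names are hypothetical. The import is deferred, so nothing touches torchdata or S3 until you iterate.

```python
from typing import Iterator, List, Tuple


def iter_s3_objects(prefixes: List[str]) -> Iterator[Tuple[str, bytes]]:
    """Hypothetical helper: yield (object_url, payload) for every object under the given s3:// prefixes."""
    # Deferred import: requires torchdata built with AWS SDK (aws-sdk-cpp) support.
    from torchdata.datapipes.iter import IterableWrapper, S3FileLister, S3FileLoader

    # S3FileLister expands each s3:// prefix into the URLs of the objects stored under it.
    urls = S3FileLister(IterableWrapper(prefixes))
    # S3FileLoader downloads each object and yields (url, stream) pairs.
    for url, stream in S3FileLoader(urls):
        yield url, stream.read()
```

For example, `for url, data in iter_s3_objects(["s3://my-bucket/train/"]): ...` would stream every object under `train/`. Here `my-bucket` is a placeholder.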

VirrageS commented 2 years ago

We recently contributed our own custom loaders to PyTorch's DataPipes package (TorchData); see AISFileLoader (https://pytorch.org/data/main/generated/torchdata.datapipes.iter.AISFileLoader.html#torchdata.datapipes.iter.AISFileLoader) and AISFileLister (https://pytorch.org/data/main/generated/torchdata.datapipes.iter.AISFileLister.html#torchdata.datapipes.iter.AISFileLister).

@gaikwadabhishek @alex-aizman I think we should update the README.md section to mention that.

gaikwadabhishek commented 2 years ago

Hey @elgalu, these are different things. We are still developing the connectors (plugins) between PyTorch and AIStore that @VirrageS mentioned above, and WebDataset has its own advantages.

If you want to load data from remote cloud backends, you can try aisio.py. It works very similarly to s3io.py. The advantages of aisio over s3io:

Example of an AIStore iterable DataPipe: https://aiatscale.org/blog/2022/07/12/aisio-pytorch
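The AIStore DataPipes mentioned in this thread can be chained in the same way as the S3 ones. Below is a minimal sketch that pipes `AISFileLister` into `AISFileLoader` from `torchdata.datapipes.iter`, passing the cluster endpoint through their `url` parameter. Several details are assumptions: the endpoint `http://localhost:8080`, the `provider://bucket/folder/` prefix convention, and the helper names `ais_prefixes` and `iter_ais_objects` are all hypothetical. Running it also requires `torchdata`, the `aistore` Python client, and a reachable cluster.

```python
from typing import Iterator, List, Tuple

# Assumed AIStore proxy endpoint; replace with your cluster's URL.
AIS_ENDPOINT = "http://localhost:8080"


def ais_prefixes(bucket: str, folders: List[str], provider: str = "ais") -> List[str]:
    """Hypothetical helper: build provider://bucket/folder/ prefixes (e.g. ais://, gcp://)."""
    prefixes = []
    for folder in folders:
        folder = folder.strip("/")
        prefixes.append(f"{provider}://{bucket}/{folder}/" if folder else f"{provider}://{bucket}/")
    return prefixes


def iter_ais_objects(prefixes: List[str], url: str = AIS_ENDPOINT) -> Iterator[Tuple[str, bytes]]:
    """Hypothetical helper: yield (object_url, payload) for every object under the prefixes."""
    # Deferred import: requires torchdata plus the aistore client package.
    from torchdata.datapipes.iter import AISFileLister, AISFileLoader, IterableWrapper

    # AISFileLister asks the AIStore cluster at `url` to list objects under each prefix.
    listed = AISFileLister(source_datapipe=IterableWrapper(prefixes), url=url)
    # AISFileLoader fetches each listed object and yields (url, stream) pairs.
    for obj_url, stream in AISFileLoader(source_datapipe=listed, url=url):
        yield obj_url, stream.read()
```

Because the pipeline is a plain DataPipe, you could add steps such as `.shuffle()` or `.sharding_filter()` before loading, as with any other TorchData source.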