NVIDIA / aistore

AIStore: scalable storage for AI applications
https://aistore.nvidia.com
MIT License

Query on aistore.pytorch.Dataset #95

Closed: KavyaPuranik closed this issue 2 years ago

KavyaPuranik commented 2 years ago

Hi,

I'm following this doc to train Imagenet Dataset: https://github.com/NVIDIA/aistore/blob/cc6e029721ef159f3df516ec9f8e3065ef6ac54d/docs/_posts/2021-10-22-ais-etl-2.md

I've a query specifically related to this part:

    train_loader = torch.utils.data.DataLoader(
        aistore.pytorch.Dataset(
            "http://aistore-sample-proxy:51080",  # AIS IP address or hostname
            Bck("imagenet"),
            prefix="train/",
            transform_id="my-first-etl",
            transform_filter=lambda object_name: object_name.endswith('.jpg'),
        ),
        batch_size=args.batch_size,
        shuffle=True,
        num_workers=args.workers,
        pin_memory=True)

I see a type error when I try to use it as is in the training code:

    pydantic.main.BaseModel.__init__
    TypeError: __init__() takes exactly 1 positional argument (2 given)

Do you have any insights on how to mitigate this error?

Also, I found that the implementation of aistore.pytorch.Dataset is present in one of the development branches (post-3).
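
One possible reading of the TypeError quoted above: pydantic models accept their field values as keyword arguments only, so a positional call such as Bck("imagenet") would fail this way if Bck is a pydantic model. That is an assumption; the thread does not confirm it. The sketch below reproduces the same class of error with a plain-Python stand-in, so it runs without pydantic or aistore. Both classes and the `name` field are hypothetical and are not the real aistore API.

```python
class KeywordOnlyModel:
    """Hypothetical stand-in that mimics a keyword-only model __init__."""

    def __init__(self, **data):
        # Field values are accepted as keyword arguments only.
        for key, value in data.items():
            setattr(self, key, value)


class Bck(KeywordOnlyModel):
    """Hypothetical bucket descriptor; NOT the real aistore Bck class."""


def try_positional():
    """Return the error message raised by a positional call, or None."""
    try:
        Bck("imagenet")  # positional argument, as in the snippet from the doc
    except TypeError as err:
        return str(err)
    return None


positional_error = try_positional()
bucket = Bck(name="imagenet")  # keyword argument is accepted

print(positional_error)
print(bucket.name)
```

If that reading is right, passing the bucket name by keyword would avoid the error, provided the field is actually called `name` in the installed client version.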

gaikwadabhishek commented 2 years ago

Hi Kavya, I see you were trying to use the AIS implementation of Dataset. The team is working on this right now, and we have created AISFileLister and AISFileLoader in aistore.pytorch. PyTorch now encourages using DataPipes instead of Datasets. But if you need a good implementation of AISDataset, you could explain your example or use case to us, and we can create something you can use.

gaikwadabhishek commented 2 years ago

Check this: https://github.com/NVIDIA/aistore/blob/master/sdk/python/aistore/pytorch/aisio.py

KavyaPuranik commented 2 years ago

Hi,

I was able to resolve the issue by making use of webdataset. I'll try out the fileloader approach as well.

I have a couple of other queries related to AIStore ETL. I have a use case in which I'm trying to offload the dataset preprocessing to ETL. Will you be able to provide support for ETL in the Python client?

gaikwadabhishek commented 2 years ago

Hi @KavyaPuranik, yes, we are working on adding support for ETL in the Python SDK.

gaikwadabhishek commented 2 years ago

Hi @KavyaPuranik https://pypi.org/project/aistore/1.0.3

The latest Python SDK release contains APIs related to ETL.

Documentation and Usage:
https://aiatscale.org/docs/python-api#etl
https://github.com/NVIDIA/aistore/blob/master/sdk/python/README.md#etls
https://github.com/NVIDIA/aistore/blob/master/sdk/python/sdk-etl-tutorial.ipynb