NVIDIA / aistore

AIStore: scalable storage for AI applications
https://aistore.nvidia.com
MIT License

Example Usage with WebDataset #102

Closed: mike-gee closed this issue 2 years ago

mike-gee commented 2 years ago

Is there an example of using AIStore with WebDataset? I can't seem to find any examples with the new API.

Streaming works using this:

https://github.com/NVIDIA/aistore/tree/master/sdk/python/aistore/pytorch#readme

But it would be great to be able to leverage sharding & WebDataset.

mike-gee commented 2 years ago

It seems it is possible to use the REST API. Is this the best practice? https://aiatscale.org/docs/http-api

I also noticed that forward slashes in an object's name are stripped when the object is requested (note "parentshard-0.tar" in the url_path of the error below).

To reproduce:

ais advanced gen-shards 'ais://demo/parent/shard-{0..9}.tar' --fsize 100 --fcount 100
curl -L -X GET 'http://localhost:8080/ais/demo/parent/shard-0.tar'

{"status":404,"message":"bucket \"ais://demo\" does not exist","method":"GET","url_path":"/v1/objects/demo/parentshard-0.tar","remote_addr":"127.0.0.1:xxxx","caller":"","node":"p[xxxx]"}

tmbdev commented 2 years ago

To access AIStore, just use http: or ais: URLs.

import webdataset as wds
ds = wds.WebDataset("ais://bucket/data.tar")

In older versions (or if you need special options), you can always use a pipe: URL:

pipe:ais get ais://etc -
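
As a rough illustration of the two URL forms above, here is a minimal sketch that builds a list of shard URLs and wraps each one in the pipe: form. The helper names (ais_pipe_url, shard_urls, make_dataset) and the shard naming scheme are assumptions made for this example; they are not part of WebDataset or AIStore. The sketch also assumes that wds.WebDataset accepts a list of URLs and that the ais CLI is on the PATH.

```python
def ais_pipe_url(ais_url):
    """Wrap an ais:// URL in the pipe: form, i.e. 'pipe:ais get <url> -'."""
    return f"pipe:ais get {ais_url} -"


def shard_urls(bucket, prefix, count):
    """List ais:// URLs for shards named <prefix>-<i>.tar (hypothetical naming)."""
    return [f"ais://{bucket}/{prefix}-{i}.tar" for i in range(count)]


def make_dataset(urls):
    """Build a WebDataset over the given URLs (assumes webdataset is installed)."""
    # Imported lazily so the helpers above work without webdataset installed.
    import webdataset as wds

    return wds.WebDataset(urls)
```

For the shards generated earlier in this thread, something like make_dataset([ais_pipe_url(u) for u in shard_urls("demo", "parent/shard", 10)]) would be one way to try the pipe: route; passing the plain ais:// URLs instead exercises the direct route.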

In torchdata, you can use the AIStore classes or the upcoming PipeOpener class. Until PipeOpener is merged, you can use your own .map(ais_opener) function and add it to the pipeline.
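
One possible shape for such an ais_opener is sketched below. It is a hypothetical helper, not something shipped with torchdata or the AIStore SDK: it runs the same "ais get <url> -" command used in the pipe: URL above and returns the URL together with the process's stdout. The (url, stream) return shape and the command parameter are assumptions chosen for illustration.

```python
import subprocess


def ais_opener(url, command=("ais", "get")):
    """Return (url, stream), where stream is the stdout of `ais get <url> -`.

    Hypothetical helper: `command` is a parameter only so that the CLI
    invocation can be swapped out (for example, in tests).
    """
    # The trailing "-" asks the CLI to write the object to stdout.
    proc = subprocess.Popen([*command, url, "-"], stdout=subprocess.PIPE)
    return url, proc.stdout
```

Applied with .map(ais_opener) to a pipeline of ais:// URLs, this would hand each downstream step a readable byte stream for the shard. Note that the stream is not seekable, so any tar-reading step after it needs to work in streaming mode.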