mike-gee closed this 2 years ago
It seems it is possible to leverage the REST API. Is this best practice? https://aiatscale.org/docs/http-api
I also notice that forward slashes in an object name are stripped when the object is requested (note the url_path in the response below).
To reproduce:
ais advanced gen-shards 'ais://demo/parent/shard-{0..9}.tar' --fsize 100 --fcount 100
curl -L -X GET 'http://localhost:8080/ais/demo/parent/shard-0.tar'
{"status":404,"message":"bucket \"ais://demo\" does not exist","method":"GET","url_path":"/v1/objects/demo/parentshard-0.tar","remote_addr":"127.0.0.1:xxxx","caller":"","node":"p[xxxx]"}
To access AIStore, just use http: or ais: URLs.
import webdataset as wds
ds = wds.WebDataset("ais://bucket/data.tar")
In older versions (or if you need special options), you can always use a pipe: URL:
pipe:ais get ais://etc -
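For the sharded bucket from the reproduction above, the pipe: form can be generated per shard. This is only a rough sketch: ais_pipe_url and open_dataset are hypothetical helper names, the bucket and object names are taken from the gen-shards command in this thread, and it assumes the ais CLI is on PATH and configured for the cluster.

```python
from typing import List


def ais_pipe_url(bucket: str, obj: str) -> str:
    """Build a pipe: URL that streams one object to stdout via the ais CLI.

    Hypothetical helper; the command mirrors `pipe:ais get ais://etc -` above.
    """
    return f"pipe:ais get ais://{bucket}/{obj} -"


# Shard names follow the gen-shards template 'ais://demo/parent/shard-{0..9}.tar'.
urls: List[str] = [ais_pipe_url("demo", f"parent/shard-{i}.tar") for i in range(10)]


def open_dataset(shard_urls: List[str]):
    """Wrap the URLs in a WebDataset (needs `pip install webdataset`)."""
    import webdataset as wds  # imported lazily so the helper above works without it

    # shardshuffle=False keeps the shard order fixed; change it for training.
    return wds.WebDataset(shard_urls, shardshuffle=False)
```

Calling open_dataset(urls) should then give a dataset that reads every shard through the CLI, so nothing in the URL goes through the REST endpoint.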
In torchdata, you can use the AIStore classes or the upcoming PipeOpener class. Until PipeOpener is merged, you can use your own .map(ais_opener) function and add it to the pipeline.
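A possible shape for such an ais_opener, as a sketch rather than the torchdata or AIStore API: it assumes the ais CLI is on PATH and uses only the standard library. The command parameter and the iter_tar_members helper are illustrative additions, not names from either project.

```python
import subprocess
import tarfile
from typing import IO, Iterator, Sequence, Tuple


def ais_opener(
    url: str, command: Sequence[str] = ("ais", "get")
) -> Tuple[str, IO[bytes]]:
    """Start `<command> <url> -` and return (url, the child's stdout stream).

    The child process is not waited on here; a long-running job would want
    to keep the Popen object and reap it once the stream is exhausted.
    """
    proc = subprocess.Popen([*command, url, "-"], stdout=subprocess.PIPE)
    return url, proc.stdout


def iter_tar_members(stream: IO[bytes]) -> Iterator[Tuple[str, bytes]]:
    """Yield (member name, content) from a non-seekable tar stream."""
    # "r|*" reads the archive sequentially, which is what a pipe requires.
    with tarfile.open(fileobj=stream, mode="r|*") as archive:
        for member in archive:
            if member.isfile():
                yield member.name, archive.extractfile(member).read()
```

Each (url, stream) pair returned by ais_opener can be handed to whatever tar-decoding step the pipeline uses; iter_tar_members shows the same decoding done by hand.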
Is there an example of using AIStore with WebDataset? I can't find any that use the new API.
Streaming works using this:
https://github.com/NVIDIA/aistore/tree/master/sdk/python/aistore/pytorch#readme
But it would be great to be able to leverage sharding & WebDataset.