Closed by HHenryD 1 year ago
Assuming you will use AISFileLister and AISFileLoader.
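For reference, here is a minimal sketch of how these two DataPipes are usually chained in torchdata. The endpoint `http://localhost:8080` and the prefix `gcp://my-bucket/train/` are hypothetical placeholders, not values from this thread. The sketch assumes the `torchdata` and `aistore` packages are installed and that an AIS cluster is reachable at that endpoint.

```python
def build_ais_pipeline(ais_url, prefixes):
    """Return a DataPipe yielding (object_url, stream) pairs read through AIS.

    ais_url and prefixes are supplied by the caller; the values used in the
    usage note below are hypothetical placeholders.
    """
    # Imported lazily so this module can be loaded without torchdata installed.
    from torchdata.datapipes.iter import (
        AISFileLister,
        AISFileLoader,
        IterableWrapper,
    )

    # Wrap the bucket prefixes, e.g. "gcp://my-bucket/train/".
    prefix_dp = IterableWrapper(list(prefixes))
    # List the object URLs found under each prefix via the AIS endpoint.
    url_dp = AISFileLister(url=ais_url, source_datapipe=prefix_dp)
    # Open each listed object through the same AIS endpoint.
    return AISFileLoader(url=ais_url, source_datapipe=url_dp)
```

Usage would look something like `for url, stream in build_ais_pipeline("http://localhost:8080", ["gcp://my-bucket/train/"]): ...`, with all reads going through AIS rather than directly to the Cloud bucket.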
With AIS in front of your Cloud bucket(s), you save both cost and time as AIS issues cold GET requests only once on a per-object basis. Multiple alternative data pre-loading mechanisms are also supported. Users can store data according to the per-bucket configurable policies (erasure coding, LRU, etc.).
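To make the "cold GET only once per object" point concrete, here is a rough back-of-the-envelope sketch. It compares Cloud GET requests and egress volume over several epochs, with and without AIS caching. The function name, parameters, and numbers are illustrative assumptions. It also assumes every object is read once per epoch and that nothing is evicted from AIS between epochs (an LRU policy on a small cluster could break that).

```python
from dataclasses import dataclass


@dataclass
class CloudTraffic:
    cloud_gets: int    # GET requests that actually reach the Cloud bucket
    egress_gb: float   # data pulled out of the Cloud bucket, in GB


def estimate_cloud_traffic(num_objects, avg_object_gb, epochs, cached):
    """Estimate Cloud-side traffic for a training run (hypothetical helper).

    cached=True models AIS in front of the bucket: each object is fetched
    from the Cloud once (the cold GET) and served by AIS afterwards.
    cached=False models reading straight from the bucket every epoch.
    """
    cloud_reads_per_object = 1 if cached else epochs
    gets = num_objects * cloud_reads_per_object
    return CloudTraffic(cloud_gets=gets, egress_gb=gets * avg_object_gb)


# Illustrative numbers only: 1000 shards of 0.5 GB each, 10 epochs.
direct = estimate_cloud_traffic(1000, 0.5, epochs=10, cached=False)
via_ais = estimate_cloud_traffic(1000, 0.5, epochs=10, cached=True)
print(direct)
print(via_ais)
```

Under these assumptions the Cloud-side request count and egress stay fixed at the size of the dataset instead of growing with the number of epochs. Multiply by your provider's actual per-request and per-GB prices to get a cost figure.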
I am interested in using PT+AIS to iterate over Google Cloud buckets using the Iterable DataPipes API in PT 1.12, and was wondering if there were retrieval costs in using AIS. Aside from paying for the storage, how cost-friendly would it be to use AIS+Google Cloud (or other services like Webdataset) for training over long durations?