The Dataflux Accelerated Dataloader for PyTorch with GCS aims to improve ML training efficiency when the training dataset is stored in GCS. Training with the Dataflux Accelerated Dataloader is up to 3X faster when the dataset consists of many small files (e.g., 100-500 KB).
While running these benchmarks for fsspec, Matt observed that the measured per-step time was much higher than the documented step time. I re-ran one of the benchmarks and confirmed this. For now, I'm updating the step time documented here. As a follow-up, I will determine which configured step time yields a measured per-step time of 1s, re-run the benchmarks, and update the numbers.
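To illustrate why a configured step time and a measured per-step time can diverge, here is a minimal sketch. It assumes the benchmark simulates each training step with a fixed sleep, which may not match the actual benchmark code. `measure_per_step_time` and `slow_batches` are hypothetical helpers written for this illustration; they are not part of Dataflux or fsspec.

```python
import time


def slow_batches(num_batches, load_time_s):
    """Hypothetical data source that takes load_time_s to produce each batch."""
    for i in range(num_batches):
        time.sleep(load_time_s)  # stand-in for fetching a batch from storage
        yield i


def measure_per_step_time(batches, configured_step_time_s):
    """Return mean wall-clock seconds per step, including time spent waiting for data."""
    step_durations = []
    step_start = time.perf_counter()
    for _batch in batches:
        # Assumption: the training step is simulated by sleeping for the configured time.
        time.sleep(configured_step_time_s)
        now = time.perf_counter()
        step_durations.append(now - step_start)
        step_start = now
    if not step_durations:
        raise ValueError("no batches were produced")
    return sum(step_durations) / len(step_durations)


if __name__ == "__main__":
    configured = 0.05
    observed = measure_per_step_time(slow_batches(5, 0.02), configured)
    print(f"configured={configured:.3f}s observed={observed:.3f}s")
```

In this sketch the observed per-step time is roughly the configured step time plus the per-batch loading time, so a target such as 1s per step has to be reached by adjusting the configured value and measuring again.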