The Dataflux Accelerated Dataloader for PyTorch with GCS aims to improve ML training efficiency when training datasets are stored in GCS. Training with the Dataflux Accelerated Dataloader is up to 3X faster when the dataset consists of many small files (e.g., 100-500 KB).
This PR adds basic unit tests and a small-scale presubmit integration test for multipart upload. The demo was updated to use command-line arguments, and the README was updated to match the new execution methods.
[x] Tests pass
[x] Appropriate changes to documentation are included in the PR