Currently the high-level features of this repository focus on on-the-fly dataloading, which I think is absolutely necessary for the largest datasets (multi-terabyte sizes). In most cases, however, datasets will be much smaller, and it would be easier and faster to clone the entire dataset to local storage before running a pipeline.
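As a rough sketch of the idea, a helper like the one below could perform a one-time bulk copy of the dataset to local storage and return the local path, which the existing on-the-fly dataloader could then read from. The function name, directory layout, and use of `shutil.copytree` are all assumptions for illustration, not part of this repository's API:

```python
import shutil
from pathlib import Path


def ensure_local_copy(remote_dir: str, local_dir: str) -> Path:
    """Clone a whole dataset to local storage once, then reuse it.

    Hypothetical helper: if the local copy already exists, skip the
    transfer and just return the local path.
    """
    src, dst = Path(remote_dir), Path(local_dir)
    if not dst.exists():
        # One bulk copy up front instead of per-sample remote reads.
        shutil.copytree(src, dst)
    return dst
```

A pipeline could then do `data_root = ensure_local_copy("/mnt/remote/dataset", "/tmp/dataset")` at startup and point its dataloader at `data_root`; repeated runs on the same machine would pay the copy cost only once.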