This downloader loads each batch into memory before writing the files to disk, since our initial storage plan was to use HDF5.
We have since decided to convert the dataset to either HDF5 or webdataset format only after the download completes, which avoids data backup complications and difficulties with parallel data writes, so this setup is open to change.
The open question is whether it is worth switching to writing image data to disk as it arrives, which would reduce the memory bottleneck and use the available cores more efficiently.
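
To make the trade-off concrete, here is a minimal sketch of the streaming alternative, not the downloader's current code. It assumes a hypothetical `fetch(name)` callable that yields the bytes of one image in chunks; `write_stream`, `download_all`, and the `.part` suffix are likewise illustrative names. Each worker thread writes chunks to disk as they arrive, so peak memory is bounded by roughly one chunk per worker rather than by a whole batch.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable, Iterator


def write_stream(chunks: Iterable[bytes], dest: Path) -> int:
    """Write chunks to dest as they arrive and return the byte count."""
    # Write to a temporary ".part" file first (an assumed convention),
    # so an interrupted download never looks like a finished image.
    tmp = dest.with_name(dest.name + ".part")
    written = 0
    with open(tmp, "wb") as fh:
        for chunk in chunks:
            fh.write(chunk)
            written += len(chunk)
    os.replace(tmp, dest)  # rename into place once the file is complete
    return written


def download_all(
    names: Iterable[str],
    fetch: Callable[[str], Iterator[bytes]],  # hypothetical chunked fetcher
    out_dir: Path,
    workers: int = 8,
) -> Dict[str, int]:
    """Stream every named image to out_dir using a pool of threads."""
    out_dir.mkdir(parents=True, exist_ok=True)

    def one(name: str):
        return name, write_stream(fetch(name), out_dir / name)

    # Threads suit this workload because it is dominated by network
    # and disk I/O rather than by CPU-bound work.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(one, names))
```

With this shape, the later conversion to HDF5 or webdataset would read the finished files from `out_dir`, so it stays decoupled from the download step. Whether the change is worthwhile still depends on measuring how much memory the batch approach actually costs.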