Imageomics / distributed-downloader

MPI-based distributed downloading tool for retrieving data from diverse domains.
MIT License

Memory Allocation? #1

Open egrace479 opened 4 months ago

egrace479 commented 4 months ago

This downloader loads each batch into memory before writing the files to disk, because our initial storage plan was to use HDF5. We have since decided to convert the dataset to either HDF5 or WebDataset format after the download completes, to avoid data backup complications and difficulties with parallel data writes, so this setup is open to change.

The question is whether it is worth switching to writing image data to disk as it is downloaded, to reduce the memory bottleneck and use cores more efficiently.
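
For discussion, here is a minimal sketch of what writing as it comes down could look like. It is not the downloader's current code: `fake_image_chunks`, `write_streamed`, and `download_batch_streaming` are hypothetical names, and the chunk generator stands in for a streamed HTTP response body so the example runs without network access. The point is only that peak memory per image is bounded by one chunk rather than by the whole batch.

```python
import os
import tempfile
from typing import Iterable, Iterator, List


def fake_image_chunks(total_bytes: int, chunk_size: int) -> Iterator[bytes]:
    """Hypothetical stand-in for a streamed response body (no real network I/O)."""
    sent = 0
    while sent < total_bytes:
        n = min(chunk_size, total_bytes - sent)
        yield b"\x00" * n
        sent += n


def write_streamed(chunks: Iterable[bytes], dest_path: str) -> int:
    """Write each chunk to disk as it arrives and return the bytes written.

    Only one chunk is held in memory at a time, instead of the full batch.
    """
    written = 0
    with open(dest_path, "wb") as fh:
        for chunk in chunks:
            fh.write(chunk)
            written += len(chunk)
    return written


def download_batch_streaming(
    sizes: List[int], out_dir: str, chunk_size: int = 64 * 1024
) -> List[str]:
    """Assumed per-batch loop: one output file per image, written incrementally."""
    paths = []
    for i, size in enumerate(sizes):
        path = os.path.join(out_dir, f"image_{i}.bin")
        write_streamed(fake_image_chunks(size, chunk_size), path)
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for p in download_batch_streaming([10, 100_000], tmp, chunk_size=4096):
            print(os.path.basename(p), os.path.getsize(p))
```

In a real worker the chunk source would be the HTTP client's streaming iterator, and each MPI rank would write to its own files, so no parallel writes to a shared file are needed. Whether this is actually faster than the batched approach would need to be measured on our storage.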