AllenInstitute / bmtk

Brain Modeling Toolkit
https://alleninstitute.github.io/bmtk/
BSD 3-Clause "New" or "Revised" License

gzip compression for `sourceNetwork_targetNetwork_edges.h5`? #271

Closed flomlo closed 1 year ago

flomlo commented 1 year ago

Hi,

The edge files produced by save_edges tend to take up quite a bit of disk space. For example, the mouse_v1 reconstruction with a fraction=0.50 parameter is 742 MB. This quickly becomes a problem (or at least a nuisance) when analyzing bigger networks or several of them.

This could easily be reduced by a factor of ~10 by enabling gzip compression on the datasets inside the HDF5 file. As gzip ships with HDF5, this does not introduce an additional requirement. The HDF5 implementations known to me (for Rust and for Python) read gzip-compressed datasets without any further changes to the code.
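To make the suggestion concrete, here is a minimal sketch of what per-dataset gzip compression looks like with h5py. It is not BMTK's actual writer: the `write_edges` helper, the `edges/demo` group name, and the synthetic ID arrays are assumptions made up for illustration. Only `compression="gzip"` and `compression_opts` are real h5py options.

```python
import os
import tempfile

import h5py
import numpy as np


def write_edges(path, source_ids, target_ids, compression=None, level=4):
    """Write two edge-index datasets, optionally gzip-compressed.

    Hypothetical helper for illustration; not part of BMTK.
    """
    opts = {}
    if compression == "gzip":
        # h5py enables chunking automatically when a compression filter is set.
        opts = {"compression": "gzip", "compression_opts": level}
    with h5py.File(path, "w") as h5:
        grp = h5.create_group("edges/demo")  # assumed group name
        grp.create_dataset("source_node_id", data=source_ids, **opts)
        grp.create_dataset("target_node_id", data=target_ids, **opts)


# Synthetic, highly repetitive IDs (an assumption; real networks will differ).
src = np.tile(np.arange(200, dtype=np.uint64), 1000)
tgt = np.repeat(np.arange(1000, dtype=np.uint64), 200)

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "plain_edges.h5")
    packed = os.path.join(tmp, "gzip_edges.h5")
    write_edges(plain, src, tgt)
    write_edges(packed, src, tgt, compression="gzip")
    plain_size = os.path.getsize(plain)
    packed_size = os.path.getsize(packed)
    # Reading needs no extra arguments: the filter is applied transparently.
    with h5py.File(packed, "r") as h5:
        roundtrip_ok = bool(np.array_equal(h5["edges/demo/source_node_id"][:], src))

print(f"uncompressed: {plain_size} bytes, gzip: {packed_size} bytes")
```

The reading side is unchanged, which is why existing consumers of the file should not need to be adapted. The achievable ratio depends on how repetitive the real data is.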

Are there any fundamental reasons against using gzip-compressed datasets in the .h5 file?

If not, I'd volunteer to supply a patch (once I've figured out what to modify. Where the fuck does _save_edges in https://github.com/AllenInstitute/bmtk/blob/2078a4134dba74a89bdb4edc6cf224a65290d782/bmtk/builder/network_adaptors/network.py#L658 lead to?).

shixnya commented 1 year ago

Hi flomlo,

Sorry for our late response. We have implemented options to compress the network h5 files and spikes h5 files (defaulting to gzip level 4). If you pull the most recent code from the 'develop' branch, it'll be available.

For simple network files that do not contain individual weights, we do indeed see a factor of 5-6 compression, and in our environment it does not seem to impact the execution time.

Thank you for a great suggestion.

flomlo commented 1 year ago

Oh lovely! I'll give it a try soonishly and will report back / reopen the issue if there is an issue (which I think is quite unlikely).

Thanks for implementing it - it will save approximately a terabyte on our side :)

shixnya commented 1 year ago

Yes, please let us know if there are any issues. Glad to hear that it'll be helpful; we appreciate it. By the way, if you already have many network files, h5repack can compress existing h5 files, and the compressed files can be used directly for simulation (even with an old BMTK) as long as they are compressed with gzip.
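For batch-converting existing files, the h5repack call can be wrapped in a few lines of Python. This is a rough sketch: the helper names and file names are hypothetical, and level 4 is chosen only to match the default mentioned earlier in this thread. `-f GZIP=<level>` is h5repack's standard filter option.

```python
import shutil
import subprocess


def h5repack_gzip_command(src, dst, level=4):
    """Build the h5repack argument list that gzip-compresses all datasets."""
    return ["h5repack", "-f", f"GZIP={level}", src, dst]


def repack(src, dst, level=4):
    """Run h5repack; requires the HDF5 command-line tools on PATH."""
    if shutil.which("h5repack") is None:
        raise RuntimeError("h5repack not found; install the HDF5 tools first")
    subprocess.run(h5repack_gzip_command(src, dst, level), check=True)


# Hypothetical file names, shown only to illustrate the resulting command.
cmd = h5repack_gzip_command("network_edges.h5", "network_edges_gzip.h5")
print(" ".join(cmd))
```

h5repack writes a new file rather than compressing in place, so the original can be kept until the repacked copy has been verified.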

I'll close this issue for now, but feel free to reopen if there is more to discuss.