cbyrohl / scida

scida is an out-of-the-box analysis tool for large scientific datasets. It primarily supports the astrophysics community, focusing on cosmological and galaxy formation simulations using particles or unstructured meshes, as well as large observational datasets. This tool uses dask, allowing analysis to scale.
https://scida.io
MIT License
26 stars 4 forks

Distributed HDF5 access #157

Open cbyrohl opened 6 months ago

cbyrohl commented 6 months ago

We need to rework distributed HDF5 access once more, as it appears to have stopped working.

TypeError: h5py objects cannot be pickled

There exists h5pickle, but it does not currently work as a drop-in replacement. The issue appears to be related to https://github.com/DaanVanVugt/h5pickle/issues/14.

A naive strategy would be to pass only the file path and the HDF5 path of the dataset to each task, opening, reading, and closing the file for every dask chunk. The repeated opening and closing introduces a performance penalty. See discussion here
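For reference, a minimal sketch of that naive strategy, not scida's actual implementation. The helper names `read_chunk` and `lazy_dataset` and the `chunk_rows` parameter are hypothetical, and the sketch assumes a non-empty dataset with at least one dimension, chunked along the first axis. Each task carries only strings and integers, so nothing h5py-related has to be pickled:

```python
import dask.array as da
import h5py
import numpy as np
from dask import delayed


def read_chunk(filepath, dataset_path, start, stop):
    """Open the file, read rows [start, stop) of one dataset, then close it."""
    with h5py.File(filepath, "r") as f:
        return f[dataset_path][start:stop]


def lazy_dataset(filepath, dataset_path, chunk_rows):
    """Build a dask array whose tasks hold only picklable arguments."""
    # Open once up front, only to learn shape and dtype; no handle is kept.
    with h5py.File(filepath, "r") as f:
        shape = f[dataset_path].shape
        dtype = f[dataset_path].dtype

    blocks = []
    for start in range(0, shape[0], chunk_rows):
        stop = min(start + chunk_rows, shape[0])
        # The file is reopened inside every task: this is the per-chunk
        # open/close overhead mentioned above.
        block = delayed(read_chunk)(filepath, dataset_path, start, stop)
        blocks.append(
            da.from_delayed(block, shape=(stop - start,) + shape[1:], dtype=dtype)
        )
    return da.concatenate(blocks, axis=0)
```

Something like `lazy_dataset("snapshot.hdf5", "PartType0/Coordinates", chunk_rows=100_000)` (file name and dataset path are placeholders) would then return a lazy array usable with a distributed scheduler. Larger chunks amortize the open/close cost, at the price of more memory per task.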