scida is an out-of-the-box analysis tool for large scientific datasets. It primarily supports the astrophysics community, focusing on cosmological and galaxy formation simulations using particles or unstructured meshes, as well as large observational datasets. This tool uses dask, allowing analysis to scale.
A naive strategy would be to provide only the file path and the HDF5 path to the dataset, opening, reading, and closing the file for every dask chunk. The repeated opening and closing introduces a performance penalty. See discussion here
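The naive strategy described above could be sketched roughly as follows. This is only an illustration, not scida's implementation: the names `read_chunk`, `naive_hdf5_array`, and `chunk_rows` are hypothetical, and the sketch assumes a non-empty dataset with at least one dimension that is chunked along its first axis.

```python
import dask
import dask.array as da
import h5py


def read_chunk(filepath, dataset_path, start, stop):
    """Open the file, read one slice along axis 0, and close the file again."""
    # Hypothetical helper: every call pays the cost of opening and closing the file.
    with h5py.File(filepath, "r") as f:
        return f[dataset_path][start:stop]


def naive_hdf5_array(filepath, dataset_path, chunk_rows):
    """Build a dask array in which every chunk reopens the HDF5 file."""
    # Read only the metadata (shape and dtype) once, up front.
    with h5py.File(filepath, "r") as f:
        dset = f[dataset_path]
        shape, dtype = dset.shape, dset.dtype

    blocks = []
    for start in range(0, shape[0], chunk_rows):
        stop = min(start + chunk_rows, shape[0])
        # Only strings and integers are captured by the task, so it pickles
        # without problems; the file handle is created inside the task.
        delayed_block = dask.delayed(read_chunk)(filepath, dataset_path, start, stop)
        blocks.append(
            da.from_delayed(
                delayed_block, shape=(stop - start,) + shape[1:], dtype=dtype
            )
        )
    return da.concatenate(blocks, axis=0)
```

Because each task carries only the file path and the dataset path, no open file handle has to be serialized, but every chunk triggers its own open and close, which is where the performance penalty comes from.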
We need to rework distributed HDF5 access once more, as it appears to have stopped working.
There exists h5pickle, but it does not currently work as a drop-in replacement. The issue appears to be related to https://github.com/DaanVanVugt/h5pickle/issues/14.