Closed hmenke closed 3 years ago
Dear @hmenke,
Do you have an example where the higher compression levels yield a substantial reduction of the archive size?
Our experience has been that the gains from the higher compression levels are marginal, while they substantially increase the performance cost of read/write operations.
Hm, you're right. I just tested this and, while I cannot confirm a huge performance reduction, the returns in compression are diminishing (comparing level 1 vs. 6).
Anyway, this PR does not change the default compression level of 1 and other libraries like h5py also provide an interface to compression: https://docs.h5py.org/en/stable/high/dataset.html#filter-pipeline
In fact, this PR can also be viewed the other way around: it now allows disabling compression entirely, which results in a considerable speedup (2.3x for my little test case).
The goal of the Python layer is not to provide an exhaustive API like the one of h5py, but instead to provide a high-level API that allows for easy read/write of more complicated objects like Green functions.
Do you have a particular use case where disabling the compression at the Python level would be helpful? In our experience, the HDF5 read/write operations are usually not performance-critical, even for large objects.
Looks like this is a lot less useful than I hoped it to be.
The deflate compression of HDF5 supports multiple levels (0-9), but currently only level 1 is used for array data. Applying compression only to array data is reasonable, but sometimes I wish a higher level could be chosen, especially for quantities that are mostly zero. This PR makes the compression level adjustable on group creation and adds a few assertions to ensure that a valid value is chosen.
Unfortunately, at this point the compression level does not round-trip through the file, i.e., when loading a dataset and inspecting the compression level, the value reported is the one the group was created with (default: 1) and not the one stored in the file. However, it seems that currently no filter information is read from the HDF5 file at all, so I left it at that.
https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetDeflate
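For a rough feel of how the deflate levels behave on mostly-zero data, here is a minimal sketch using Python's standard `zlib` module directly, not the HDF5 filter pipeline or this PR's interface. The helper name `deflate_sizes` and the test data are made up for illustration. HDF5 applies deflate per chunk, so the sizes in a real file will differ.

```python
import array
import zlib


def deflate_sizes(payload: bytes, levels=(0, 1, 6, 9)) -> dict:
    """Return {level: compressed size in bytes} for each zlib level."""
    return {level: len(zlib.compress(payload, level)) for level in levels}


# Hypothetical test data: 100_000 doubles, zero except for every 1000th entry.
values = array.array("d", [0.0] * 100_000)
for i in range(0, len(values), 1000):
    values[i] = 1.0 + i
payload = values.tobytes()

sizes = deflate_sizes(payload)
for level, size in sizes.items():
    print(f"level {level}: {size} bytes ({size / len(payload):.2%} of raw)")
```

Level 0 stores the data uncompressed with a small framing overhead, which corresponds to the "compression disabled" case discussed in the thread. Comparing the printed sizes for levels 1, 6 and 9 gives an idea of whether a higher level is worth it for a given kind of data.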