Closed magland closed 2 months ago
Attention: Patch coverage is 47.61905% with 22 lines in your changes missing coverage. Please review.
Project coverage is 81.53%. Comparing base (bc71a9d) to head (22fc137).
Files | Patch % | Lines |
---|---|---|
...indi/LindiH5pyFile/writers/LindiH5pyGroupWriter.py | 46.66% | 16 Missing :warning: |
...ndi/conversion/create_zarr_dataset_from_h5_data.py | 50.00% | 6 Missing :warning: |
This looks reasonable to me, but how would this be used? It seems like this would be used for writing outside of PyNWB, which is fine, but I am not totally seeing the use case.
@rly I am thinking specifically about three codecs I would like to use. It's true that this would not be possible via PyNWB at this point (it would require manually creating datasets), but maybe at some point in the future this could be supported by PyNWB.
The three codecs are:
tag @bendichter
> It's true that this would not be possible via pynwb at this point
With the ZarrIO backend you can specify dataset codecs via ZarrDataIO: https://github.com/hdmf-dev/hdmf-zarr/blob/4c9af4a56336da2c421531dfa4487a4b1ff7977a/src/hdmf_zarr/utils.py#L399. If I remember correctly, you are going through HDF5IO here, so you would need a way to pass the codec through via H5DataIO: https://github.com/hdmf-dev/hdmf/blob/d85d0cbc36d2e0fdb25e8fbea14d58ba7bf24a40/src/hdmf/backends/hdf5/h5_utils.py#L431. You could make a derived class of H5DataIO for LINDI.
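The derived-class idea above could look roughly like this. This is a minimal sketch: the real `H5DataIO` lives in `hdmf.backends.hdf5.h5_utils` and is stubbed here, and the `zarr_compressor` attribute name is hypothetical, not part of any existing API.

```python
class H5DataIO:
    """Stand-in stub for hdmf's H5DataIO (real class: hdmf.backends.hdf5.h5_utils.H5DataIO)."""
    def __init__(self, data, **io_settings):
        self.data = data
        self.io_settings = io_settings


class LindiH5DataIO(H5DataIO):
    """Hypothetical derived class that carries a zarr/numcodecs compressor
    alongside the usual H5DataIO settings, for LINDI to pick up at write time."""
    def __init__(self, data, zarr_compressor=None, **io_settings):
        super().__init__(data, **io_settings)
        self.zarr_compressor = zarr_compressor
```

A LINDI-aware writer could then check `isinstance(dataio, LindiH5DataIO)` and use `zarr_compressor` when present, falling back to the plain H5DataIO behavior otherwise.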
note: made a plan with ryan...
After chatting with @rly , we decided not to introduce a new special function, but to allow the compression parameter (which is supported by h5py) to be either a string or a numcodecs Codec. In the case of a Codec, this is passed through to zarr as the compressor parameter. In the case of a string, the following logic is used:
```python
import numcodecs

if isinstance(compression, numcodecs.abc.Codec):
    # a Codec is passed straight through to zarr as the compressor
    _zarr_compressor = compression
elif isinstance(compression, str):
    if compression == 'gzip':
        if compression_opts is None:
            level = 4  # default for h5py
        elif isinstance(compression_opts, int):
            level = compression_opts
        else:
            raise Exception(f'Unexpected type for compression_opts: {type(compression_opts)}')
        _zarr_compressor = numcodecs.GZip(level=level)
    else:
        raise Exception(f'Compression {compression} is not supported')
```
Then _zarr_compressor is passed through to zarr.
Looks good. We can add support for other h5py compressions, including from hdf5plugin, later as needed.
The goal here is to allow writing datasets with Zarr-supported codecs that are not available with HDF5 (e.g., zstd and custom codecs). Obviously, the h5py API does not have such a mechanism.
I considered adding a `zarr_compressor` parameter to `create_dataset()`, but I thought it best not to make the interface of that function differ between LindiH5pyFile and h5py.File. So instead I made a new function called `create_dataset_with_zarr_compressor(..., compressor=codec)`.