NeurodataWithoutBorders / lindi

Linked Data Interface (LINDI) - cloud-friendly access to NWB data
BSD 3-Clause "New" or "Revised" License

Support additional `create_dataset` kwargs #55

Open rly opened 2 months ago

rly commented 2 months ago

The LINDI group writer's `create_dataset` currently supports the kwargs `chunks`, `compression`, and `compression_opts`. The compression-related kwargs are accepted only if `compression == "gzip"`, which is the most common use case.
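To make the current behavior concrete, here is a minimal sketch of the kind of kwarg check described above. It is not LINDI's actual code: the function name `check_create_dataset_kwargs`, the `SUPPORTED_KWARGS` set, and the error messages are hypothetical and only illustrate the rule that `chunks`, `compression`, and `compression_opts` are accepted, with compression limited to gzip.

```python
# Hypothetical sketch of the validation rule described in this issue.
# None of these names come from the LINDI code base.
SUPPORTED_KWARGS = {"chunks", "compression", "compression_opts"}


def check_create_dataset_kwargs(**kwargs):
    """Return kwargs unchanged if they match the currently supported subset."""
    # Reject anything outside the supported set (e.g. maxshape, fillvalue).
    unsupported = set(kwargs) - SUPPORTED_KWARGS
    if unsupported:
        raise TypeError(f"Unsupported create_dataset kwargs: {sorted(unsupported)}")

    compression = kwargs.get("compression")
    # Only gzip is accepted as a compression filter.
    if compression is not None and compression != "gzip":
        raise ValueError(f"Unsupported compression: {compression!r}")

    # compression_opts only makes sense together with compression="gzip".
    if kwargs.get("compression_opts") is not None and compression != "gzip":
        raise ValueError("compression_opts requires compression='gzip'")

    return kwargs
```

Under this sketch, `check_create_dataset_kwargs(chunks=(100, 10), compression="gzip", compression_opts=4)` passes, while passing `maxshape` or a non-gzip `compression` raises an error.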

HDMF supports a few additional kwargs for `create_dataset` via `H5DataIO`: non-gzip compression (for example, through `hdf5plugin`; see the PyNWB tutorial) with its corresponding `compression_opts`, plus `maxshape`, `fillvalue`, `shuffle`, and `fletcher32`. In practice, `fillvalue` and `shuffle` are rarely, if ever, used. Blosc compression through `hdf5plugin` is sometimes used because it usually performs better than gzip. `maxshape` will be used more often, pending an upcoming change: https://github.com/hdmf-dev/hdmf/pull/1064. Note that h5py's `create_dataset` accepts even more kwargs that HDMF does not currently support.
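For reference, this is roughly what these kwargs look like when passed directly to h5py. This is a sketch against plain h5py, not the LINDI writer. It uses gzip so that no plugin is needed, and the file name, dataset name, shapes, and values are made up for illustration. The file is created in memory with the `core` driver, so nothing is written to disk.

```python
import h5py
import numpy as np


def write_example():
    """Create an in-memory dataset using the kwargs discussed in this issue."""
    # driver="core" with backing_store=False keeps the file purely in memory.
    with h5py.File("example_in_memory.h5", "w", driver="core", backing_store=False) as f:
        dset = f.create_dataset(
            "data",
            shape=(100, 10),
            dtype="f4",
            maxshape=(None, 10),   # first axis can grow without limit
            chunks=(50, 10),       # chunking is required for resizing and filters
            compression="gzip",
            compression_opts=4,    # gzip level
            shuffle=True,          # byte-shuffle filter applied before compression
            fletcher32=True,       # per-chunk checksum
            fillvalue=-1.0,        # value returned for elements never written
        )
        dset[:50] = np.ones((50, 10), dtype="f4")
        dset.resize((150, 10))     # append room for 50 more rows
        return {
            "shape": dset.shape,
            "maxshape": dset.maxshape,
            "compression": dset.compression,
            "compression_opts": dset.compression_opts,
            "shuffle": dset.shuffle,
            "fletcher32": dset.fletcher32,
            "written": float(dset[0, 0]),
            "unwritten": float(dset[120, 0]),
        }
```

Appending via `resize` only works because `maxshape` leaves the first axis unbounded, which is why `maxshape` support matters for writing and appending datasets.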

I started adding support for additional compression algorithms as an exercise and will see how far I can get. I'll post an update here in a few days. This is not high priority, but it would be nice to have more complete support for writing and appending datasets via PyNWB to a LINDI file.