Open bnlawrence opened 8 months ago
I've just pushed a fix for gzip compression in 0d7bf55. This is an interim solution: it works, but all other compression and filtering options are not yet well served. It depends on the new hdf2numcodec.py, which is informed by both kerchunk's _decode_filter and the filtering done in pyfive. We probably need unit tests around all the supported compression and filtering options, recognising that we can't fail over to pyfive for "non active" reads if we use more than pyfive supports.
Key spots in the code are :
Ok, d73fcd8 has split compression and filters, as our API currently requires. It formally supports only shuffle filtering and gzip compression.
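For anyone following along, here is a rough sketch of what the shuffle-plus-gzip combination means for a single chunk. It is not the code in hdf2numcodec.py: the function names (shuffle, unshuffle, encode_chunk, decode_chunk) are made up for illustration, and it uses the standard library zlib module rather than numcodecs. It assumes the chunk length is an exact multiple of the element size. HDF5 applies the filter pipeline in order on write (shuffle, then deflate) and in reverse on read (inflate, then unshuffle).

```python
import zlib


def shuffle(buf: bytes, elementsize: int) -> bytes:
    """Byte-shuffle: group byte 0 of every element, then byte 1, and so on."""
    if len(buf) % elementsize:
        raise ValueError("buffer length must be a multiple of elementsize")
    return b"".join(buf[k::elementsize] for k in range(elementsize))


def unshuffle(buf: bytes, elementsize: int) -> bytes:
    """Invert shuffle(): scatter each byte plane back into its element."""
    if len(buf) % elementsize:
        raise ValueError("buffer length must be a multiple of elementsize")
    count = len(buf) // elementsize
    out = bytearray(len(buf))
    for k in range(elementsize):
        # Plane k holds byte k of every element, stored contiguously.
        out[k::elementsize] = buf[k * count:(k + 1) * count]
    return bytes(out)


def encode_chunk(data: bytes, elementsize: int, level: int = 4) -> bytes:
    """Write order (hypothetical helper): shuffle first, then deflate."""
    return zlib.compress(shuffle(data, elementsize), level)


def decode_chunk(raw: bytes, elementsize: int) -> bytes:
    """Read order (hypothetical helper): inflate first, then unshuffle."""
    return unshuffle(zlib.decompress(raw), elementsize)
```

A unit test for each supported combination could then be a simple round trip: encode a known buffer, decode it, and compare with the original bytes.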
If we use pyfive for the HDF file reading, we get a pure-Python implementation which we can use without any worries about thread safety. It also gives us direct access to the B-tree without the need for kerchunk.
The use of pyfive in this way depends on the pull request Bryan has submitted to pyfive (or we have to use our own fork).
An initial approach is in a new branch.