NCAS-CMS / PyActiveStorage

Python implementation of Active Storage

Migration (?) to the use of pyfive #188

Open bnlawrence opened 8 months ago

bnlawrence commented 8 months ago

If we use pyfive for reading HDF files, we get a pure-Python implementation that we can use without any worries about thread safety. It also gives us direct access to the B-tree without the need for kerchunk.

The use of pyfive in this way depends on the pull request Bryan has submitted to pyfive (or we have to use our own fork).

An initial approach is in a new branch.

bnlawrence commented 8 months ago

I've just pushed a fix for gzip compression in 0d7bf55. This is an interim solution: it works, but the other compression and filtering options are not yet well served. It depends on the new hdf2numcodec.py, which is informed by both kerchunk's _decode_filter and the filtering done in pyfive. We probably need unit tests around all the supported compression and filtering options, recognising that we can't fail over to pyfive for "non active" reads if we use more than pyfive supports.

Key spots in the code are:

bnlawrence commented 8 months ago

OK, d73fcd8 has split compression and filters, as our API currently requires. Formally, it supports only shuffle filtering and gzip compression.
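
For anyone following along, here is a minimal sketch of what "splitting compression and filters" can look like for the shuffle + gzip case. It is not the code in d73fcd8 or hdf2numcodec.py: the function names (split_pipeline, decode_chunk, shuffle, unshuffle) are hypothetical and it uses only the standard library. The only facts it relies on are the HDF5 filter IDs (1 for deflate, 2 for shuffle) and that a stored chunk is decoded by undoing the pipeline in reverse order, so decompression comes before unshuffling.

```python
import struct
import zlib

# HDF5 registered filter IDs for the two cases discussed above.
DEFLATE_ID = 1  # gzip/deflate
SHUFFLE_ID = 2  # byte shuffle


def split_pipeline(filter_ids):
    """Split an HDF5 filter pipeline into (compression, filters).

    Hypothetical helper: anything other than shuffle and gzip is rejected,
    mirroring the "formally only supports" statement above.
    """
    compression = None
    filters = []
    for fid in filter_ids:
        if fid == DEFLATE_ID:
            compression = "gzip"
        elif fid == SHUFFLE_ID:
            filters.append("shuffle")
        else:
            raise NotImplementedError(f"unsupported HDF5 filter id {fid}")
    return compression, filters


def shuffle(raw, itemsize):
    """Group byte 0 of every element, then byte 1, and so on."""
    return b"".join(raw[j::itemsize] for j in range(itemsize))


def unshuffle(buf, itemsize):
    """Invert shuffle(); assumes len(buf) is a multiple of itemsize."""
    if len(buf) % itemsize:
        raise ValueError("buffer length is not a multiple of itemsize")
    n = len(buf) // itemsize
    out = bytearray(len(buf))
    for j in range(itemsize):
        out[j::itemsize] = buf[j * n:(j + 1) * n]
    return bytes(out)


def decode_chunk(stored, compression, filters, itemsize):
    """Undo the write pipeline in reverse: decompress, then unshuffle."""
    data = stored
    if compression == "gzip":
        data = zlib.decompress(data)
    if "shuffle" in filters:
        data = unshuffle(data, itemsize)
    return data


# Round trip on four little-endian float64 values, written as shuffle -> gzip.
values = (1.0, 2.5, -3.0, 4.25)
raw = struct.pack("<4d", *values)
stored = zlib.compress(shuffle(raw, 8))

compression, filters = split_pipeline([SHUFFLE_ID, DEFLATE_ID])
decoded = decode_chunk(stored, compression, filters, itemsize=8)
```

A round trip like this is also roughly the shape the unit tests mentioned earlier could take: one per supported compression and filter combination, plus one checking that an unsupported filter ID raises rather than silently falling through.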