trmt closed this issue 1 year ago
Thanks for reporting this!
This fixes the class name issue: https://github.com/HDFGroup/h5pyd/pull/129.
But if I run something like:
import h5pyd
f = h5pyd.File("/home/john/fletch32.h5", mode="w")
dset = f.create_dataset("dset", (100, 100), dtype="i4", compression="gzip", fletcher32=True)
I still get an error from HSDS:
ERROR:root:POST error: 400
ERROR:root:POST error - status_code: 400, reason: filter {'class': 'H5Z_FILTER_FLETCHER32', 'id': 3, 'name': 'fletcher32'} is not supported
Traceback (most recent call last):
  File "/Users/john/projects/h5pyd/make_fletcher32.py", line 4, in <module>
    dset = f.create_dataset("dset", (100, 100), dtype="i4", compression="gzip", fletcher32=True)
  File "/Users/john/projects/h5pyd/h5pyd/_hl/group.py", line 361, in create_dataset
    dsid = dataset.make_new_dset(self, shape=shape, dtype=dtype, data=data, **kwds)
  File "/Users/john/projects/h5pyd/h5pyd/_hl/dataset.py", line 294, in make_new_dset
    rsp = parent.POST(req, body=body)
  File "/Users/john/projects/h5pyd/h5pyd/_hl/base.py", line 1041, in POST
    raise IOError(rsp.reason)
OSError: filter {'class': 'H5Z_FILTER_FLETCHER32', 'id': 3, 'name': 'fletcher32'} is not supported
Currently, apart from compression filters, HSDS supports only the shuffle filter.
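Until the server accepts the filter, one possible client-side workaround is to retry the request without fletcher32. This is only a sketch: the helper name create_dataset_with_fallback is hypothetical and not part of h5pyd, and it assumes the rejection keeps surfacing as an IOError/OSError whose message contains "is not supported", as in the traceback above.

```python
def create_dataset_with_fallback(group, name, shape, **kwds):
    """Create a dataset, retrying without fletcher32 if the server rejects it.

    Hypothetical helper (not part of h5pyd). `group` is any object with an
    h5py-style create_dataset(name, shape, **kwds) method.
    """
    try:
        return group.create_dataset(name, shape, **kwds)
    except OSError as exc:  # IOError is an alias of OSError in Python 3
        # Assumption: the server's rejection message contains this phrase.
        if kwds.get("fletcher32") and "is not supported" in str(exc):
            retry_kwds = dict(kwds, fletcher32=False)
            return group.create_dataset(name, shape, **retry_kwds)
        raise  # unrelated errors are passed through unchanged


# Example use with h5pyd (needs a reachable HSDS endpoint):
#   import h5pyd
#   f = h5pyd.File("/home/john/fletch32.h5", mode="w")
#   dset = create_dataset_with_fallback(
#       f, "dset", (100, 100), dtype="i4", compression="gzip", fletcher32=True
#   )
```

Note that the retried dataset is created without any checksum metadata, so this trades fidelity to the original filter pipeline for compatibility.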
Could you provide some background on your use case?
When using AWS S3 or Azure Blob Storage, I don't think data corruption should normally be an issue. In these systems each chunk is replicated across different drives, and they use checksums (ETags) internally. Using fletcher32 when running HSDS with POSIX storage might be beneficial, but even here I think most modern filesystems (ext4, XFS) use checksums internally.
For compatibility, we could have HSDS accept the filter option and just ignore it when reading and writing data. Not sure if that's acceptable or not.
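A minimal sketch of what "accept and ignore" could look like, assuming filters arrive as dicts shaped like the one in the error message above. The names split_filters and IGNORED_FILTER_CLASSES are hypothetical and not taken from HSDS, and the gzip entry in the example is illustrative only.

```python
# Hypothetical sketch, not HSDS code: keep fletcher32 in the stored dataset
# metadata, but leave it out of the pipeline applied when reading/writing chunks.
IGNORED_FILTER_CLASSES = {"H5Z_FILTER_FLETCHER32"}  # assumption for illustration


def split_filters(filters):
    """Partition filter dicts into (applied, ignored) lists, preserving order."""
    applied, ignored = [], []
    for item in filters:
        if item.get("class") in IGNORED_FILTER_CLASSES:
            ignored.append(item)
        else:
            applied.append(item)
    return applied, ignored


requested = [
    {"class": "H5Z_FILTER_DEFLATE", "id": 1, "name": "gzip"},  # illustrative entry
    {"class": "H5Z_FILTER_FLETCHER32", "id": 3, "name": "fletcher32"},
]
applied, ignored = split_filters(requested)
print("applied:", [f["name"] for f in applied])
print("ignored:", [f["name"] for f in ignored])
```

Keeping the ignored entries around, rather than dropping them, would let the metadata round-trip even though no checksum is computed.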
One area of research is the migration of existing data (which can contain datasets with various filters, including fletcher32) from a filesystem to the cloud. Since I use older versions of HSDS and h5pyd (Nov 2020) for several reasons, I can't reproduce that error. In my installation (after fixing that issue), HSDS saves the fletcher32 metadata when data is loaded with hsload and loses it when the data is downloaded with hsget. This behavior suits me so far.
OK, since this is working for you I'll close this issue. The change is merged to master. If anyone needs an HSDS update to actually support fletcher32, please open an issue in the HSDS repo and we can discuss.
Is this a valid filter class: https://github.com/HDFGroup/h5pyd/blob/1bd5cf9ce4a8053ecd30e224604bcefc0e567f72/h5pyd/_hl/filters.py#L137, or should it be H5Z_FILTER_FLETCHER32?