HDFGroup / hsds

Cloud-native, service based access to HDF data
https://www.hdfgroup.org/solutions/hdf-kita/
Apache License 2.0

Selection can fail for H5D_CONTIGUOUS_REF datasets #393

Closed jreadey closed 1 month ago

jreadey commented 2 months ago

In some cases, selections on an H5D_CONTIGUOUS_REF dataset can fail. It looks like HSDS is sending a range GET that extends past the end of the HDF5 file.

E.g.:


import h5pyd

filename = "/nrel/ncdb/4km-Hourly-CONUS/v1.0.0/RCP4.5/ncdb_rcp4.5_2006.h5"
f = h5pyd.File(filename, bucket="nrel-pds-hsds")

meta = f["meta"]
print(meta)
chunk_shape = meta.chunks[0]  # chunk length along the first dimension
layout = meta.id.dcpl_json["layout"]
print(layout)

# This index reads without error
index = 506167
chunk_id = index // chunk_shape
item = meta[index]
print(f"meta[{index}]: {item} chunk: {chunk_id}")

# The next index fails
index += 1
chunk_id = index // chunk_shape
print(f"chunk_id: {chunk_id}")
item = meta[index]  # dies here
print(f"meta[{index}]: {item}")
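
The two indices in the script appear to straddle a chunk boundary. A minimal sketch of that arithmetic, assuming a chunk length of 16328 elements (taken from the target shape in the reshape error in the DN log; the script's own printed output is not shown in this issue). `chunk_of` is a hypothetical helper, not part of h5pyd:

```python
CHUNK_LEN = 16328  # assumption: target shape from the DN reshape error


def chunk_of(index, chunk_len=CHUNK_LEN):
    """Return (chunk_id, offset_within_chunk) for a 1-D dataset index."""
    return divmod(index, chunk_len)


print(chunk_of(506167))  # (30, 16327): last element of chunk 30
print(chunk_of(506168))  # (31, 0): first element of chunk 31
```

Under that assumption, the failing read is the first access that touches chunk 31.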

This is the corresponding log from the DN (data node):

INFO> s3Client.get_object(4km-Hourly-CONUS/v1.0.0/RCP4.5/ncdb_rcp4.5_2006.h5[55010906968:55013029608] bucket=nrel-pds-ncdb) start=1726758646.2565 finish=1726758651.1452 elapsed=4.8887 bytes=2121210
INFO> read: 2121210 bytes for key: 4km-Hourly-CONUS/v1.0.0/RCP4.5/ncdb_rcp4.5_2006.h5
WARN> requested 2122640 bytes but got 2121210 bytes
DEBUG> _uncompress(compressor=None, shuffle=0)
ERROR> Unable to retrieve chunk array: cannot reshape array of size 16317 into shape (16328,)
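
The numbers in this log are consistent with a short read on a partial chunk at the end of the file. The sketch below checks the arithmetic using only values from the log; the 130-byte item size is inferred from those values and is not stated in the issue. `pad_short_read` is a hypothetical helper showing one way a reader could tolerate such a read (zero-padding to the full chunk size); it is not necessarily how HSDS handles it.

```python
start, stop = 55010906968, 55013029608  # byte range from the get_object line
requested = stop - start                # 2122640, matches the WARN line
received = 2121210                      # bytes actually returned by S3

chunk_len = 16328                       # target shape in the reshape error
assert requested % chunk_len == 0
item_size = requested // chunk_len      # inferred element size in bytes

print(item_size)              # 130
print(received // item_size)  # 16317, the array size in the ERROR line
print(requested - received)   # 1430 bytes requested past the available data


def pad_short_read(buf, chunk_len, item_size):
    """Hypothetical helper: zero-pad a short read to a full chunk's byte size."""
    full = chunk_len * item_size
    if len(buf) > full:
        raise ValueError("buffer is larger than one chunk")
    return buf + bytes(full - len(buf))


padded = pad_short_read(bytes(received), chunk_len, item_size)
print(len(padded) // item_size)  # 16328 elements, reshapes cleanly
```

The 1430-byte shortfall is exactly 11 elements of 130 bytes, which is the difference between the expected 16328 and the 16317 elements actually read.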

jreadey commented 1 month ago

Fix is here: https://github.com/HDFGroup/hsds/pull/396