HDFGroup / h5pyd

h5py distributed - Python client library for HDF Rest API

`hsload` fails on empty data sets (with a dimension of length 0) #116

Closed jonrkarr closed 1 year ago

jonrkarr commented 2 years ago

`hsload` fails on datasets that have a dimension of length 0. The traceback we encountered is below.

(I'd expect HSDS to be able to handle this, but the error was actually helpful to us: it alerted us to a case where simulation data unexpectedly wasn't produced due to an error in our code.)
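The `ValueError` in the traceback below comes from a check that compares the chunk shape with the data shape. Here is a minimal pure-Python sketch of such a check, with one possible guard for zero-length dimensions. The names `check_chunks` and `pick_chunks` are hypothetical, and this is neither h5pyd's actual code nor the fix that was eventually committed.

```python
def check_chunks(chunks, shape):
    """Raise ValueError if any chunk extent exceeds the matching data extent."""
    if len(chunks) != len(shape):
        raise ValueError("chunks and shape must have the same rank")
    if any(c > s for c, s in zip(chunks, shape)):
        raise ValueError(
            "Chunk shape must not be greater than data shape in any dimension. "
            f"{chunks} is not compatible with {shape}"
        )


def pick_chunks(chunks, shape):
    """Hypothetical guard: adapt a requested chunk shape to the data shape.

    Returns None (meaning "do not request chunking") when any dimension has
    length 0, since there is no data to chunk. Otherwise each chunk extent is
    clamped to the corresponding data extent.
    """
    if 0 in shape:
        return None
    return tuple(min(c, s) for c, s in zip(chunks, shape))


if __name__ == "__main__":
    try:
        check_chunks((6, 256), (24, 0))
    except ValueError as exc:
        print("rejected:", exc)
    print(pick_chunks((6, 256), (24, 0)))
    print(pick_chunks((6, 256), (24, 100)))
```

With the shapes from the report, `check_chunks((6, 256), (24, 0))` raises, while `pick_chunks` sidesteps the comparison by returning `None` for the zero-length case.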

Traceback (most recent call last):
  File "h5py/h5o.pyx", line 302, in h5py.h5o.cb_obj_simple
  File "/home/FCAM/crbmapi/.local/lib/python3.6/site-packages/h5py/_hl/group.py", line 591, in proxy
    return func(name, self[name])
  File "/usr/local/lib/python3.6/site-packages/h5pyd/_apps/utillib.py", line 674, in object_create_helper
    create_dataset(obj, ctx)
  File "/usr/local/lib/python3.6/site-packages/h5pyd/_apps/utillib.py", line 459, in create_dataset
    fillvalue=fillvalue, scaleoffset=scaleoffset)
  File "/usr/local/lib/python3.6/site-packages/h5pyd/_hl/group.py", line 337, in create_dataset
    dsid = dataset.make_new_dset(self, shape=shape, dtype=dtype, **kwds)
  File "/usr/local/lib/python3.6/site-packages/h5pyd/_hl/dataset.py", line 129, in make_new_dset
    raise ValueError(errmsg)
ValueError: Chunk shape must not be greater than data shape in any dimension. (6, 256) is not compatible with (24, 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/hsload", line 33, in <module>
    sys.exit(load_entry_point('h5pyd==0.8.4', 'console_scripts', 'hsload')())
  File "/usr/local/lib/python3.6/site-packages/h5pyd/_apps/hsload.py", line 314, in main
    load_file(fin, fout, verbose=verbose, dataload=dataload, s3path=s3path, compression=compression, compression_opts=compression_opts)
  File "/usr/local/lib/python3.6/site-packages/h5pyd/_apps/utillib.py", line 714, in load_file
    fin.visititems(object_create_helper)
  File "/home/FCAM/crbmapi/.local/lib/python3.6/site-packages/h5py/_hl/group.py", line 592, in visititems
    return h5o.visit(self.id, proxy)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper

loichuder commented 2 years ago

Could be linked to https://github.com/HDFGroup/h5pyd/pull/114

jonrkarr commented 2 years ago

Yes, #114 sounds very similar.

jreadey commented 1 year ago

This should be fixed in this commit: https://github.com/HDFGroup/h5pyd/commit/5a9193af6ae99a204a7d277d9983431e712f7417.

I'll update the issue when this gets merged with master.

jreadey commented 1 year ago

Fix is in master now.

jreadey commented 1 year ago

Closing - fix is in the 0.12.0 release on PyPI.

loichuder commented 1 year ago

Sorry, I was a bit late to check.

I still get an error when running hsload --link on a file containing an empty dataset (h5py.Empty):

  File ".../h5pyd/_apps/utillib.py", line 737, in create_dataset
    tgt_shape.extend(dobj.shape)
TypeError: 'NoneType' object is not iterable

jreadey commented 1 year ago

Ah, I see - reopening.

jreadey commented 1 year ago

This should fix it: https://github.com/HDFGroup/h5pyd/commit/866c0be4063a1d744df596a8296b95a2b505ee15.

loichuder commented 1 year ago

Nope, still the same error.

Anyway, this is no big deal: hsload now works for scalar datasets and I think h5py.Empty is really uncommon.

jreadey commented 1 year ago

@loichuder - were you testing from master? The commit above was in the aggregate branch. Anyway, I've merged the changes into master and pushed out a new release as 0.12.1.

loichuder commented 1 year ago

Yes, I tried with the aggregate branch at the time and now with master. It is still the same issue as in https://github.com/HDFGroup/h5pyd/issues/116#issuecomment-1336931275, since `dobj.shape` is `None` for `h5py.Empty`.

No big deal, as I said, but for completeness, here is how I reproduced the issue:

import h5py

with h5py.File("empty.h5", "w") as h5file:
    # h5py.Empty requires a dtype; "f" is an assumed choice
    h5file.create_dataset("empty", data=h5py.Empty("f"))

- Loading with `hsload --link`:

hsload --link [...] files/empty.h5 [...]
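The failure described above comes down to calling `list.extend` with `None`, because an `h5py.Empty` dataset reports `shape` as `None` rather than a tuple. The sketch below separates the shape cases mentioned in this thread and shows one way such a call could be guarded. It uses a stand-in object instead of a real dataset, and `describe_shape` and `extend_target_shape` are hypothetical names, not functions from h5pyd.

```python
from types import SimpleNamespace


def describe_shape(shape):
    """Classify a dataset shape into the cases discussed in this issue."""
    if shape is None:
        return "empty"        # null dataspace, as reported for h5py.Empty
    if shape == ():
        return "scalar"       # a single value, no dimensions
    if 0 in shape:
        return "zero-length"  # e.g. (24, 0)
    return "regular"


def extend_target_shape(tgt_shape, dobj):
    """Append dobj.shape to tgt_shape, treating a None shape as no dimensions."""
    tgt_shape.extend(dobj.shape or ())
    return tgt_shape


if __name__ == "__main__":
    # Stand-in for a dataset created from h5py.Empty (assumption: shape is None)
    empty = SimpleNamespace(shape=None)
    print(describe_shape(empty.shape))
    print(extend_target_shape([3], empty))
```

Whether an empty dataset should contribute no dimensions to the target, or be skipped entirely, is a design decision for the loader; the guard only shows how to avoid the `TypeError`.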

jreadey commented 1 year ago

@loichuder - ok I see. This latest checkin should really fix it now! It's on master and in PyPI as version 0.12.2.

jreadey commented 1 year ago

Closing this issue as it should be fixed in 0.12.2 and later.