HDFGroup / h5pyd

h5py distributed - Python client library for HDF Rest API

iterate over chunks on dataset initialization #99

Closed. jananzhu closed this 3 months ago

jananzhu commented 3 years ago

Fixes #88. Hi, we are running into the issue above: the HSDS server returns 413 errors when we try to write HDF5 files of around 200 MB to HSDS. We've implemented the solution suggested in that issue, writing the data chunk by chunk instead of in a single request, and it resolves the problem for us.
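
The pattern, as we understand it: instead of uploading the whole array in one request when create_dataset is called with data=, write the data one chunk at a time so each request body stays below the server's limit. Below is a minimal sketch of that chunk-by-chunk write; iterate_chunks is a hypothetical helper written for this comment (the PR itself uses the ChunkIterator from h5pyd/_apps/chunkiter.py inside Group.create_dataset, visible as the "for chunk in it" loop in the traceback further down), and the domain path and chunk shape are made up:

    import itertools

    import numpy as np
    import h5pyd

    def iterate_chunks(shape, layout):
        # Yield one tuple of slices per chunk of a dataset with the given
        # shape and chunk layout (hypothetical helper, not part of h5pyd).
        ranges = [range(0, n, c) for n, c in zip(shape, layout)]
        for corner in itertools.product(*ranges):
            yield tuple(slice(start, min(start + c, n))
                        for start, c, n in zip(corner, layout, shape))

    data = np.random.rand(5000, 5000)                # ~200 MB of float64
    with h5pyd.File("/home/user/example.h5", "w") as f:   # HSDS domain path (made up)
        dset = f.create_dataset("x", shape=data.shape, dtype=data.dtype,
                                chunks=(1000, 1000))
        for sel in iterate_chunks(dset.shape, dset.chunks):
            dset[sel] = data[sel]                    # one small request per chunk

Each assignment sends only one chunk's worth of data, so no single request approaches the 413 limit.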

jananzhu commented 3 years ago

create_dataset is failing for scalar datasets after this change, but I think that's because it uncovers an existing issue in ChunkIterator's handling of scalar datasets:

Traceback (most recent call last):
  File "test_complex_numbers.py", line 57, in test_complex_attr
    dset = f.create_dataset('x', data=5)
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/h5pyd-0.8.2-py3.7.egg/h5pyd/_hl/group.py", line 338, in create_dataset
    for chunk in it:
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/h5pyd-0.8.2-py3.7.egg/h5pyd/_apps/chunkiter.py", line 111, in __next__
    if self._chunk_index[0] * self._layout[0] >= self._shape[0]:
IndexError: tuple index out of range

The if self._layout == () block in ChunkIterator.__next__ seems intended to catch this case before execution reaches the code in the traceback, but HSDS currently reports the chunk layout of a scalar dataset as (1,), so that branch is never taken; the dataset's shape is (), and indexing self._shape[0] then raises the IndexError. Perhaps the check could be replaced by if self._shape == ()?
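
For illustration, here is a self-contained sketch of the iteration logic with the suggested shape-based guard. This is a simplified stand-in written for this comment, not the actual ChunkIterator code in h5pyd/_apps/chunkiter.py:

    class SketchChunkIterator:
        # Simplified stand-in for ChunkIterator, showing only the guard under
        # discussion. Yields one selection (a tuple of slices) per chunk.
        def __init__(self, shape, layout):
            self._shape = shape
            self._layout = layout
            self._chunk_index = [0] * max(len(shape), 1)
            self._done = False

        def __iter__(self):
            return self

        def __next__(self):
            # Suggested check: key off the dataset shape, which is () for a
            # scalar dataset, rather than the layout, which HSDS reports as (1,).
            if self._shape == ():
                if self._done:
                    raise StopIteration()
                self._done = True
                return ()                    # whole scalar dataset in one write
            # Rank > 0: this is the comparison that raised IndexError when a
            # scalar dataset slipped past the old layout-based check.
            if self._chunk_index[0] * self._layout[0] >= self._shape[0]:
                raise StopIteration()
            sel = tuple(slice(i * c, min((i + 1) * c, n))
                        for i, c, n in zip(self._chunk_index, self._layout, self._shape))
            # Advance the chunk index in row-major order (last dimension fastest).
            for dim in reversed(range(len(self._shape))):
                self._chunk_index[dim] += 1
                if self._chunk_index[dim] * self._layout[dim] < self._shape[dim]:
                    break
                if dim > 0:
                    self._chunk_index[dim] = 0
            return sel

    print(list(SketchChunkIterator((), (1,))))               # [()] -- scalar case
    print(len(list(SketchChunkIterator((5, 4), (2, 3)))))    # 6 chunks

With the shape-based check, f.create_dataset('x', data=5) would get back a single empty selection for the scalar write instead of reaching the indexed comparison.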