NeurodataWithoutBorders / lindi

Linked Data Interface (LINDI) - cloud-friendly access to NWB data
BSD 3-Clause "New" or "Revised" License
5 stars 1 forks source link

Fix handling of datasets with shape (0,0,...) #77

Closed rly closed 6 months ago

rly commented 6 months ago

Fix #76.

For an HDF5 dataset with shape (0,0,0), no chunks should be written. This was the previous behavior before #68 and seems reasonable.

For an HDF5 dataset with shape (0,0,0), the chunks key in .zarray was being set to (0,0,0). However, that is not allowed in Zarr. Accessing an array with such chunking results in a ZeroDivisionError: division by zero when trying to access the array. I set it to (1,1,1) instead which is what Zarr does for an array of that shape.

>>> import zarr
>>> a = zarr.empty(shape=(0,0,0))
>>> a.shape
(0, 0, 0)
>>> a.chunks
(1, 1, 1)
>>> a[:]
array([], shape=(0, 0, 0), dtype=float64)
>>> a = zarr.empty(shape=(0,0,0), chunks=(0,0,0))
>>> a.chunks
(0, 0, 0)
>>> a[:]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/rly/mambaforge/envs/test4/lib/python3.11/site-packages/zarr/core.py", line 800, in __getitem__
    result = self.get_basic_selection(pure_selection, fields=fields)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rly/mambaforge/envs/test4/lib/python3.11/site-packages/zarr/core.py", line 926, in get_basic_selection
    return self._get_basic_selection_nd(selection=selection, out=out, fields=fields)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rly/mambaforge/envs/test4/lib/python3.11/site-packages/zarr/core.py", line 966, in _get_basic_selection_nd
    indexer = BasicIndexer(selection, self)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rly/mambaforge/envs/test4/lib/python3.11/site-packages/zarr/indexing.py", line 339, in __init__
    dim_indexer = SliceDimIndexer(dim_sel, dim_len, dim_chunk_len)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rly/mambaforge/envs/test4/lib/python3.11/site-packages/zarr/indexing.py", line 181, in __init__
    self.nchunks = ceildiv(self.dim_len, self.dim_chunk_len)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rly/mambaforge/envs/test4/lib/python3.11/site-packages/zarr/indexing.py", line 167, in ceildiv
    return math.ceil(a / b)
                     ~~^~~
ZeroDivisionError: division by zero

I also fixed an issue flagged by pyright in LindiReferenceFileSystemStore.__contains__:

error: Method "__contains__" overrides class "Mapping" in an incompatible manner
    Parameter 2 type mismatch: base parameter is type "object", override parameter is type "str"`
codecov-commenter commented 6 months ago

Codecov Report

Attention: Patch coverage is 91.66667% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 79.47%. Comparing base (731fcb5) to head (ffea54a).

Files Patch % Lines
...ndi/LindiH5pyFile/LindiReferenceFileSystemStore.py 66.66% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #77 +/- ## ========================================== + Coverage 79.43% 79.47% +0.03% ========================================== Files 30 30 Lines 2256 2265 +9 ========================================== + Hits 1792 1800 +8 - Misses 464 465 +1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

magland commented 6 months ago

Looks good to me!

rly commented 6 months ago

Thanks. Released in version 0.3.9.