Quansight-Labs / ndindex

A Python library for manipulating indices of ndarrays
https://quansight-labs.github.io/ndindex/
MIT License

Fix wrong result with as_subchunks and num_subchunks with array indices #172

Status: Closed (asmeurer closed this 8 months ago)

asmeurer commented 8 months ago

Fixes #170

asmeurer commented 8 months ago

FYI @peytondmurray @ArvidJB, this bug affects versioned-hdf5. Without this PR, assigning through a boolean array index on a chunked dataset raises a ValueError:

>>> import h5py
>>> import numpy as np
>>> from versioned_hdf5 import VersionedHDF5File
>>> f = h5py.File('test.hdf5', 'w')
>>> file = VersionedHDF5File(f)
>>> with file.stage_version('version1') as vf:
...     data = np.arange(4).reshape((2, 2))
...     d = vf.create_dataset('test', data=data, chunks=(1, 1))
>>> file.close()
>>> f.close()
>>> f = h5py.File('test.hdf5', 'a')
>>> file = VersionedHDF5File(f)
>>> with file.stage_version('version2') as vf:
...     d = vf['test']
...     d[[[False, True], [True, True]]] = -1
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/Users/aaronmeurer/Documents/versioned-hdf5/versioned_hdf5/wrappers.py", line 1228, in __setitem__
    self.dataset.__setitem__(index, value)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/Users/aaronmeurer/Documents/versioned-hdf5/versioned_hdf5/wrappers.py", line 867, in __setitem__
    index = idx.as_subindex(c)
            ^^^^^^^^^^^^^^^^^^
  File "/Users/aaronmeurer/miniconda3/envs/versioned-hdf5/lib/python3.12/site-packages/ndindex/booleanarray.py", line 168, in as_subindex
    return Tuple(*self.array.nonzero()).as_subindex(index)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/aaronmeurer/miniconda3/envs/versioned-hdf5/lib/python3.12/site-packages/ndindex/tuple.py", line 730, in as_subindex
    raise ValueError("Indices do not intersect")
ValueError: Indices do not intersect

With this PR, the same assignment succeeds and the result is what we expect:

>>> import h5py
>>> import numpy as np
>>> from versioned_hdf5 import VersionedHDF5File
>>> f = h5py.File('test.hdf5', 'w')
>>> file = VersionedHDF5File(f)
>>> with file.stage_version('version1') as vf:
...     data = np.arange(4).reshape((2, 2))
...     d = vf.create_dataset('test', data=data, chunks=(1, 1))
>>> file.close()
>>> f.close()
>>> f = h5py.File('test.hdf5', 'a')
>>> file = VersionedHDF5File(f)
>>> with file.stage_version('version2') as vf:
...     d = vf['test']
...     d[[[False, True], [True, True]]] = -1
>>> file['version2']['test'][:]
array([[ 0, -1],
       [-1, -1]])
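
For anyone following along, here is a rough pure-Python sketch of why a chunk list that is too generous leads to "Indices do not intersect". It is not ndindex's code. The helper names `touched_chunks` and `per_axis_product` are made up for illustration, and the per-axis product is only an assumed example of an over-approximation. The traceback above only shows that `as_subindex` was called with some chunk that the index does not intersect. With the mask from the example and chunks of `(1, 1)`, chunk `(0, 0)` holds no selected element, so anything that hands that chunk to `as_subindex` will fail.

```python
from itertools import product


def touched_chunks(mask, chunks):
    """Chunk coordinates that contain at least one True element of a 2-D mask.

    Hypothetical helper for illustration only, not part of ndindex.
    """
    rows, cols = chunks
    return {
        (i // rows, j // cols)
        for i, row in enumerate(mask)
        for j, selected in enumerate(row)
        if selected
    }


def per_axis_product(mask, chunks):
    """Assumed over-approximation: treat each axis independently, then
    take the Cartesian product of the chunk numbers seen on each axis."""
    rows, cols = chunks
    row_chunks = {i // rows for i, row in enumerate(mask) if any(row)}
    col_chunks = {
        j // cols
        for row in mask
        for j, selected in enumerate(row)
        if selected
    }
    return set(product(row_chunks, col_chunks))


mask = [[False, True], [True, True]]
chunks = (1, 1)

exact = touched_chunks(mask, chunks)
naive = per_axis_product(mask, chunks)

print(sorted(exact))          # [(0, 1), (1, 0), (1, 1)]
print(sorted(naive - exact))  # [(0, 0)]
```

The three chunks in `exact` are the ones the assignment above actually writes to, which matches the `-1` entries in the final array.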