NCAS-CMS / cf-python

A CF-compliant Earth Science data analysis library
http://ncas-cms.github.io/cf-python
MIT License
119 stars 19 forks source link

Certain cyclic slices fail when subsetting in 3.16.2 vs 3.16.1 #774

Closed mattjbr123 closed 3 months ago

mattjbr123 commented 3 months ago

Changes introduced in v3.16.2 in this commit as part of #760 has caused calls to the subset command to fail when the subset produces indices for (at least?) one cyclic axis in the form of a slice which doesn't satisfy the condition step > 0 and start < 0 and stop > 0 (e.g. slice(-3, 0, 1)) [ CASE 1a ] OR step < 0 and start > 0 and stop < 0 (e.g. slice(0, 2, -1)) [ CASE 1b ] or does satisfy the condition: step > 0 and start < 0 and stop == -size (e.g. slice(-1,-3,1), size=3) [ CASE 2a ] OR step < 0 and stop < 0 and start == -size (e.g. slice(-3, -1, -1), size=3) [ CASE 2b ]

This arises from the following new code in cf.functions (the rest of the modifications to cf.functions in this commit are just refactors as far as I can tell):

    if not (
        (step > 0 and start < 0 and stop > 0)
        or (step < 0 and start > 0 and stop < 0)
    ):
        raise IndexError(f"{index!r} is not a cyclic slice")

in cf.functions.normalize_slice. This function determines whether or not a slice is cyclic, and these lines are designed to catch the cyclic cases after 'normalization' of the slice's start and stop indices by previous lines of the function. In cases 1a and 1b the normalization is skipped as the slices don't satisfy the conditions, in cases 2a and 2b, normalization is performed modifying the start or stop indices of the slice to 0. In both cases an IndexError is now raised by the function causing further modification of the slice and generation of the 'roll' variable in cf.functions.parse_indices to be skipped, whereas in v3.16.1 this modification would still take place, e.g. changing a slice of (-3, 0, 1) to (0, 3, 1) and generating roll = {:-3} If the condition is modified to

    if not (
        (step > 0 and start < 0 and stop >**=** 0)
        or (step < 0 and start >**=** 0 and stop < 0)
    ):
        raise IndexError(f"{index!r} is not a cyclic slice")

the function correctly determines the slice to be cyclic and thus the slice is modified appropriately. Without this modification, an object with a dimension of size 0 is returned when __getitem__ is called on a field's Data object when __getitem__ is called on a field as part of the cf.field.subset process, and thus the following error is raised in cf.field.__getitem__:

IndexError: Indices [array([0, 1, 2, 3]), slice(-3, 0, 1)] result in a subspaced shape of (4, 0), but can't create a subspace of Field that has a size 0 axis

I have run the same example in a cf v3.16.1 and v3.16.2 environment, with debugging enabled and some additional print statements thrown in for good-measure:

# Create example field
example = cf.Field()

axis_ = example.set_construct(cf.DomainAxis(4))
example.set_construct(
    cf.DimensionCoordinate(
        properties={
            "standard_name": 'latitude',
            "units": 'degrees_north',
            "axis": 'Y',
        },
        data=cf.Data([-67.5, -22.5, 22.5, 67.5]),
        bounds=cf.Bounds(data=cf.Data(
        np.array([[-90., -45.],
               [-45., 0.],
               [0., 45.],
               [45., 90.]]))),
    ),
    axes=axis_,
)

axis_ = example.set_construct(cf.DomainAxis(3))
example.set_construct(
    cf.DimensionCoordinate(
        properties={
            "standard_name": 'longitude',
            "units": 'degrees_east',
            "axis": 'X',
        },
        data=cf.Data([-120., 0., 120.]),
        bounds=cf.Bounds(data=cf.Data(
        np.array([[-180., -60.],
               [-60., 60.],
               [60., 180.]]))),
    ),
    axes=axis_,
)

example.set_data(cf.Data(np.zeros((4,3))), axes=('Y', 'X'))```

kwargs = {
    'longitude': cf.wi(-120, 120),
    'latitude': cf.wi(-67.5, 67.5),
}

Try with cf v3.16.2

example_subset = example.subspace(**kwargs)

field:
Field: 
-------
Data            : (latitude(4), longitude(3))
Dimension coords: latitude(4) = [-67.5, ..., 67.5] degrees_north
                : longitude(3) = [-120.0, 0.0, 120.0] degrees_east
config:
()
data_axes:
('domainaxis0', 'domainaxis1')
  parsed       = {('domainaxis1',): [(('domainaxis1',), 'dimensioncoordinate1', <CF DimensionCoordinate: longitude(3) degrees_east>, <CF Query: (wi [-120.0, 120.0])>, 'longitude')], ('domainaxis0',): [(('domainaxis0',), 'dimensioncoordinate0', <CF DimensionCoordinate: latitude(4) degrees_north>, <CF Query: (wi [-67.5, 67.5])>, 'latitude')]}
  unique_axes  = {'domainaxis1', 'domainaxis0'}
  n_axes       = 2
  item_axes    = ('domainaxis1',)
  keys         = ('dimensioncoordinate1',)
  1 1-d constructs: (<CF DimensionCoordinate: longitude(3) degrees_east>,)
  axis         = 'domainaxis1'
  value        = <CF Query: (wi [-120.0, 120.0])>
  identity     = 'longitude'
  1-d CASE 2:
    index      = slice(-3, 0, 1)
    ind        = None
  create_mask  = False
  item_axes    = ('domainaxis0',)
  keys         = ('dimensioncoordinate0',)
  1 1-d constructs: (<CF DimensionCoordinate: latitude(4) degrees_north>,)
  axis         = 'domainaxis0'
  value        = <CF Query: (wi [-67.5, 67.5])>
  identity     = 'latitude'
  1-d CASE 3:
    index      = [0 1 2 3]
    ind        = None
  create_mask  = False
  indices      = {'indices': {'domainaxis0': array([0, 1, 2, 3]), 'domainaxis1': slice(-3, 0, 1)}, 'mask': {}}
domain_indices:
{'indices': {'domainaxis0': array([0, 1, 2, 3]), 'domainaxis1': slice(-3, 0, 1)}, 'mask': {}}
axis_indices:
{'domainaxis0': array([0, 1, 2, 3]), 'domainaxis1': slice(-3, 0, 1)}
indices:
[array([0, 1, 2, 3]), slice(-3, 0, 1)]
indices function return:
(array([0, 1, 2, 3]), slice(-3, 0, 1))
Indices:
(array([0, 1, 2, 3]), slice(-3, 0, 1))
Field.__getitem__
    input indices = (array([0, 1, 2, 3]), slice(-3, 0, 1))
    shape    = (4, 3)
    indices  = [array([0, 1, 2, 3]), slice(-3, 0, 1)]
    indices2 = (array([0, 1, 2, 3]), slice(-3, 0, 1))
    findices = [array([0, 1, 2, 3]), slice(-3, 0, 1)]

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_870/3059849535.py in ?()
----> 1 field_subset = directions.subspace(**kwargs)

~/miniconda3/envs/unifhytest/lib/python3.12/site-packages/cf/subspacefield.py in ?(self, *config, **kwargs)
    355         except (ValueError, IndexError) as error:
    356             if test:
    357                 return False
    358 
--> 359             raise error
    360         else:
    361             if test:
    362                 return True

~/miniconda3/envs/unifhytest/lib/python3.12/site-packages/cf/subspacefield.py in ?(self, *config, **kwargs)
    355         except (ValueError, IndexError) as error:
    356             if test:
    357                 return False
    358 
--> 359             raise error
    360         else:
    361             if test:
    362                 return True

~/miniconda3/envs/unifhytest/lib/python3.12/site-packages/cf/field.py in ?(self, indices)
    428 
    429         new_data = data[tuple(findices)]
    430 
    431         if 0 in new_data.shape:
--> 432             raise IndexError(
    433                 f"Indices {findices!r} result in a subspaced shape of "
    434                 f"{new_data.shape}, but can't create a subspace of "
    435                 f"{self.__class__.__name__} that has a size 0 axis"

IndexError: Indices [array([0, 1, 2, 3]), slice(-3, 0, 1)] result in a subspaced shape of (4, 0), but can't create a subspace of Field that has a size 0 axis

Try with cf v3.16.1.

Ignore the fact that the other axis in indices is now a dask array instead of a np.array or list, I don't think that is relevant.

example_subset = example.subspace(**kwargs)
field:
Field: 
-------
Data            : (latitude(4), longitude(3))
Dimension coords: latitude(4) = [-67.5, ..., 67.5] degrees_north
                : longitude(3) = [-120.0, 0.0, 120.0] degrees_east
config:
()
data_axes:
('domainaxis0', 'domainaxis1')
  parsed       = {('domainaxis1',): [(('domainaxis1',), 'dimensioncoordinate1', <CF DimensionCoordinate: longitude(3) degrees_east>, <CF Query: (wi [-120, 120])>, 'longitude')], ('domainaxis0',): [(('domainaxis0',), 'dimensioncoordinate0', <CF DimensionCoordinate: latitude(4) degrees_north>, <CF Query: (wi [-67.5, 67.5])>, 'latitude')]}
  unique_axes  = {'domainaxis0', 'domainaxis1'}
  n_axes       = 2
  item_axes    = ('domainaxis1',)
  keys         = ('dimensioncoordinate1',)
  1 1-d constructs: (<CF DimensionCoordinate: longitude(3) degrees_east>,)
  axis         = 'domainaxis1'
  value        = <CF Query: (wi [-120, 120])>
  identity     = 'longitude'
  1-d CASE 2:
    index      = slice(-3, 0, 1)
    ind        = None
  create_mask  = False
  item_axes    = ('domainaxis0',)
  keys         = ('dimensioncoordinate0',)
  1 1-d constructs: (<CF DimensionCoordinate: latitude(4) degrees_north>,)
  axis         = 'domainaxis0'
  value        = <CF Query: (wi [-67.5, 67.5])>
  identity     = 'latitude'
  1-d CASE 3:
    index      = dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>
    ind        = None
  create_mask  = False
  indices      = {'indices': {'domainaxis0': dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>, 'domainaxis1': slice(-3, 0, 1)}, 'mask': {}}
domain_indices:
{'indices': {'domainaxis0': dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>, 'domainaxis1': slice(-3, 0, 1)}, 'mask': {}}
axis_indices:
{'domainaxis0': dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>, 'domainaxis1': slice(-3, 0, 1)}
indices:
[dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>, slice(-3, 0, 1)]
indices function return:
(dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>, slice(-3, 0, 1))
Indices:
(dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>, slice(-3, 0, 1))
Field.__getitem__
    input indices = (dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>, slice(-3, 0, 1))
    shape    = (4, 3)
    **##### NOTE CHANGED SLICE IN INDICES FROM HERE ON**
    indices  = [dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>, slice(0, 3, 1)]
    indices2 = (dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>, slice(-3, 0, 1))
    findices = [dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>, slice(0, 3, 1)]
Computing data shape: Performance may be degraded
    dice = (dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>,)
DimensionCoordinate.__getitem__: shape    = (4,)
DimensionCoordinate.__getitem__: indices2 = (dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>,)
DimensionCoordinate.__getitem__: indices  = [dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>]
DimensionCoordinate.__getitem__: findices = (dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>,)
Computing data shape: Performance may be degraded
DimensionCoordinate.__getitem__: findices for bounds = (dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>,)
Computing data shape: Performance may be degraded
    dice = (slice(0, 3, 1),)
DimensionCoordinate.__getitem__: shape    = (3,)
DimensionCoordinate.__getitem__: indices2 = (slice(0, 3, 1),)
DimensionCoordinate.__getitem__: indices  = [slice(0, 3, 1)]
DimensionCoordinate.__getitem__: findices = (slice(0, 3, 1),)
DimensionCoordinate.__getitem__: findices for bounds = (slice(0, 3, 1),)

example_subset
<CF Field: (latitude(4), longitude(3))>

There may be other cases I've missed, but I thought this a suitable place to stop digging and ask for input!! I can see why the cases highlighted are not considered cyclic... A cyclic variable x = [0,1,2] with a slice of (-3, 0, 1) should produce [0,1,2], which happens in v3.16.1 as expected but not in v3.16.2 as explained.

The outputs from the cf.environment calls are:

For v3.16.2: Platform: Linux-3.10.0-1160.114.2.el7.x86_64-x86_64-with-glibc2.34 HDF5 library: 1.14.3 netcdf library: 4.9.2 udunits2 library: /home/users/mattjbr/miniconda3/envs/unifhytest/lib/libudunits2.so.0 esmpy/ESMF: 8.6.1 Python: 3.12.3 dask: 2024.4.2 netCDF4: 1.6.5 psutil: 5.9.8 packaging: 24.0 numpy: 1.26.4 scipy: 1.13.0 matplotlib: not available cftime: 1.6.3 cfunits: 3.3.7 cfplot: not available cfdm: 1.11.1.0 cf: 3.16.2

For v3.16.1: Platform: Linux-3.10.0-1160.114.2.el7.x86_64-x86_64-with-glibc2.34 HDF5 library: 1.14.3 netcdf library: 4.9.2 udunits2 library: /home/users/mattjbr/miniconda3/envs/unifhytest2/lib/libudunits2.so.0 esmpy/ESMF: 8.6.1 Python: 3.12.3 dask: 2024.5.0 netCDF4: 1.6.5 psutil: 5.9.8 packaging: 24.0 numpy: 1.26.4 scipy: 1.13.0 matplotlib: not available cftime: 1.6.3 cfunits: 3.3.7 cfplot: not available cfdm: 1.11.1.0 cf: 3.16.1

davidhassell commented 3 months ago

Hi @mattjbr123 - thanks for the detailed report! I'll look into it ... David

davidhassell commented 3 months ago

Hi Matt - I've had a good look, and agree with you. Your suggested changed change seems to make sense, and all of the current unit tests pass with it in-place. Would you like to submit a PR with the fix (including a change log entry and unit test in cf/test/test_functions.functionTest.test_normalize_slice)?

Thanks, David

mattjbr123 commented 3 months ago

Excellent, thanks David, will do

mattjbr123 commented 3 months ago

Done - #782