NCAS-CMS / cf-python

A CF-compliant Earth Science data analysis library
http://ncas-cms.github.io/cf-python
MIT License
119 stars 19 forks source link

Subspace `ValueError` with cyclic axis not spanned by data #756

Closed sadielbartholomew closed 4 months ago

sadielbartholomew commented 5 months ago

A ValueError: tuple.index(x): x not in tuple is raised when performing a subspace on a Field which should logically work, in the case that there is at least one domain axis construct not spanned by the data which is cyclic.

An example failure is shown below, which was raised in our VISION end-to-end script when it previously worked fine.

This seems to result from an issue introduced in #745, since a git blame shows that line has been changed there and the same subspace worked witout issue before that PR, corresponding to the time I noticed the VISION script throw the error. In which case, apologies I missed this during code review for that PR.

Fix to follow.

Example failure

For a subspace attempt on the following field:

Field: id%UM_m01s51i010_vn1105 (ncvar%UM_m01s51i010_vn1105)
-----------------------------------------------------------
Data            : id%UM_m01s51i010_vn1105(time(5), ncdim%obs(11160)) 1
Cell methods    : time(5): point
Dimension coords: time(5) = [2017-07-03 11:00:00, ..., 2017-07-03 15:00:00] standard
Auxiliary coords: time(ncdim%obs(11160)) = [2017-07-03 11:15:07, ..., 2017-07-03 14:21:06] standard
                : altitude(ncdim%obs(11160)) = [2577.927001953125, ..., 151.16905212402344] m
                : air_pressure(ncdim%obs(11160)) = [751.6758422851562, ..., 1006.53076171875] hPa
                : latitude(ncdim%obs(11160)) = [52.56147766113281, ..., 52.0729866027832] degree_north
                : longitude(ncdim%obs(11160)) = [0.3171832859516144, ..., -0.6249311566352844] degree_east

of:

s0 = m.subspace(
        **{
            f.auxiliary_coordinate("T", key=True): cf.wi(cf.dt(t1), cf.dt(t2)),
            f.dimension_coordinate("T", key=True): [index],
        }
    )

(specific t1, t2 and index values not relevant) results in:

Traceback (most recent call last):
  File "/home/slb93/vision-twine/local-work/cf-vision-flight-e2e.py", line 314, in <module>
    s0 = m.subspace(
         ^^^^^^^^^^^
  File "/home/slb93/git-repos/cf-python/cf/subspacefield.py", line 289, in __call__
    raise error
  File "/home/slb93/git-repos/cf-python/cf/subspacefield.py", line 284, in __call__
    out = field[indices]
          ~~~~~^^^^^^^^^
  File "/home/slb93/git-repos/cf-python/cf/field.py", line 446, in __getitem__
    data_axes.index(axis) for axis in new.cyclic() ###if axis in data_axes
    ^^^^^^^^^^^^^^^^^^^^^
ValueError: tuple.index(x): x not in tuple
sadielbartholomew commented 5 months ago

Actually, looking over my example and the logic (and I should stop now, as it's past the end of the working week...) something weird is going on, probably elsewhere previously in the code, since (quoting from the docs since I went there for confirmation) "The data array of the field construct spans all the domain axis constructs with the possible exception of size one domain axis constructs" but I don't think a size-one axis can (/should) be cyclic. So we shouldn't need the catch that seems to make the fix, as per my draft PR #757, since contextually it shouldn't be needed.

To be investigated on Monday - I'll work out the real culprit!

davidhassell commented 5 months ago

a size-one axis can (/should) be cyclic.

I think that a size 1 axis can be cyclic, such as one with dimension coordinate bounds [[0, 360]] degrees_east.

sadielbartholomew commented 5 months ago

Oh I see - thanks for the example, that makes sense as a cyclic size-1 axis. In which case, the one-line fix in #757 is necessary? Was working on VISION work but I will go back and update that PR and open it once I sort a little testing to cover the case which errored.

sadielbartholomew commented 5 months ago

After more investigation and failure to find a minimal working example, I have done a write up in #758 to outline how I believe the underlying error is introduced because a domain axis is not removed from the cyclic() set when it should be, but I also think from the evidence above, namely conformation that:

a size 1 axis can be cyclic

that adding in a tweak to the logic as I have suggested in the PR #757 will make the present logic more robust, so is a good idea, even if it isn't the fix for the underlying issue at hand. What do you think, @davidhassell? Happy to close #757 though if you think it unsuitable given my investigations from #758.

davidhassell commented 5 months ago

Hi Sadie,

I think your investigation in #758 is on point, thanks. Might it not be better, however, to fix the bug that left the rogue domain axis there in first place, rather than to cover it up with #757?

I think that you've shown convincingly that the source of the problem is del_construct not properly tidying up after itself (here, I reckon: https://github.com/NCAS-CMS/cf-python/blob/main/cf/regrid/regrid.py#L2995), so perhaps the answer could be to overload the inherited cfdm.Field.del_construct (in cf/mixin/fielddomain.py) to check the cyclic axes when it deletes a domain axis construct?

Thanks, David

sadielbartholomew commented 5 months ago

Thanks for your thoughts, David. OK, so I close the currently-attached PR #757 and we make a fix related more directly to #758. That sounds good to me, assuming as I believe you are implying that, unless there is a bug such as #758 which means something wrong gets through, there is never a case whereby, for org_cyclic = [data_axes.index(axis) for axis in new.cyclic()], there can be an axis in cyclic() which isn't in .get_data_axes()?

sadielbartholomew commented 4 months ago

Closed by #758 (my bad not using the keywords properly in the PR comment opening so it automatically shut).