Closed corinnebosley closed 4 years ago
Thanks @corinnebosley. I'm very glad that this issue is getting attention as subtraction between cubes has been my biggest source of frustration as an Iris user. I absolutely agree that the behaviour should be consistent across the binary operations.
I think examples where the checks fail fall into two categories:
cube
has no associated coordinate on other
. Examples of this are quite straight-forward. E.g. using a scalar cube to convert a variable (as linked in the OP):if cube.name() == 'geopotential':
# Convert ERA-I geopotential to geopotential_height.
gravity = iris.cube.Cube(9.80665, units='m s-2')
cube = cube / gravity
Another example might be to compare forecast data from an ensemble (realization x time x latitude x longitude
) with observed data (time x latitude x longitude
).
Here is an example which fails because an auxcoord is missing, even though the dimcoords are identical:
fname = iris.sample_data_path('uk_hires.pp')
cube = iris.load_cube(fname, 'air_potential_temperature')
other = cube.copy()
iris.coord_categorisation.add_hour(cube, 'time')
cube - other
This may happen simply because the coordinates have a different var_name
, which will happen quite often if you're comparing data from different sources. So I definitely think we should relax the requirements as far as the coordinate metadata is concerned.
Bounds also need to match up, which again may not happen if your data come from different sources or have been processed differently, or if you e.g. want to know the difference between some instantaneous daily values and their respective daily means.
fname = iris.sample_data_path('air_temp.pp')
cube = iris.load_cube(fname)
other = cube.copy()
cube.coord('latitude').guess_bounds()
cube - other
If I've understood from #2498, a check on the points was required by a user. I have a few thoughts on this so I'll put them in a separate post...
Thanks @rcomer! Got some really good use cases there, really appreciate your time and effort. I will ping you when we get anywhere with this, which should be either this week or next.
Here are some examples where coordinate points might legitimately differ:
I think it is reasonable for a user to want to do something like cube[n:] - cube[:-n]
.
Here is a recipe for creating 2 coords which represent identical times, but have different points and units:
def time_coord(datetimes, unitstr='hours since 1970-01-01 00:00:00',
calendar='gregorian', standard_name='time'):
"""
Turn a list of datetimes into a DimCoord.
"""
unit = cf_units.Unit(unitstr, calendar)
date_nums = unit.date2num(datetimes)
return iris.coords.DimCoord(date_nums, units=unit, standard_name=standard_name)
datetimes = [datetime.datetime(2017, 6, day) for day in range(1,21)] time1 = time_coord(datetimes) time2 = time_coord(datetimes, unitstr='days since 2017-01-01')
Obviously the above is artificial, but this type of mismatch could easily occur with data from different sources.
3. An example from my recent work. I have daily means of 6-hourly data from the UM, which come ready calculated with the time coordinate point at 12Z for each day. For comparison, I download 6-hourly ERA-Interim data and use `cube.aggregated_by` to create daily means. Having averaged over hours of 0, 6, 12 and 18, the ERA-Interim daily means have the time coordinate point at 09Z for each day.
OTOH, I did spend a lot of time back in October trying to figure out why my code was producing nonsense in its output files. I eventually realised it was the exact issue illustrated by the new `test_reversed_points()` in #2498. So I think I am in favour of a check on the coord points, but I would like a keyword that lets me bypass it if I'm sure my points are different for a good reason.
Note I like the idea of having a keyword which allows a user to bypass checks on points-matching.
Reviewing status for v2.0 inclusion
Not really Dask, but could be a behaviour change so maybe for v2.0 it that will avoid another iris.FUTURE option
the behaviour was not consistent between the two groups of math ops (add/subtract had more strict coordinate matching checks than multiply/divide).
Just to stick my oar in ⛵️ 😄 ...
Fundamentally the behaviour of multiple & divide should be different from add & subtract. Just consider the following:
speed * time = distance
speed + time = nothing meaningful
And for units:
meters * meters = meters ^ 2
meters + meters = meters
That means we need to do very different things to metadata in a cube depending on the operation.
We are in a bit of an unfortunate pickle with Iris on this regard. I think we are going to have to roll with Iris v2.0 in the same state as v1.13. I think we may want to fix the behaviour for v2.1 as we have broken an important use case.
I completely accept the code for this is far more complex that it needs to be (I take the blame for that), and we should put some effort in to rationalising it and sharing as many components between the operators as possible.
Just to be awkward, here is a use case for adding cubes with different units! 😉
More seriously, yes it's true that checks on the units make sense for add/subtract but not for multipy/divide. But I can't think of a reason why the coord checks should be different.
I think we may want to fix the behaviour for v2.1 as we have broken an important use case.
I'm pretty sure I'm not the only one who would rejoice if/when this is done. Workarounds to force the checks to pass are not difficult to write, just fiddly to figure out which piece of metadata is tripping the check and needs deleting or copying to the other cube.
The heat index example is interesting, but of course it isn't really adding percentages to temperatures.
The definition of heat index it links to at http://en.wikipedia.org/wiki/Heat_index#Formula makes it clear that there is a constant in front of each term:
HI = c1 + c2.T + c3.R + c4.T.R + c5.T^2 + c_6.R^2 + c_7.T^2.R + c_8.T.R^2 + c_9.T^2.R^2
He has a shortened version : HI = temp + rh + temp*rh + temp**2*rh + rh**2*temp
So I think he has a whole lot of constant factors which happen to be all numerically equal to 1.0, but they must have units !
In Iris the 'correct' way would be to use scalar cubes with appropriate conversion units. Very crude example:
>>> temp = Cube([295.0], 'air_temperature', units='K')
>>> rh = Cube([45.0], 'relative_humidity', units='0.01')
>>> t_per_percent = Cube([1.0], long_name='c1', units='K / 0.01')
>>> temp + rh
. . .
iris.exceptions.NotYetImplementedError: Cannot use 'add' with differing units (K & 0.01)
>>> temp + t_per_percent * rh
<iris 'Cube' of unknown / (K) (-- : 1)>
>>> (temp + t_per_percent * rh).data
array([ 340.])
>>>
Sorry, is all this really obvious to everyone ?
@pelson
Fundamentally the behaviour of multiple & divide should be different from add & subtract.
This is true with regards to the metadata of the phenomenon (i.e. its units), and I believe we still have the behaviour you describe, but it doesn't apply to coordinate metadata, where I think the same logic should apply in all cases.
I just realised that #1987 is also related.
Apologies for the delay; here are some additional use cases for you to consider :)
Multiplying by the land sea fraction (cube1
may have any number of additional coordinates):
% module load scitools
% python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import iris
>>> print(iris.__version__)
2.2.0
>>> cube1 = iris.load_cube('/project/cdds/testdata/u-aw310/ap4/aw310a.p42345ju?.pp', iris.AttributeConstraint(STASH='m01s03i495'))
>>> print(cube1)
m01s03i495 / (unknown) (time: 2; latitude: 144; longitude: 192)
Dimension coordinates:
time x - -
latitude - x -
longitude - - x
Auxiliary coordinates:
forecast_period x - -
Scalar coordinates:
forecast_reference_time: 1936-01-01 00:00:00
Attributes:
STASH: m01s03i495
source: Data from Met Office Unified Model
um_version: 10.9
Cell methods:
mean: time (1 hour)
>>> cube2 = iris.load_cube('/project/cdds/etc/ancil/UKESM1-0-LL/qrparm.landfrac.pp')
>>> print(cube2)
land_area_fraction / (1) (latitude: 144; longitude: 192)
Dimension coordinates:
latitude x -
longitude - x
Attributes:
STASH: m01s00i505
source: Data from Met Office Unified Model
um_version: 9.1
>>> cube = cube1 * cube2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/scitools/environments/default/2019_02_27/lib/python3.6/site-packages/iris/analysis/maths.py", line 411, in multiply
in bad_coord_grps}))
ValueError: This operation cannot be performed as there are differing coordinates (time, forecast_period) remaining which cannot be ignored.
Adding cubes, both on a single level, but where one was extracted from a cube on pseudo levels, so has the orography added:
>>> cube1 = iris.load_cube('/project/cdds/testdata/u-aw310/ap4/aw310a.p42345ju?.pp', iris.AttributeConstraint(STASH='m01s02i517'))
>>> cube1.units = 'W m-2'
>>> print(cube1)
m01s02i517 / (W m-2) (time: 2; model_level_number: 86; latitude: 144; longitude: 192)
Dimension coordinates:
time x - - -
model_level_number - x - -
latitude - - x -
longitude - - - x
Auxiliary coordinates:
forecast_period x - - -
level_height - x - -
sigma - x - -
surface_altitude - - x x
Derived coordinates:
altitude - x x x
Scalar coordinates:
forecast_reference_time: 1936-01-01 00:00:00
Attributes:
STASH: m01s02i517
source: Data from Met Office Unified Model
um_version: 10.9
Cell methods:
mean: time (1 hour)
>>> cube1 = cube1.extract(iris.Constraint(model_level_number=86))
>>> print(cube1)
m01s02i517 / (W m-2) (time: 2; latitude: 144; longitude: 192)
Dimension coordinates:
time x - -
latitude - x -
longitude - - x
Auxiliary coordinates:
forecast_period x - -
surface_altitude - x x
Derived coordinates:
altitude - x x
Scalar coordinates:
forecast_reference_time: 1936-01-01 00:00:00
level_height: 87949.99 m, bound=(85000.0, 87949.99) m
model_level_number: 86
sigma: 0.0, bound=(0.0, 0.0)
Attributes:
STASH: m01s02i517
source: Data from Met Office Unified Model
um_version: 10.9
Cell methods:
mean: time (1 hour)
>>> cube2 = iris.load_cube('/project/cdds/testdata/u-aw310/ap4/aw310a.p42345ju?.pp', iris.AttributeConstraint(STASH='m01s03i332'))
>>> print(cube2)
toa_outgoing_longwave_flux / (W m-2) (time: 2; latitude: 144; longitude: 192)
Dimension coordinates:
time x - -
latitude - x -
longitude - - x
Auxiliary coordinates:
forecast_period x - -
Scalar coordinates:
forecast_reference_time: 1936-01-01 00:00:00
Attributes:
STASH: m01s03i332
source: Data from Met Office Unified Model
um_version: 10.9
Cell methods:
mean: time (1 hour)
>>> cube = cube1 + cube2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/scitools/environments/default/2019_02_27/lib/python3.6/site-packages/iris/cube.py", line 3045, in __add__
return iris.analysis.maths.add(self, other)
File "/opt/scitools/environments/default/2019_02_27/lib/python3.6/site-packages/iris/analysis/maths.py", line 267, in add
in_place=in_place)
File "/opt/scitools/environments/default/2019_02_27/lib/python3.6/site-packages/iris/analysis/maths.py", line 349, in _add_subtract_common
in bad_coord_grps}))
ValueError: This operation cannot be performed as there are differing coordinates (altitude, surface_altitude) remaining which cannot be ignored.
Dividing by cloud weights (similar to the "Multiplying by the land sea fraction" use case):
>>> cube1 = iris.load_cube('/project/cdds/testdata/u-aw310/ap5/aw310a.p51960mar.pp', iris.AttributeConstraint(STASH='m01s02i337'))
>>> print(cube1)
m01s02i337 / (unknown) (pseudo_level: 7; pressure: 7; latitude: 144; longitude: 192)
Dimension coordinates:
pseudo_level x - - -
pressure - x - -
latitude - - x -
longitude - - - x
Scalar coordinates:
forecast_period: 209160.0 hours, bound=(208800.0, 209520.0) hours
forecast_reference_time: 1936-01-01 00:00:00
time: 1960-03-16 00:00:00, bound=(1960-03-01 00:00:00, 1960-04-01 00:00:00)
Attributes:
STASH: m01s02i337
source: Data from Met Office Unified Model
um_version: 10.9
Cell methods:
mean: time (1 hour)
>>> cube2 = iris.load_cube('/project/cdds/testdata/u-aw310/ap5/aw310a.p51960mar.pp', iris.AttributeConstraint(STASH='m01s02i330'))
>>> print(cube2)
m01s02i330 / (unknown) (latitude: 144; longitude: 192)
Dimension coordinates:
latitude x -
longitude - x
Scalar coordinates:
forecast_period: 209160.0 hours, bound=(208800.0, 209520.0) hours
forecast_reference_time: 1936-01-01 00:00:00
time: 1960-03-16 00:00:00, bound=(1960-03-01 00:00:00, 1960-04-01 00:00:00)
Attributes:
STASH: m01s02i330
source: Data from Met Office Unified Model
um_version: 10.9
Cell methods:
mean: time (1 hour)
>>> cube = cube1 / cube2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/scitools/environments/default/2019_02_27/lib/python3.6/site-packages/iris/analysis/maths.py", line 493, in divide
in bad_coord_grps}))
ValueError: This operation cannot be performed as there are differing coordinates (pseudo_level, pressure) remaining which cannot be ignored.
Subtracting with the assumption that the first cube has the same value on all pseudo levels:
>>> cube1 = iris.load_cube('/project/cdds/testdata/u-aw310/ap5/aw310a.p51960mar.pp', iris.AttributeConstraint(STASH='m01s01i235'))
>>> print(cube1)
surface_downwelling_shortwave_flux_in_air / (W m-2) (latitude: 144; longitude: 192)
Dimension coordinates:
latitude x -
longitude - x
Scalar coordinates:
forecast_period: 209160.0 hours, bound=(208800.0, 209520.0) hours
forecast_reference_time: 1936-01-01 00:00:00
time: 1960-03-16 00:00:00, bound=(1960-03-01 00:00:00, 1960-04-01 00:00:00)
Attributes:
STASH: m01s01i235
source: Data from Met Office Unified Model
um_version: 10.9
Cell methods:
mean: time (1 hour)
>>> cube2 = iris.load_cube('/project/cdds/testdata/u-aw310/ap5/aw310a.p51960mar.pp', iris.AttributeConstraint(STASH='m01s03i382'))
>>> cube2.units = 'W m-2'
>>> print(cube2)
m01s03i382 / (W m-2) (pseudo_level: 27; latitude: 144; longitude: 192)
Dimension coordinates:
pseudo_level x - -
latitude - x -
longitude - - x
Scalar coordinates:
forecast_period: 209160.0 hours, bound=(208800.0, 209520.0) hours
forecast_reference_time: 1936-01-01 00:00:00
time: 1960-03-16 00:00:00, bound=(1960-03-01 00:00:00, 1960-04-01 00:00:00)
Attributes:
STASH: m01s03i382
source: Data from Met Office Unified Model
um_version: 10.9
Cell methods:
mean: time (1 hour)
>>> cube = cube1 - cube2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/scitools/environments/default/2019_02_27/lib/python3.6/site-packages/iris/cube.py", line 3053, in __sub__
return iris.analysis.maths.subtract(self, other)
File "/opt/scitools/environments/default/2019_02_27/lib/python3.6/site-packages/iris/analysis/maths.py", line 310, in subtract
dim=dim, in_place=in_place)
File "/opt/scitools/environments/default/2019_02_27/lib/python3.6/site-packages/iris/analysis/maths.py", line 349, in _add_subtract_common
in bad_coord_grps}))
ValueError: This operation cannot be performed as there are differing coordinates (pseudo_level) remaining which cannot be ignored.
I hope that helps! :)
Awesome, thanks @ehogan, just what I was looking for 😺
@rcomer do you have any further use case examples that you want to contribute that haven't already been captured here?
Just had a quick glance through some of my code that's affected by this issue. I guess one thing that's mentioned but possibly not well captured here is mismatches in metadata. I.e. the cubes' coordinates are basically the same, but have different long_name
, var_name
or attributes
.
I know we did touch on this in our meetings, and the problem was: if you allow mismatched metadata, what should the metadata on the new cube's coordinate look like? I'm comfortable with just using whatever was on cube1
(for simplicity) but I know others were concerned that these operations should be entirely commutative.
Another thing I don't think this issue covers, but was raised by @ehogan at #2698 is what happens to scalar coordinates that don't match? Currently they just get chucked away. Prior to #2498 we kept the coord from cube1
for multiplication and division.
I guess scalar coordinates don't necessarily need to be treated differently from other auxcoords. But as they currently are, perhaps worth bearing in mind.
Currently [unmatched scalar coords] just get chucked away.
This was helpful for my debugging today, as the presence of a particular scalar coord told me that the subtraction operation that should have happened in fact had not. 🤔
There has been some disagreement about the level of checks performed by add/subtract and multiply/divide. Up to Iris 1.12, the behaviour was not consistent between the two groups of math ops (add/subtract had more strict coordinate matching checks than multiply/divide).
This PR changes the behaviour of multiply/divide to be consistent with add/subtract: https://github.com/SciTools/iris/pull/2498
This was merged and released in Iris 1.13, but @rcomer has raised concerns that the checks may be too robust, and may need some deeper thought before we decide on which consistent behaviour math ops should follow.
See here for use case
Acceptance Critera