SciTools / iris

A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
https://scitools-iris.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Additional cubeList.merge capability to extend AuxCoords and AncillaryVariables #4335

Open wjbenfold opened 3 years ago

wjbenfold commented 3 years ago

✨ Feature Request

When merging cubes with auxiliary coordinates or ancillary variables (which I'm just going to shorten to AuxCoords for the rest of this description) whose values vary across the cubes, it would make sense to permit the values in these coordinates to differ, and to create a higher-dimensional coordinate in the resultant cube.

For example, consider merging a set of cubes that each contain windspeeds at different heights at a specific lat/lon (lat/lon stored as scalar coordinates). An ancillary variable could be present on each cube along the height axis, storing the time at which the windspeed was measured, which would be different at each location. Currently, the merge to produce a 3D cube (with dimensions height, lat and lon) will fail because the ancillary variable has different values in each cube. My proposed change would mean that merge could be told to create a 3D ancillary variable on the resultant cube that stores the time the windspeed was measured for every point.

Motivation

This was motivated by a support request in which a similar process to the one above was being attempted. Current workaround is to make the cube without the variable, create it separately and then add it to the merge result.

wjbenfold commented 3 years ago

In response to a query from @pp-mo who expected this to already be a feature of merge, I've attached an example where I believe it would be happening if it could, but it isn't. File is renamed from .py to .txt to allow attachment.

merge_ancillary_variable_min_fail.txt

rcomer commented 3 years ago

I don't know anything about ancillary variables, but I think these issues are related: #3603 #3084

wjbenfold commented 2 years ago

Also #3600

edmundhenley-mo commented 2 years ago

Big +1 from me (will vote properly!) - have just run into this. In case the use-case details are useful: I'm trying to merge a set of monthly gridded observations, where I want to keep track of the number of obs ("counts") which contributed to a given mean obs value on the grid via an ancillary variable - saber_counts below. These counts differ by month, so for this use-case I'd want the behaviour @wjbenfold mentions: creating a higher-dimensional ancillary variable.

Will use @wjbenfold's workaround for now - thanks for detailing that (and more generally for the helpful diagnostic messages which let me work out what the cause was)!

(Pdb) saber_year = iris.cube.CubeList(saber_gridded_year[field])
(Pdb) saber_year
[<iris 'Cube' of Kinetic Temperature / (K) (latitude: 19; longitude: 24; height: 70)>,
# ...snip Feb to Nov cubes
<iris 'Cube' of Kinetic Temperature / (K) (latitude: 19; longitude: 24; height: 70)>]
(Pdb) saber_year.merge_cube()
*** iris.exceptions.MergeError: failed to merge into a single cube.
  cube.ancillary_variables differ
(Pdb) print(saber_year[0])
Kinetic Temperature / (K)           (latitude: 19; longitude: 24; height: 70)
    Dimension coordinates:
        latitude                             x              -           -
        longitude                            -              x           -
        height                               -              -           x
    Ancillary variables:
        saber_counts                         x              x           x
    Scalar coordinates:
        time                        2002-01-15 00:00:00, bound=(2001-12-31 00:00:00, 2002-01-30 00:00:00)
    Attributes:
        Calibration_Version         '02.00'
        <snip other attributes>
        Title                       'SABER Custom Level2A Product!'

edmundhenley-mo commented 2 years ago

Think unrelated to #4446 (as at least part of this is about higher-dimensioning differing aux/anc coords, rather than leniently ignoring differences). But cheekily making the cross-link as that's the frontrunner voted issue, and any merge overhaul resulting from that might consider encompassing this too!

trexfeathers commented 2 years ago

> Realise wouldn't always be appropriate behaviour - e.g. if ancvar is time-stationary

@edmundhenley-mo could you explain this part further?

wjbenfold commented 2 years ago

> @edmundhenley-mo could you explain this part further?

My understanding of this (or at least what it prompted me to realise) is that such a variable could be a valid reason why these cubes shouldn't merge?

trexfeathers commented 2 years ago

OK, I mistook "Realise" for loading into memory. We have too many loaded terms!

github-actions[bot] commented 1 year ago

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days' time.

HGWright commented 1 year ago

@SciTools/peloton Still relevant, probably waiting on a merge/concat sprint.