SciTools / iris

A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
https://scitools-iris.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Time dimension in hybrid height #5369

Open trexfeathers opened 1 year ago

trexfeathers commented 1 year ago

✨ Feature Request

Support a time dimension in the hybrid height coordinate, as described in the CF conventions.

Motivation

From @matthew-mizielinski. Considering how to represent hybrid height over glaciers, where the orography moves/changes over time.
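
For reference, the CF conventions define hybrid height with a time-varying orography as z(n,k,j,i) = a(k) + b(k)·orog(n,j,i). The pure-Python sketch below evaluates that formula on toy nested lists. The function name `hybrid_height` and all the values are illustrative assumptions; this is not Iris code.

```python
def hybrid_height(delta, sigma, orography):
    """Return altitude[t][k][j][i] = delta[k] + sigma[k] * orography[t][j][i]."""
    return [
        [
            [[d + s * cell for cell in row] for row in orog_t]
            for d, s in zip(delta, sigma)  # one (delta, sigma) pair per level k
        ]
        for orog_t in orography  # one 2D orography field per time point t
    ]


# Toy inputs (assumed values): 2 levels, 2 time points, a 1 x 2 horizontal grid.
delta = [10.0, 100.0]
sigma = [1.0, 0.5]
orography = [
    [[0.0, 200.0]],  # time 0
    [[0.0, 180.0]],  # time 1: the surface has lowered at one grid point
]

altitude = hybrid_height(delta, sigma, orography)
```

Because the orography carries the time index, the derived altitude does too, which is why it cannot be shared unchanged between time points.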

Additional context

@pp-mo expects problems with FF or PP loading, as it would require a specific sequence of merge steps.

### Tasks
- [ ] https://github.com/SciTools/iris/issues/6162
- [ ] https://github.com/SciTools/iris/issues/6163

trexfeathers commented 12 months ago

@matthew-mizielinski has confirmed that data like this currently generates 1 Cube for each time point, rather than a single Cube with a time dimension.

trexfeathers commented 5 months ago

It will be problematic for UK Met Office strategy (climate - IPCC) if this misses the 3.11 (October) release.

stephenworsley commented 1 month ago

I believe it is currently possible to construct a hybrid height coordinate that varies over time. What is not possible is to merge multiple 2D cubes with varying orographies together. This would require a substantial change to merge behaviour. I suspect this may be covered by #5375 which has been a particularly stubborn issue to untangle.

trexfeathers commented 1 month ago

> I believe it is currently possible to construct a hybrid height coordinate that varies over time. What is not possible is to merge multiple 2D cubes with varying orographies together. This would require a substantial change to merge behaviour. I suspect this may be covered by #5375 which has been a particularly stubborn issue to untangle.

@stephenworsley let us know what you need. If necessary we have a whole team of developers (given the strategic importance of this).

matthew-mizielinski commented 1 month ago

Shout if a discussion on this would be useful -- I'm sure we can come up with a minimal test data set to work with.

stephenworsley commented 1 month ago

@matthew-mizielinski minimal test data would absolutely be appreciated, and yes, I think it would be good to set up a discussion when possible.

stephenworsley commented 1 month ago

One possible idea for resolving the merge issue:

Provide a keyword argument for the merge method that accepts the name of an AuxCoord, or a tuple of coord names. This tells merge which coordinates it ought to expand the dimensions of. Further information is likely to be required where merge adds multiple dimensions, perhaps a tuple of dimension names in which to expand each AuxCoord. This keyword could also be passed down from the load function.

This approach shouldn't break existing functionality and should allow sufficient control of the merging process. We may need to give some attention to AuxCoordFactory objects to make sure they behave sensibly during this process, since I'm not aware of any other functions which add a dimension to a coordinate that another coordinate is derived from, but I don't expect this to be too much of a problem.
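
To make the keyword idea concrete, here is a toy sketch using plain dicts in place of cubes. Everything in it is hypothetical: `toy_merge`, the `expand_coords` keyword and the dict layout are assumptions made for illustration, and none of it is an existing Iris API.

```python
def toy_merge(fields, new_dim="time", expand_coords=()):
    """Stack fields with a scalar `new_dim` value into a single record.

    Coords named in `expand_coords` may differ between fields and are stacked
    along the new dimension. All other coords must be identical across fields.
    """
    merged = {
        new_dim: [f[new_dim] for f in fields],
        "data": [f["data"] for f in fields],
    }
    for name, value in fields[0].items():
        if name in (new_dim, "data"):
            continue
        if name in expand_coords:
            # Hinted coord: give it the new leading dimension.
            merged[name] = [f[name] for f in fields]
        elif all(f[name] == value for f in fields):
            # Unhinted coord: only mergeable if it is the same everywhere.
            merged[name] = value
        else:
            raise ValueError(
                f"coord {name!r} differs between fields; name it in expand_coords"
            )
    return merged


# Two toy 2D fields (assumed values) whose orography differs between time points.
fields = [
    {"time": 0, "data": [[1.0, 2.0]], "sigma": [1.0], "surface_altitude": [[0.0, 200.0]]},
    {"time": 1, "data": [[3.0, 4.0]], "sigma": [1.0], "surface_altitude": [[0.0, 180.0]]},
]

merged = toy_merge(fields, expand_coords=("surface_altitude",))
```

Without the hint, the toy raises an error on the mismatched orography instead of guessing, so existing behaviour for unhinted coords is unchanged.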

stephenworsley commented 4 weeks ago

An alternate approach to explore could involve concatenating instead of merging and using the new_axis utility to expand the dimensions of the orography coordinate appropriately. This ought to be enabled now via #4896, though I'm not sure how this handles derived coordinates.

pp-mo commented 3 weeks ago

Some summary points from our offline discussion today (@pp-mo @stephenworsley @matthew-mizielinski)

Use case example

We investigated a specific use case which demonstrates the issue here.

We tried loading selected monthly files, e.g. iris.load(['sep30', 'oct30', 'jan31']) # imaginary monthly files (!)

N.B. we have sample test data to demo this

Solutions acceptable to the user

@matthew-mizielinski said, for his expected usage, it should be easy to identify what data suffers from the "missing merge" like this, and potentially add a specific load keyword as a "hint" (as suggested above), or call into a post-load adjustment utility.

Summary of findings regarding the existing code

Possible solutions we can envisage

User presentation (API)

  1. a general, automatic fix to merge operations within loading (but see complexity objections, above)
  2. or a load (and/or merge/concatenate) keyword to enable the "extra" factory building on load
  3. or a post-load utility call.

In case (2) we might need to worry about selecting the correct cubes to work with in the 'additional' operation. The general 'load+merge' behaviour can produce multiple cubes where one was expected if there is a small mismatch somewhere. In that case it could be hard to apply the 'additional' operation to the correct subsets. But we can limit the expected results, e.g. only allow it in "load_cubes", where a single cube is expected from applying each provided constraint. Likewise, a user-operated post-merge operation could be specified to work only with "suitable" data expected to produce a single result cube.

Calculation

(ignoring for now the "better general merge" approach, and looking for easy wins)

In general, we can solve merge/concat problems of this nature by

  1. either reducing all data to have a single point in the problem dimension, then merging everything
  2. or promoting single-point data to get a length-1 dimension, and concatenating everything

In this case, since we observe that concatenate can combine factories while merge cannot, it seems that (2) is probably easiest.

So it looks like a viable proof-of-concept solution could:

  1. accept a set of input cubes which (the user says) "ought" to merge into a single result, plus, probably, user-hints of which factory/coords to work on
  2. promote any cubes with scalar time to have a length-1 time dimension, including the relevant factory and all the aux-coords which are its dependencies
  3. concatenate, expecting a single cube result
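
The three steps above can be sketched with plain dicts standing in for cubes. This is a toy under stated assumptions, not Iris code: the `promote` and `concatenate` helpers, the `dependencies` hint and the dict layout are all hypothetical. In Iris itself the promotion step would presumably go through `iris.util.new_axis`, as suggested earlier in the thread.

```python
def promote(field, dim="time", dependencies=("surface_altitude",)):
    """Give a scalar-`dim` field a length-1 leading dimension."""
    if isinstance(field[dim], list):
        return field  # already dimensioned along `dim`
    promoted = dict(field)
    # Wrap the time value, the data and every time-dependent coord in a list.
    for name in (dim, "data", *dependencies):
        promoted[name] = [field[name]]
    return promoted


def concatenate(fields, dim="time", dependencies=("surface_altitude",)):
    """Join fields along `dim`, expecting a single result."""
    varying = (dim, "data", *dependencies)
    result = dict(fields[0])
    for name, value in fields[0].items():
        if name in varying:
            # Chain the per-field entries along the leading dimension.
            result[name] = [item for f in fields for item in f[name]]
        elif any(f[name] != value for f in fields):
            raise ValueError(f"coord {name!r} differs between fields")
    return result


# Assumed toy inputs: one field already spans two times, the other has scalar time.
a = {
    "time": [0, 1],
    "data": [[[1.0]], [[2.0]]],
    "sigma": [1.0],
    "surface_altitude": [[[200.0]], [[190.0]]],
}
b = {"time": 2, "data": [[3.0]], "sigma": [1.0], "surface_altitude": [[180.0]]}

combined = concatenate([promote(f) for f in (a, b)])
```

The user-supplied `dependencies` hint plays the role of the "which factory/coords to work on" input from step 1. Coords outside that list must still match exactly.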