SciTools / iris

A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
https://scitools-iris.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Various Cube merge/concatenate issues #5375

Open acchamber opened 1 year ago

acchamber commented 1 year ago

Background

Following the Dragon Taming session offline and #4446 and #3234, I thought it best to try and capture what users want from improvements to merge/concatenate and what pain points there are, so we can establish which, if any, can be fixed.

So the user story is as follows: I've loaded some data as Iris cubes and put them in a CubeList. I want to join them so that I can run some kind of analysis on the result and produce plots or datasets, not cubes to share onwards, so destroying or ignoring metadata is fine. I've already called equalise_attributes and unify_time_units. Yet Iris refuses to join my cubes together, even though I know the data makes sense together. Why?

Exploration of Problem

Here are a few examples I've found from talking to users and from Yammer / the AVD knowledgebase:

  1. Dim coord bounds not being identical (off by 0.00001 at the edge, in one example)
  2. One cube having cell methods and the other not, or mismatches in cell methods
  3. Auxiliary coordinate bounds not matching dim coord bounds
  4. Cube units not matching
  5. Cubes having different auxiliary or derived coordinates
  6. Dim coords having the same values, but different dtypes
  7. Cubes having overlapping times

I think you can put these in three categories - things that are bugs to fix, things that a "force" keyword or another util function could fix, and things that should be errors.

Bugs to fix:

  1. Dim coords having the same values, but different dtypes (#5372 already addresses this for time, but I think a broader check may be useful)

Things a "force" keyword or another util function could do:

  1. Remove all cell methods from the cubes before merge/concat
  2. Remove all auxiliary coordinates from the cubes before merge/concat
  3. Remove all derived coordinates from the cubes before merge/concat
  4. Remove scalar coordinates before concatenate (not merge)
  5. Remove bounds from dimensional coordinates and only compare points (maybe guess_bounds afterwards to re-establish them?)
  6. Guess an order for dimensional coordinates for cases where dim coords are exactly equal (maybe, might leave this one as an error)

Things that should be errors:

  1. Cubes having overlapping times
  2. Cubes having different units (or names, for either coords or the cube itself)
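For reference, the removal steps listed for the "force" option can already be done by hand with existing Cube methods. A minimal sketch of what that looks like today, assuming `cube` is an `iris.cube.Cube`; the function name `strip_for_joining` and its keyword arguments are hypothetical and not part of Iris:

```python
def strip_for_joining(cube, drop_scalar_coords=False, drop_bounds=False):
    """Remove metadata that commonly blocks merge/concatenate, in place.

    Hypothetical helper, not an Iris API. It only uses attributes and
    methods that iris.cube.Cube provides.
    """
    # Mismatched cell methods stop cubes from being joined.
    cube.cell_methods = ()

    # Derived coordinates are built by aux factories. Remove these first,
    # because they depend on the auxiliary coordinates removed below.
    for factory in list(cube.aux_factories):
        cube.remove_aux_factory(factory)

    # Scalar coordinates map to no data dimension. Merge relies on them,
    # so they are only dropped on request (i.e. before a concatenate).
    for coord in list(cube.aux_coords):
        is_scalar = len(cube.coord_dims(coord)) == 0
        if drop_scalar_coords or not is_scalar:
            cube.remove_coord(coord)

    # Optionally leave only the points on the dimension coordinates.
    if drop_bounds:
        for coord in cube.dim_coords:
            coord.bounds = None

    return cube
```

Applied to every cube in a CubeList before calling `concatenate_cube()` or `merge_cube()`, this throws away the metadata in question, which is only acceptable in the "analysis, not sharing" user story described above.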

Proposed solution

In this process, the hardest to diagnose errors were often ones relating to Dim coords not matching. Adding a function to iris similar to @rcomer's coord_diffs (http://fcm9/projects/utils/browser/hadru-python/trunk/iris_wrappers.py?marks=19-52#L19) to allow easier or automatic comparison of coords when there is an error would also improve the user experience.

So, I propose two things. A coord_comparison function as a coord method that gives more detail to the user when a concatenate or merge fails due to dim coords not matching.
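To make the first proposal concrete, here is a rough sketch of the kind of report such a comparison could give. It is not a reproduction of @rcomer's coord_diffs, and the name `coord_differences` is hypothetical. It only reads attributes that Iris coordinates expose (`standard_name`, `long_name`, `var_name`, `units`, `points`, `bounds`), so it is written without importing Iris:

```python
def _flatten(values):
    """Yield the scalar items of a NumPy array or nested sequence, in order."""
    if hasattr(values, "tolist"):  # NumPy arrays expose tolist()
        values = values.tolist()
    for item in values:
        if isinstance(item, (list, tuple)):
            yield from _flatten(item)
        else:
            yield item


def coord_differences(coord_a, coord_b):
    """Return a list of strings describing how two coordinates differ."""
    diffs = []

    # Naming and units metadata.
    for attr in ("standard_name", "long_name", "var_name", "units"):
        a, b = getattr(coord_a, attr, None), getattr(coord_b, attr, None)
        if a != b:
            diffs.append(f"{attr}: {a!r} != {b!r}")

    # Points and bounds: presence, dtype, size, then values.
    for attr in ("points", "bounds"):
        a, b = getattr(coord_a, attr, None), getattr(coord_b, attr, None)
        if (a is None) != (b is None):
            diffs.append(f"{attr}: present on only one coordinate")
            continue
        if a is None:
            continue
        # Reported separately because the values can agree while dtypes differ.
        dtype_a, dtype_b = getattr(a, "dtype", None), getattr(b, "dtype", None)
        if dtype_a != dtype_b:
            diffs.append(f"{attr} dtype: {dtype_a} != {dtype_b}")
        flat_a, flat_b = list(_flatten(a)), list(_flatten(b))
        if len(flat_a) != len(flat_b):
            diffs.append(f"{attr} size: {len(flat_a)} != {len(flat_b)}")
            continue
        changed = [(x, y) for x, y in zip(flat_a, flat_b) if x != y]
        if changed:
            try:
                worst = max(abs(x - y) for x, y in changed)
                detail = f", largest absolute difference {worst:g}"
            except TypeError:  # non-numeric values, e.g. strings
                detail = ""
            diffs.append(f"{attr}: {len(changed)} value(s) differ{detail}")

    return diffs
```

For two cubes that refuse to concatenate, calling something like `coord_differences(cube_a.coord("time"), cube_b.coord("time"))` would then surface a bound that is off by 0.00001, or a dtype mismatch, without the user having to print and compare the arrays by eye.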

And a force keyword for concatenate/merge cube that applies some or all of the fixes listed above, so that Iris automatically does the steps users are already doing manually to their cubes. It should also automatically call equalise_attributes and unify_time_units.
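Until something like that exists, the behaviour can be approximated with a small wrapper. The sketch below is only an illustration of the idea: `force_join` and its `method` argument are hypothetical, while `equalise_attributes`, `unify_time_units`, `CubeList.concatenate_cube` and `CubeList.merge_cube` are existing Iris calls (assuming Iris 3 or later, where `equalise_attributes` lives in `iris.util`). Clearing cell methods stands in for whichever subset of the listed fixes a real force option would apply.

```python
def force_join(cubes, method="concatenate"):
    """Join cubes after applying fixes that users currently apply by hand.

    Hypothetical wrapper, not an Iris API. The cubes passed in are
    modified in place.
    """
    if method not in ("concatenate", "merge"):
        raise ValueError(f"unknown method: {method!r}")

    # Imported inside the function so the sketch loads without Iris installed.
    from iris.cube import CubeList
    from iris.util import equalise_attributes, unify_time_units

    cube_list = CubeList(cubes)
    equalise_attributes(cube_list)  # delete attributes that differ between cubes
    unify_time_units(cube_list)     # convert time coordinates to a common epoch
    for cube in cube_list:
        cube.cell_methods = ()      # one of the proposed "force" fixes

    if method == "concatenate":
        return cube_list.concatenate_cube()
    return cube_list.merge_cube()
```

Keeping this outside `concatenate_cube`/`merge_cube` themselves means the strict behaviour stays the default and the metadata loss is an explicit choice by the caller.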

Thoughts?

rcomer commented 1 year ago

For anyone outside the Met Office who is interested, I also have the coord_diffs function as a gist. I'm afraid it has no tests.

mo-matthewfry commented 10 months ago

Great issue, thanks for articulating those problems.

To add another one to the list, when trying to concatenate ensemble runs the realization coordinate often overlaps, e.g., different MOGREPS-G runs all have the control member numbered 0.

Appreciate sometimes you’d want this information to be preserved but most of the time I don’t really care which member is which!