acchamber opened this issue 1 year ago
For anyone outside the Met Office who is interested, I also have the coord_diffs function as a gist. I'm afraid it has no tests.
Great issue, thanks for articulating those problems.
To add another one to the list: when trying to concatenate ensemble runs, the realization coordinate often overlaps, e.g. different MOGREPS-G runs all have the control member numbered 0.
I appreciate that sometimes you’d want this information preserved, but most of the time I don’t really care which member is which!
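Where member identity really does not matter, one workaround is to renumber the realization values so they are unique across runs before joining. The sketch below assumes each run's realization points are available as a plain list of integers; `renumber_realizations` is a hypothetical helper, not part of Iris. With real cubes, the shifted values would be written back to each cube's realization coordinate before concatenating.

```python
from typing import List


def renumber_realizations(runs: List[List[int]]) -> List[List[int]]:
    """Shift each run's member numbers so that no number is reused across runs.

    Hypothetical helper: which member is which is discarded; only the
    relative order within each run and uniqueness across runs are kept.
    """
    renumbered = []
    offset = 0
    for members in runs:
        if not members:
            renumbered.append([])
            continue
        # Start this run at the current offset, preserving its internal spacing.
        base = min(members)
        shifted = [m - base + offset for m in members]
        renumbered.append(shifted)
        # The next run starts just after the largest number used so far.
        offset = max(shifted) + 1
    return renumbered


# Two runs that both number their control member 0.
print(renumber_realizations([[0, 1, 2], [0, 1, 2, 3]]))
# → [[0, 1, 2], [3, 4, 5, 6]]
```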
Background
Following the offline Dragon Taming session and issues #4446 and #3234, I thought it best to try to capture what users want from improvements to merge/concatenate, and what their pain points are, so we can establish which, if any, can be fixed.
So the user story is as follows: I've loaded some data as Iris cubes and put them in my CubeList. I want to join them so I can perform some kind of analysis on the result, producing plots or datasets rather than cubes to share further, so discarding or ignoring metadata is fine. I've run equalise_attributes and unify_time_units. Yet Iris refuses to join my cubes, even though I know the data belongs together. Why?
Exploration of Problem
Here are a few examples I've found from talking to users and from Yammer and the AVD knowledge base.
I think they fall into three categories: bugs to fix, things that a "force" keyword or another utility function could fix, and things that should be errors.
Bugs to fix: dim coords that have the same values but different dtypes (see #5372, which already addresses this for time, but I think a broader check may be useful).
Things a "force" keyword or utility function could fix:
Remove all cell methods from the cubes before merge/concatenate.
Remove all auxiliary coordinates from the cubes before merge/concatenate.
Remove all derived coordinates from the cubes before merge/concatenate.
Remove scalar coordinates before concatenate (but not merge).
Remove bounds from dimension coordinates and compare only the points (maybe calling guess_bounds afterwards to re-establish them?).
Guess an order for dimension coordinates in cases where the dim coords are exactly equal (maybe; this one might be better left as an error).
Things that should be errors: cubes with overlapping times; cubes with different units (or names, for either the coords or the cube itself).
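Even where these stay errors, a more specific message would help, for example saying which cubes overlap in time. A minimal sketch, assuming each cube's time extent has already been reduced to a `(start, end)` pair in common units (for instance after unify_time_units); `find_time_overlaps` is a hypothetical name, not an Iris function. Sorting by start time lets the inner loop stop as soon as a later cube starts after the current one ends.

```python
from typing import List, Sequence, Tuple


def find_time_overlaps(ranges: Sequence[Tuple[float, float]]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of cubes whose inclusive [start, end] ranges overlap.

    Assumes all times are already expressed in the same units.
    """
    # Visit cubes in order of start time.
    order = sorted(range(len(ranges)), key=lambda i: ranges[i][0])
    overlaps = []
    for pos, i in enumerate(order):
        _, end_i = ranges[i]
        for j in order[pos + 1:]:
            start_j, _ = ranges[j]
            if start_j > end_i:
                # Every remaining cube starts even later, so none overlap cube i.
                break
            overlaps.append((min(i, j), max(i, j)))
    return overlaps


print(find_time_overlaps([(0, 10), (11, 20), (15, 30)]))
# → [(1, 2)]
```

A shared endpoint counts as an overlap here, since a duplicated time point would also stop a concatenate.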
Proposed solution
While gathering these, the hardest errors to diagnose were often those caused by dim coords not matching. Adding a function to Iris similar to @rcomer's coord_diffs (http://fcm9/projects/utils/browser/hadru-python/trunk/iris_wrappers.py?marks=19-52#L19), to allow easier or automatic comparison of coords when an error occurs, would also improve the user experience.
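A comparison of this kind might look roughly like the sketch below. It is neither @rcomer's coord_diffs nor an Iris API: `CoordSummary` and `describe_coord_differences` are hypothetical, and a coordinate is reduced to its name, units, dtype (as a string), points and optional bounds. The usage shows the dtype-only mismatch, where the values agree but the coords still would not be treated as equal.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CoordSummary:
    """Hypothetical, simplified view of a coordinate (not the Iris API)."""
    name: str
    units: str
    dtype: str
    points: Tuple[float, ...]
    bounds: Optional[Tuple[Tuple[float, float], ...]] = None


def describe_coord_differences(a: CoordSummary, b: CoordSummary) -> List[str]:
    """Return human-readable reasons why two coords would not compare equal."""
    diffs = []
    # Metadata that must match exactly in this sketch.
    for attr in ("name", "units", "dtype"):
        va, vb = getattr(a, attr), getattr(b, attr)
        if va != vb:
            diffs.append(f"{attr} differs: {va!r} vs {vb!r}")
    # Points: report a length mismatch, otherwise the indices that disagree.
    if len(a.points) != len(b.points):
        diffs.append(f"length differs: {len(a.points)} vs {len(b.points)}")
    else:
        bad = [i for i, (pa, pb) in enumerate(zip(a.points, b.points)) if pa != pb]
        if bad:
            diffs.append(f"points differ at indices {bad}")
    # Bounds: present on only one side, or present on both but different.
    if (a.bounds is None) != (b.bounds is None):
        diffs.append("only one coord has bounds")
    elif a.bounds is not None and a.bounds != b.bounds:
        diffs.append("bounds differ")
    return diffs


lat32 = CoordSummary("latitude", "degrees", "float32", (0.0, 1.0, 2.0))
lat64 = CoordSummary("latitude", "degrees", "float64", (0.0, 1.0, 2.0))
print(describe_coord_differences(lat32, lat64))
# → ["dtype differs: 'float32' vs 'float64'"]
```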
So, I propose two things. First, a coord_comparison function (as a coord method) that gives the user more detail when a concatenate or merge fails because dim coords do not match.
Second, a force keyword for concatenate_cube/merge_cube that applies some or all of the fixes listed above, so that Iris automatically performs the steps users are already doing manually to their cubes. It should also call equalise_attributes and unify_time_units automatically.
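The metadata-stripping part of such a force option could be sketched as below. This uses hypothetical `SimpleCoord`/`SimpleCube` stand-ins rather than real Iris objects, and leaves out equalise_attributes, unify_time_units and the guessed ordering. On real cubes the corresponding steps would be along the lines of setting `cube.cell_methods = ()`, calling `cube.remove_coord(...)` for unwanted aux and scalar coords, and setting `coord.bounds = None` on dim coords.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class SimpleCoord:
    """Hypothetical stand-in for an Iris coordinate (not the Iris API)."""
    name: str
    points: Tuple[float, ...]
    bounds: Optional[Tuple[Tuple[float, float], ...]] = None


@dataclass(frozen=True)
class SimpleCube:
    """Hypothetical stand-in for an Iris cube; the data payload is omitted."""
    dim_coords: Tuple[SimpleCoord, ...]
    aux_coords: Tuple[SimpleCoord, ...] = ()
    derived_coords: Tuple[SimpleCoord, ...] = ()
    scalar_coords: Tuple[SimpleCoord, ...] = ()
    cell_methods: Tuple[str, ...] = ()


def force_prepare(cube: SimpleCube, for_concatenate: bool = True) -> SimpleCube:
    """Return a copy of `cube` with the metadata that commonly blocks joins removed."""
    return replace(
        cube,
        # Cell methods, auxiliary and derived coords are dropped outright.
        cell_methods=(),
        aux_coords=(),
        derived_coords=(),
        # Scalar coords are dropped for concatenate only: merge builds its new
        # dimension from scalar coords that differ between cubes.
        scalar_coords=() if for_concatenate else cube.scalar_coords,
        # Dim coords keep their points but lose their bounds, so only points
        # take part in the comparison.
        dim_coords=tuple(replace(c, bounds=None) for c in cube.dim_coords),
    )


time = SimpleCoord("time", (0.0, 1.0), bounds=((-0.5, 0.5), (0.5, 1.5)))
cube = SimpleCube(
    dim_coords=(time,),
    aux_coords=(SimpleCoord("forecast_period", (0.0, 1.0)),),
    scalar_coords=(SimpleCoord("height", (1.5,)),),
    cell_methods=("mean: time",),
)
stripped = force_prepare(cube)
print(stripped.dim_coords[0].bounds, stripped.aux_coords, stripped.scalar_coords)
# → None () ()
```

Returning a new object rather than editing in place keeps the original cube's metadata available in case it is needed after the join.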
Thoughts?