SciTools / iris

A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
https://scitools-iris.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License
635 stars 283 forks source link

Iris cubes not merging for re-initialised LFRic files #5131

Closed afinnen closed 1 year ago

afinnen commented 1 year ago

📰 Custom Issue

When loading data from re-initialised LFRic files, iris fails to merge diagnostics into a single cube, because the meta data, such as the file creation time, differs between the different files and prevents iris from recognising that the cubes can/should be merged. (Cubes merging automatically works fine when loading a single file that contains output for multiple forecast times.) This behaviour is making it very challenging to work with LFRic data and is hindering the development of evaluation tools.

(Note that some updates to the output of time meta data in LFRic files are under review, which may or may not impact the behaviour of the time coordinate detailed below.)

Details:

timeStamp of the file is loaded as an attribute, which differs across re-initialised files and thus across individual cubes. Same fore uuid.

>>> with PARSE_UGRID_ON_LOAD.context():
...    pmsl = iris.load(['lfric_diag_1.nc','lfric_diag_2.nc'], 'air_pressure_at_mean_sea_level')
>>> print(pmsl)
0: air_pressure_at_mean_sea_level / (Pa) (time: 1; -- : 419904)
1: air_pressure_at_mean_sea_level / (Pa) (time: 1; -- : 419904)
>>> print(pmsl[0])
air_pressure_at_mean_sea_level / (Pa) (time: 1; -- : 419904)
    Dimension coordinates:
        time                               x       -
    Mesh coordinates:
        latitude                           -       x
        longitude                          -       x
    Mesh:
        name                          Topology data of 2D unstructured mesh
        location                      face
    Cell methods:
        point                         time (60 s)
    Attributes:
        Conventions                   'UGRID-1.0'
        description                   'LFRic file format v0.1.1'
        interval_operation            '60 s'
        interval_write                '3600 s'
        name                          'lfric_diag'
        online_operation              'instant'
        timeStamp                     '2022-Nov-24 13:21:41 GMT'
        title                         'Created by xios'
        uuid                          'fc24c8da-06a3-488c-b50e-bca55b7159bb'
>>> print(pmsl[1])
air_pressure_at_mean_sea_level / (Pa) (time: 1; -- : 419904)
    Dimension coordinates:
        time                               x       -
    Mesh coordinates:
        latitude                           -       x
        longitude                          -       x
    Mesh:
        name                          Topology data of 2D unstructured mesh
        location                      face
    Cell methods:
        point                         time (60 s)
    Attributes:
        Conventions                   'UGRID-1.0'
        description                   'LFRic file format v0.1.1'
        interval_operation            '60 s'
        interval_write                '3600 s'
        name                          'lfric_diag'
        online_operation              'instant'
        timeStamp                     '2022-Nov-24 13:48:57 GMT'
        title                         'Created by xios'
        uuid                          '9aca6a68-3265-46ec-a44e-42986caf4f54'
>>> pmsl_merged = pmsl.merge_cube()
iris.exceptions.MergeError: failed to merge into a single cube.
  cube.attributes values differ for keys: 'timeStamp', 'uuid'

After removing timeStamp and uuid attributes, iris still throws an error due to differences in the time coordinate. (Possibly because time already is a dim coord and not a scalar coordinate despite having only 1 entry?)

>>> del pmsl[0].attributes['uuid']
>>> del pmsl[1].attributes['uuid']
>>> del pmsl[1].attributes['timeStamp']
>>> del pmsl[0].attributes['timeStamp']
pmsl_merged = pmsl.merge_cube()
iris.exceptions.MergeError: failed to merge into a single cube.
  Coordinates in cube.dim_coords differ: time.
>>> print(pmsl[0].coord('time'))
DimCoord :  time / (seconds since 2021-07-29 00:00:00, standard calendar)
    points: [2021-07-29 01:00:00]
    bounds: [[2021-07-29 01:00:00, 2021-07-29 01:00:00]]
    shape: (1,)  bounds(1, 2)
    dtype: float64
    standard_name: 'time'
    long_name: 'Time axis'
    var_name: 'time'
    attributes:
        time_origin  '2021-07-29 00:00:00'
>>> print(pmsl[1].coord('time'))
DimCoord :  time / (seconds since 2021-07-29 00:00:00, standard calendar)
    points: [2021-07-29 02:00:00]
    bounds: [[2021-07-29 02:00:00, 2021-07-29 02:00:00]]
    shape: (1,)  bounds(1, 2)
    dtype: float64
    standard_name: 'time'
    long_name: 'Time axis'
    var_name: 'time'
    attributes:
    time_origin  '2021-07-29 00:00:00'

These supposed differences also prevent calculating differences between the fields (unless one drills down to the underlying cube.data array)

>>> diff = pmsl[1] - pmsl[0]
ValueError: Coordinate 'time' has different points for the LHS cube 'air_pressure_at_mean_sea_level' and RHS cube 'air_pressure_at_mean_sea_level'.
bjlittle commented 1 year ago

Hey @afinnen,

Great to hear from you and thanks for raising this issue.

I've taken a peek at the example files that you provided (offline :+1:) and after inspecting the data I was able to do the following:

from pathlib import Path

import iris
from iris.cube import CubeList
from iris.experimental.ugrid import PARSE_UGRID_ON_LOAD
from iris.util import equalise_attributes

dname = Path("<snip>")
fname1 = dname / "lfric_diag_1.nc"
fname2 = dname / "lfric_diag_2.nc"

with PARSE_UGRID_ON_LOAD.context():
    original = iris.load([fname1, fname2], "air_pressure_at_mean_sea_level")

# address the cube attribute differences in-place
_ = equalise_attributes(original)

# reduce the single valued leading "time" dim coordinate to a scalar coordinate
cubes = CubeList([cube[0] for cube in original])

merged = cubes.merge_cube()
print(merged)

This merges just fine, resulting in the following merged cube:

air_pressure_at_mean_sea_level / (Pa) (time: 2; -- : 419904)
    Dimension coordinates:
        time                               x       -
    Mesh coordinates:
        latitude                           -       x
        longitude                          -       x
    Mesh:
        name                          Topology data of 2D unstructured mesh
        location                      face
    Cell methods:
        point                         time (60 s)
    Attributes:
        Conventions                   'UGRID-1.0'
        description                   'LFRic file format v0.1.1'
        interval_operation            '60 s'
        interval_write                '3600 s'
        name                          'lfric_diag'
        online_operation              'instant'
        title                         'Created by xios'

Note that, merging creates new dimensions from scalar coordinates, hence the need to demote the single valued time dimension coordinate to a time scalar coordinate.

Alternatively, you can get the same result using cube concatenation instead as follows:

concatenated = original.concatenate_cube()

This give the same result as above using merge_cube, but note that there was no need to demote to a scalar coordinate; it's using the original cube list loaded from file.

In terms of getting the cube arithmetic to work, then you are forced to demote the time dimension coordinate to a scalar coordinate. Thanks to lenient cube arithmetic, which auto-magically discards the differing time scalar coordinates, gives the result you might expect:

result = merged[0] - merged[1]
print(result)

This gives the following difference cube:

unknown / (Pa)                      (-- : 419904)
    Mesh coordinates:
        latitude                        x
        longitude                       x
    Mesh:
        name                        Topology data of 2D unstructured mesh
        location                    face
    Attributes:
        Conventions                 'UGRID-1.0'
        description                 'LFRic file format v0.1.1'
        interval_operation          '60 s'
        interval_write              '3600 s'
        name                        'lfric_diag'
        online_operation            'instant'
        title                       'Created by xios'

Does this help?

bjlittle commented 1 year ago

I'm just going to close this issue @afinnen

Feel free to re-open it if you want to discuss further.

:beers: