ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0
42 stars 38 forks source link

Use `iris.FUTURE.save_split_attrs = True` to remove iris warning #2398

Closed schlunma closed 4 months ago

schlunma commented 4 months ago

Description

Closes #2376

Note: this is not fully backwards-compatible, as this changes the way iris writes attributes (some attributes are moved from global to local). I don't think this is a problem, but might be good to keep in mind.


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the ๐Ÿ›  Technical or ๐Ÿงช Scientific review.


To help with the number pull requests:

schlunma commented 4 months ago

I have no idea why the tests are failing. The test is complaining about a missing tracking_id attribute, but the code that is supposed to write this has never been called with a given tracking_id. Why did this not fail before? Did iris save tracking_ids automatically?

valeriupredoi commented 4 months ago

no, it's still there, but it's malformed:

valeriu@valeriu-PORTEGE-Z30-C:~/ESMValCore$ ncdump -h /tmp/pytest-of-valeriu/pytest-117/test_diagnostic_task_provenanc0/input/cmip5/output1/CCCma/CanESM2/historical/yr/ocnBgchem/Oyr/r1i1p1/*/chl_Oyr_CanESM2_historical_r1i1p1_2000_2009.nc
netcdf chl_Oyr_CanESM2_historical_r1i1p1_2000_2009 {
dimensions:
    dim0 = UNLIMITED ; // (0 currently)
variables:
    double unknown(dim0) ;
        unknown:tracking_id = 0LL ;

// global attributes:
        :Conventions = "CF-1.7" ;
}

as compared to without changes here:

valeriu@valeriu-PORTEGE-Z30-C:~/ESMValCore$ ncdump -h /tmp/pytest-of-valeriu/pytest-116/test_diagnostic_task_provenanc0/input/cmip5/output1/CCCMA/CanESM2/historical/yr/ocnBgchem/Oyr/r1i1p1/*/chl_Oyr_CanESM2_historical_r1i1p1_2000_2009.nc
netcdf chl_Oyr_CanESM2_historical_r1i1p1_2000_2009 {
dimensions:
    dim0 = UNLIMITED ; // (0 currently)
variables:
    double unknown(dim0) ;

// global attributes:
        :tracking_id = 0LL ;
        :Conventions = "CF-1.7" ;
}

note that I used iris=3.9 for both runs, see my comment below what's causing this...how to fix it? Anyone's guess :grinning:

valeriupredoi commented 4 months ago

OK Manu, dug deeper in this (no pub for me this Friday): that tracking id is now (as in, using the call to iris' save_split_attr) an attribute of the netCDF4 variable:

iris attributes: {'Conventions': 'CF-1.7', 'tracking_id': 0}

netCDF4.Dataset variable items:
dict_items([('unknown', <class 'netCDF4._netCDF4.Variable'>
float64 unknown(dim0)
    tracking_id: 0
unlimited dimensions: dim0
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used)])

-> it shouldn't be, it should be a global attribute of the file, not of the variable, as is the case when not using the FUTURE call:

iris attributes good cube: {'tracking_id': 0, 'Conventions': 'CF-1.7'}
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    tracking_id: 0
    Conventions: CF-1.7
    dimensions(sizes): dim0(0)
    variables(dimensions): float64 unknown(dim0)
    groups: 
netCDF4 variable items good cube  dict_items([('unknown', <class 'netCDF4._netCDF4.Variable'>
float64 unknown(dim0)
unlimited dimensions: dim0
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used)])

I think you should tell the iris folk the function is not doing the right thing :beer:

bouweandela commented 4 months ago

Attributes are now split between local (attached to the cube variable) and global, see https://scitools-iris.readthedocs.io/en/stable/generated/api/iris.cube.html#iris.cube.CubeAttrsDict.__setitem__

schlunma commented 4 months ago

Yes, as Bouwe mentioned, this behavior of iris is expected. I was confused since it looked like the files are created without a tracking_id; however, the actual file that is tested is created in the patched_datafinder fixture with a tracking id.

iris.cube.Cube.attributes will contain local and global attributes, so we are fine here. However, netCDF4 does distinguish between the two. Thus, when reading the attributes for the provenance here, the tracking_id in the new code is skipped (since it has been written as a local attribute by the iris function). I fixed this by writing tracking_id as a global attribute for the test files in https://github.com/ESMValGroup/ESMValCore/pull/2398/commits/996ca364058ee681ec52f06d473242f01b9f2ba5.

codecov[bot] commented 4 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 94.38%. Comparing base (6a98408) to head (c540eac).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #2398 +/- ## ======================================= Coverage 94.37% 94.38% ======================================= Files 246 246 Lines 13736 13737 +1 ======================================= + Hits 12964 12965 +1 Misses 772 772 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

bouweandela commented 3 months ago

@chrisbillowsMO It looks like we forgot to label this pull request with the backwards incompatible change. Could you please update the changelog for v2.11 so this is listed under backward incompatible changes?

chrisbillowsMO commented 3 months ago

@chrisbillowsMO It looks like we forgot to label this pull request with the backwards incompatible change. Could you please update the changelog for v2.11 so this is listed under backward incompatible changes?

Will do!