ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0
42 stars 38 forks source link

Wrong error message for duplicate coordinate #373

Closed schlunma closed 4 years ago

schlunma commented 4 years ago

The error message which appears in #371 is:

esmvalcore.cmor.check.CMORCheckError: There were errors in variable tas:
height2m: does not exist
in cube:
air_temperature / (K)               (time: 1200; latitude: 180; longitude: 288)
     Dimension coordinates:
          time                           x               -               -
          latitude                       -               x               -
          longitude                      -               -               x
     Scalar coordinates:
          height: 2.0 m
             description='~2 m standard surface air temperature and surface humidity  height'
          height: 2.0 m
     Attributes:
          Conventions: CF-1.7 CMIP-6.0 UGRID-1.0
          activity_id: CMIP
          branch_method: standard
          branch_time_in_child: -674885.0
          branch_time_in_parent: 0.0
          comment: <null ref>
          contact: gfdl.climate.model.info@noaa.gov
          data_specs_version: 01.00.27
          experiment: pre-industrial control
          experiment_id: piControl
          external_variables: areacella
          forcing_index: 1
          frequency: monC
          further_info_url: https://furtherinfo.es-doc.org/CMIP6.NOAA-GFDL.GFDL-CM4.piControl.none...
          grid: atmos data regridded from Cubed-sphere (c96) to 180,288; interpolation...
          grid_label: gr1
          initialization_index: 1
          institution: National Oceanic and Atmospheric Administration, Geophysical Fluid Dynamics...
          institution_id: NOAA-GFDL
          interp_method: conserve_order2
          license: CMIP6 model data produced by NOAA-GFDL is licensed under a Creative Commons...
          mip_era: CMIP6
          nominal_resolution: 100 km
          original_name: tas
          parent_activity_id: CMIP
          parent_experiment_id: piControl-spinup
          parent_mip_era: CMIP6
          parent_source_id: GFDL-CM4
          parent_time_units: days since 1850-1-1 00:00:00
          parent_variant_label: r1i1p1f1
          physics_index: 1
          product: model-output
          realization_index: 1
          realm: atmos
          references: see further_info_url attribute
          source: GFDL-CM4 (2018): 
aerosol: interactive
atmos: GFDL-AM4.0.1 (Cubed-sphere...
          source_file: /mnt/lustre02/work/bd0854/DATA/ESMValTool2/CMIP6_DKRZ/CMIP/NOAA-GFDL/G...
          source_id: GFDL-CM4
          source_type: AOGCM
          sub_experiment: none
          sub_experiment_id: none
          table_id: Amon
          title: NOAA GFDL GFDL-CM4 model output prepared for CMIP6 pre-industrial cont...
          variable_id: tas
          variant_info: N/A
          variant_label: r1i1p1f1
     Cell methods:
          mean: area, time

However, the coordinate height is not missing, but appears twice (because it was additionally added by the fix). This can be very confusing.

The reason for this is that iris also raises an CoordinateNotFoundError for cube.coord('coord_name') if 'coord_name' appears more than once in cube:

iris.exceptions.CoordinateNotFoundError: 'Expected to find exactly 1 coordinate, but found 2. They were: ..., ....'
valeriupredoi commented 4 years ago

but it's impossible to add the same coordinate twice in a cube - even if one is DimCoord and the other is AuxCoord, no?

schlunma commented 4 years ago

Apparently it's possible if the two coordinates are slightly different, e.g. if one has additional attributes and the other one doesn't, like it's the case for the example posted above (one height coordinate has a description, the other one doesn't).

valeriupredoi commented 4 years ago

Pff, smells like an iris bug to me then

Dr Valeriu Predoi. Computational scientist NCAS-CMS University of Reading Department of Meteorology Reading RG6 6BB United Kingdom

On Wed, 4 Dec 2019, 09:05 Manuel Schlund, notifications@github.com wrote:

Apparently it's possible if the two coordinates are slightly different, e.g. if one has additional attributes and the other one doesn't, like it's the case for the example posted above (one height coordinate has a description, the other one doesn't).

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ESMValGroup/ESMValCore/issues/373?email_source=notifications&email_token=AG5EFI23QH54OHL74P5E6F3QW5XGZA5CNFSM4JLTRLK2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEF4H3AA#issuecomment-561544576, or unsubscribe https://github.com/notifications/unsubscribe-auth/AG5EFI6H4CWJNQWJK3IYPGLQW5XGZANCNFSM4JLTRLKQ .

schlunma commented 4 years ago

I don't think so. Why would they add an error message like this then?

iris.exceptions.CoordinateNotFoundError: 'Expected to find exactly 1 coordinate, but found 2. They were: ..., ....'
valeriupredoi commented 4 years ago

so the error message takes care of the already created problem; the problem should not happen in the first place ie adding two longitude's as both DimCoords should not be allowed methinks

schlunma commented 4 years ago

Adding two DimCoords for the same index is not possible anyway, but height is an AuxCoord

valeriupredoi commented 4 years ago

yeah but what happens if the data is 3D and you remove say, time and add longitude in the place of time (so index 0), with the data now having longitude-longitude-latitude coords -> will that work? Coz if it does, it's bonkers :beer:

schlunma commented 4 years ago

It works if the metadata of the DimCoord is different (changing the points is not enough):

Python 3.7.3 | packaged by conda-forge | (default, Jul  1 2019, 21:52:21) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import iris
>>> d1 = iris.coords.DimCoord([1, 2], standard_name='latitude')
>>> d2 = d1.copy()
>>> d3 = d1.copy([2, 4])
>>> d4 = iris.coords.DimCoord([1, 2], standard_name='latitude', var_name='lat')
>>> cube = iris.cube.Cube([[1, 2], [3, 4]])
>>> print(cube)
unknown / (unknown)                 (-- : 2; -- : 2)
>>> cube.add_dim_coord(d1, 0)
>>> cube.add_dim_coord(d1, 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/lustre02/work/bd0854/b309141/miniconda3/envs/mlr/lib/python3.7/site-packages/iris/cube.py", line 1054, in add_dim_coord
    raise ValueError('The coordinate already exists on the cube. '
ValueError: The coordinate already exists on the cube. Duplicate coordinates are not permitted.
>>> cube.add_dim_coord(d1, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/lustre02/work/bd0854/b309141/miniconda3/envs/mlr/lib/python3.7/site-packages/iris/cube.py", line 1054, in add_dim_coord
    raise ValueError('The coordinate already exists on the cube. '
ValueError: The coordinate already exists on the cube. Duplicate coordinates are not permitted.
>>> cube.add_dim_coord(d2, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/lustre02/work/bd0854/b309141/miniconda3/envs/mlr/lib/python3.7/site-packages/iris/cube.py", line 1054, in add_dim_coord
    raise ValueError('The coordinate already exists on the cube. '
ValueError: The coordinate already exists on the cube. Duplicate coordinates are not permitted.
>>> cube.add_dim_coord(d3, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/lustre02/work/bd0854/b309141/miniconda3/envs/mlr/lib/python3.7/site-packages/iris/cube.py", line 1054, in add_dim_coord
    raise ValueError('The coordinate already exists on the cube. '
ValueError: The coordinate already exists on the cube. Duplicate coordinates are not permitted.
>>> cube.add_dim_coord(d4, 1)
>>> cube
<iris 'Cube' of unknown / (unknown) (latitude: 2; latitude: 2)>
>>> 
valeriupredoi commented 4 years ago

Bahahah, you got a Frankencube :rofl: I think we should probably tell the iris guys about this - ping @bjlittle

bjlittle commented 4 years ago

@schlunma and @valeriupredoi

An interesting question, and quite topical... see https://github.com/SciTools/iris/pull/3583 which is part of an overhaul of iris cube-arithmetic that I'm doing at the moment (also see https://github.com/SciTools/iris/issues/3478) and aiming to land in the forthcoming 3.0.0.

This all comes down to the question of "What is the identity of a coordinate?".

To be fair we need to communicate this more clearly, but essentially the answer comes down to the metadata of the coordinate, and in this case that is a combination of the coordinate:

The identify of a coordinate has nothing to do with the value of its points and/or bounds. It's all about the associated metadata.

In your example, only the DimCoords d1 and d4 (has a var_name) differ in metadata, therefore their identity is different. Whereas, d1, d2 (exact copy of d1), and d3 (differs only in points, which is not part of the metadata, and therefore not part of the coordinate identity contract) are in metadata terms all the same - therefore that is why iris raises the exception.

Hope this helps to clarify.

I want to make this a slicker (and configurable) experience for the user and https://github.com/SciTools/iris/pull/3583 is the first step in that direction - plus we need to clearly communicate all of this in the user guide to avoid any on-going confusion.

So IMHO this isn't a bug :smiley:

rcomer commented 4 years ago

For info, I have previously queried the Frankencubes: https://github.com/SciTools/iris/issues/2917

bjlittle commented 4 years ago

"Frankencubes" -- it's now officially a thing :rofl:

I've been thinking about Frankencubes over the weekend, and given that I'm kinda actively working in this area of iris at the moment, I may be able to address this behaviour in time for 3.0.0. No promises though...

I stand by my last comment that this isn't a bug, but then again, the double height coordinate in the cube behaviour is surprising for users and takes some explaining to justify... which hints to me that perhaps a less surprising compromise could be had.

Ideally, what you might want to happen here is for iris to raise an exception to say that the height coordinate already exists in that cube instance... and that comes down to relaxing the contract for coordinate metadata equality. I think there is a common sense compromise to be had here, that's not too upsetting for all concerned...I just need to figure that out and see if it make sense.

So, if you're happy, leave this issue open, and let's see if I can get the Frankencube to learn some table manners :smile:

That said, your temporary workaround here might be something along the lines of:

height = iris.coords.DimCoord(2.0, standard_name="height", units="m", attributes=dict(description="...")
coords = cube.coords(height.name())
if not coords:
    # This isn't a Frankencube...
    cube.add_aux_coord(height)
else:
    # A "height-like" coordinate already exists in the cube...
    if len(coords)>1:
        raise ValueError("oops")
    # Add the extra metadata (what ever it is) to the existing "height" coordinate on the cube...
    coords[0].attributes.update(height.attributes)
schlunma commented 4 years ago

That sounds great @bjlittle, thanks for your help!!

rcomer commented 4 years ago

I should probably say that, since posting that issue, I've come across cases where two coords with the same name appears to be deliberate. Have emailed you @bjlittle .

valeriupredoi commented 4 years ago

Cheers @bjlittle - can we add Frankencube to the iris documentation? :grin:

Dr Valeriu Predoi. Computational scientist NCAS-CMS University of Reading Department of Meteorology Reading RG6 6BB United Kingdom

On Mon, 9 Dec 2019, 10:30 Ruth Comer, notifications@github.com wrote:

I should probably say that, since posting that issue, I've come across cases where two coords with the same name appears to be deliberate. Have emailed you @bjlittle https://github.com/bjlittle .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ESMValGroup/ESMValCore/issues/373?email_source=notifications&email_token=AG5EFI5E4LYLACD6GZENJMLQXYM5LA5CNFSM4JLTRLK2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGIUHPQ#issuecomment-563168190, or unsubscribe https://github.com/notifications/unsubscribe-auth/AG5EFIZIFAD2BSAIKIRHIWDQXYM5LANCNFSM4JLTRLKQ .

bjlittle commented 4 years ago

I should probably say that, since posting that issue, I've come across cases where two coords with the same name appears to be deliberate.

After investigating, this wasn't an issue, given the proposed approach that I wish to adopt here. All good.

valeriupredoi commented 4 years ago

@schlunma have you had the chance to look for Frankencubes with iris=3.0?

schlunma commented 4 years ago

@schlunma have you had the chance to look for Frankencubes with iris=3.0?

Do you mean 2.3.0? I think 3.0 is not released yet.

valeriupredoi commented 4 years ago

yes yes, I meant 2.3.0, sorry, I meant the 3rd iteration of 2 :grin:

schlunma commented 4 years ago

No, but I don't think that it is expected to change, since @bjlittle talked about version 3.0.0 :grin:

bjlittle commented 4 years ago

@valeriupredoi and @schlunma The latest scoop is that we've just released iris 2.3.0, but that doesn't contain the changes that you need to address frankencubes(:tm:).

We've also decided to push out iris 2.4.0, as a final 2.x release (this is imminent)... again that won't contain the magic you need, soz.

The iris 3.0.0 release has been slightly delayed due to the massive amount of content we're attempting to cram into it. This release will however, thankfully, have new lenient behaviour that will allow you to optionally tame the beast that is the frankencube(:tm:) :tada:

Apologies for the delay. I'm hoping that we can make 3.0.0rc0 available in February/March. If I get a firm date, then I'll be sure to let you know. Ideally, just follow us on https://twitter.com/scitools_iris so that you don't miss any release/candidate announcements :wink:

HTH

schlunma commented 4 years ago

I found another issue regarding Frankencubes:

The following file has two latitude and longitude (regarding the standard_name) coordinates:

netcdf tos_Omon_BCC-CSM2-MR_historical_r1i1p1f1_gn_185001-201412 {
dimensions:
        time = UNLIMITED ; // (1980 currently)
        lat = 232 ;
        lon = 360 ;
        bnds = 2 ;
variables:
        double time(time) ;
                time:bounds = "time_bnds" ;
                time:units = "days since 1850-1-1" ;
                time:calendar = "365_day" ;
                time:axis = "T" ;
                time:long_name = "time" ;
                time:standard_name = "time" ;
        double time_bnds(time, bnds) ;
        double lat(lat) ;
                lat:bounds = "lat_bnds" ;
                lat:units = "degrees_north" ;
                lat:axis = "Y" ;
                lat:long_name = "latitude" ;
                lat:standard_name = "latitude" ;
        double lat_bnds(lat, bnds) ;
        double lon(lon) ;
                lon:bounds = "lon_bnds" ;
                lon:units = "degrees_east" ;
                lon:axis = "X" ;
                lon:long_name = "Longitude" ;
                lon:standard_name = "longitude" ;
        double lon_bnds(lon, bnds) ;
        float latitude(lat, lon) ;
                latitude:standard_name = "latitude" ;
                latitude:long_name = "latitude" ;
                latitude:units = "degrees_north" ;
                latitude:missing_value = 1.e+20f ;
                latitude:_FillValue = 1.e+20f ;
        float longitude(lat, lon) ;
                longitude:standard_name = "longitude" ;
                longitude:long_name = "longitude" ;
                longitude:units = "degrees_east" ;
                longitude:missing_value = 1.e+20f ;
                longitude:_FillValue = 1.e+20f ;
        float tos(time, lat, lon) ;
                tos:standard_name = "sea_surface_temperature" ;
                tos:long_name = "Sea Surface Temperature" ;
                tos:comment = "Temperature of upper boundary of the liquid ocean, including temperatures below sea-ice and floating ice shelves." ;
                tos:units = "degC" ;
                tos:original_name = "tos" ;
                tos:cell_methods = "area: mean where sea time: mean (interval: 30 minutes)" ;
                tos:cell_measures = "area: areacello" ;
                tos:missing_value = 1.e+20f ;
                tos:_FillValue = 1.e+20f ;
                tos:coordinates = "latitude longitude" ;

This causes the tool to fail with tos: does not match coordinate rank, which is definitely not helping, because the rank of this file is just fine.

The problematic part is here

https://github.com/ESMValGroup/ESMValCore/blob/a0552bd1a33108156d9453ddf884e21087932150/esmvalcore/cmor/check.py#L268-L273

which runs into the except part for latitude and longitude. It would be really nice if we could detect duplicate coordinates during _check_dim_names() or _check_coords() and get a nice error message. It took me 1 hr to find out what the problem was :grimacing:

bouweandela commented 4 years ago

@jvegasbsc or @sloosvel Could one of you have a look at this?

jvegreg commented 4 years ago

Yes, I will do it