ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0

Unable to load marine oxygen #585

Closed ledm closed 5 years ago

ledm commented 6 years ago

I'm looking at marine oxygen in CMIP5 data using the current version 2 development branch. ESMValTool is not able to load any CMIP5 marine oxygen netCDFs and I can't fully understand what's going on.

So far, I've found that I'm able to reproduce the problem with the following snippet:

import iris
from esmvaltool.preprocessor._io import concatenate_callback

files = ['/Path/to/files/o2_Oyr_HadGEM2-CC_historical_r1i1p1_1960-2005.nc',]
constraints = 'mole_concentration_of_molecular_oxygen_in_sea_water'
callback = concatenate_callback
cubes = iris.load_raw(files, constraints=constraints, callback=callback)
print(cubes)

produces []

However, if I run the equivalent over another field (for instance marine chlorophyll), everything seems to work fine:

import iris
from esmvaltool.preprocessor._io import concatenate_callback

files = ['/Path/to/files/chl_20110930/chl_Oyr_HadGEM2-CC_historical_r1i1p1_1960-2005.nc',]
constraints = 'mass_concentration_of_phytoplankton_expressed_as_chlorophyll_in_sea_water'
callback = concatenate_callback
cubes = iris.load_raw(files, constraints=constraints, callback=callback)
print(cubes)

produces a description of the file loaded:

0: mass_concentration_of_phytoplankton_expressed_as_chlorophyll_in_sea_water / (kg m-3) (time: 46; depth: 40; latitude: 216; longitude: 360)

Can anyone help me figure out why this command would work for Chlorophyll but not for oxygen? Perhaps the automatically loaded constraints are not appropriate? Why would ESMValTool want to concatenate a single file - is that a problem?

Thanks for the help,

cheers!

Lee

schlunma commented 6 years ago

Hi Lee,

I had the same problem several times. In my cases, the nc file either had no standard_name or it differed from the one given in the CMOR tables. An example for a fix is nbp for inmcm4.

ledm commented 6 years ago

Hi @schlunma,

thanks! That was very helpful and got me over that bug, and into another! Even with the standard_name and long_name set, I'm still seeing the same problem. Are there other cube properties that could be missing from these files?

Lee

schlunma commented 6 years ago

Hmm...for me fixing the standard_name always helped. I don't know of any other attributes, I thought the constraint only contains the standard_name of the cube. Can you post the output of the tool and print the constraint?

ledm commented 6 years ago

Hi,

the output of the command cubes = iris.load_raw(files, constraints=constraints, callback=callback) (as in the first snippet above) is below. Note that I have added several print statements to help me track what's happening.

cubes = iris.load_raw(files, constraints=constraints, callback=callback)
   ...: 
iris.__init__.load_raw load_raw
iris.__init__.load_raw ['o2_Oyr_HadGEM2-ES_historical_r1i1p1_2000-2005.nc'] mole_concentration_of_molecular_oxygen_in_sea_water <function concatenate_callback at 0x7f5c28f16d90>
iris.__init__._load_collection
iris.__init__._load_collection cubes: <generator object _generate_cubes at 0x7f5c28ec3d58>
_CubeFilter:  Constraint(name='mole_concentration_of_molecular_oxygen_in_sea_water')
_CubeFilter: cubes:  < No cubes >
_CubeFilterCollection init
_CubeFilterCollection, cubes: <generator object _generate_cubes at 0x7f5c28ec3d58> pairs: [<iris.cube._CubeFilter object at 0x7f5c28f5ea58>] collection: <iris.cube._CubeFilterCollection object at 0x7f5c28f5e3c8>
iris.__init__._generate_cubes
iris.__init__._generate_cubes, scheme file <itertools._grouper object at 0x7f5c28f5eb70>

/users/modellers/ledm/workspace/ESMValToolTest/miniconda/minicondaLee/envs/esmvaltool/lib/python3.6/site-packages/iris/fileformats/cf.py:1143: IrisDeprecation: NetCDF default loading behaviour currently does not expose variables which define reference surfaces for dimensionless vertical coordinates as independent Cubes. This behaviour is deprecated in favour of automatic promotion to Cubes. To switch to the new behaviour, set iris.FUTURE.netcdf_promote to True.
  warn_deprecated(msg)
/users/modellers/ledm/workspace/ESMValToolTest/miniconda/minicondaLee/envs/esmvaltool/lib/python3.6/site-packages/iris/fileformats/_pyke_rules/compiled_krb/fc_rules_cf_fc.py:1817: FutureWarning: Conversion of the second argument of issubdtype from `str` to `str` is deprecated. In future, it will be treated as `np.str_ == np.dtype(str).type`.
  if np.issubdtype(cf_var.dtype, np.str):
/users/modellers/ledm/workspace/ESMValToolTest/miniconda/minicondaLee/envs/esmvaltool/lib/python3.6/site-packages/iris/fileformats/_pyke_rules/compiled_krb/fc_rules_cf_fc.py:1817: FutureWarning: Conversion of the second argument of issubdtype from `str` to `str` is deprecated. In future, it will be treated as `np.str_ == np.dtype(str).type`.
  if np.issubdtype(cf_var.dtype, np.str):

_CubeFilterCollection, cube:  mole_concentration_of_dissolved_molecular_oxygen_in_sea_water / (mol m-3) (time: 6; depth: 40; latitude: 216; longitude: 360)
     Dimension coordinates:
          time                                                                 x         -             -               -
          depth                                                                -         x             -               -
          latitude                                                             -         -             x               -
          longitude                                                            -         -             -               x
     Attributes:
          Conventions: CF-1.5
          NCO: "4.5.3"
          _filename: /users/modellers/ledm/workspace/ESMValToolTest/KIT_data/o2_Oyr_HadGEM2...
          associated_files: baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile: gridspec_ocnBgchem_fx_HadGEM2-ES_historical_r0i0p0.nc...
          branch_time: 0.0
          cmor_version: 2.7.1
          contact: chris.d.jones@metoffice.gov.uk, john.hughes@metoffice.gov.uk
          experiment: historical
          experiment_id: historical
          forcing: GHG, Oz, SA, LU, Sl, Vl, BC, OC, (GHG = CO2, N2O, CH4, CFCs)
          frequency: yr
          initialization_method: 1
          institute_id: MOHC
          institution: Met Office Hadley Centre, Fitzroy Road, Exeter, Devon, EX1 3PB, UK, (h...
          invalid_standard_name: mole_concentration_of_molecular_oxygen_in_sea_water
          mo_runid: ajhoh
          model_id: HadGEM2-ES
          modeling_realm: ocnBgchem
          original_name: mo: m02s00i115/1000.
          parent_experiment: pre-industrial control
          parent_experiment_id: piControl
          parent_experiment_rip: r1i1p1
          physics_version: 1
          product: output
          project_id: CMIP5
          realization: 1
          references: Bellouin N. et al, (2007) Improved representation of aerosols for HadGEM2....
          source: HadGEM2-ES (2009) atmosphere: HadGAM2 (N96L38); ocean: HadGOM2 (lat: 1.0-0.3...
          table_id: Table Oyr (27 April 2011) a816306750f284585dc77210f193f7bb
          title: HadGEM2-ES model output prepared for CMIP5 historical
     Cell methods:
          mean: time
          mean where sea: area

_CubeFilterCollection: add_cube
_CubeFilter: add
iris.__init__._load_collection result: <iris.cube._CubeFilterCollection object at 0x7f5c28f5e3c8>
<class 'iris.cube._CubeFilterCollection'>
<bound method _CubeFilterCollection.cubes of <iris.cube._CubeFilterCollection object at 0x7f5c28f5e3c8>>
_CubeFilterCollection: cubes
_CubeFilterCollection: cubes... iterating. 0
iris.__init__.load_raw result of load collection < No cubes >

As you may be able to see, it seems to load the raw cube just fine, but iris refuses to return the cube as part of a cube filter collection.

ledm commented 6 years ago

As I told @valeriupredoi, I suspect that the real underlying issue here is not that the oxygen files are the problem, but rather that ESMValTool (or possibly iris) is failing silently without any way to figure out what went wrong.
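
A minimal sketch of the kind of explicit failure that would make this easier to debug, assuming the selection is done on the cube name. `CubeStub` and `select_or_explain` are hypothetical names used only for illustration; they are not part of iris or ESMValTool. The stub reproduces only the documented fallback order of iris's `Cube.name()` (standard_name, then long_name, then var_name).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CubeStub:
    """Hypothetical stand-in holding only the naming metadata of a cube."""
    standard_name: Optional[str] = None
    long_name: Optional[str] = None
    var_name: Optional[str] = None

    def name(self) -> str:
        # Same fallback order as iris's Cube.name():
        # standard_name, then long_name, then var_name, then 'unknown'.
        return self.standard_name or self.long_name or self.var_name or "unknown"


def select_or_explain(cubes: List[CubeStub], wanted: str) -> List[CubeStub]:
    """Return cubes whose name equals `wanted`, or raise listing what was found."""
    matches = [cube for cube in cubes if cube.name() == wanted]
    if not matches:
        found = sorted({cube.name() for cube in cubes})
        raise ValueError(
            f"No cube named {wanted!r}; names present in the file: {found}"
        )
    return matches


# Naming metadata as printed in the debug output above.
loaded = [
    CubeStub(
        standard_name="mole_concentration_of_dissolved_molecular_oxygen_in_sea_water"
    )
]

try:
    select_or_explain(loaded, "mole_concentration_of_molecular_oxygen_in_sea_water")
except ValueError as err:
    message = str(err)
    print(message)
```

With a message like this, the mismatch between the requested name and the name actually stored in the file would be visible immediately instead of showing up as an empty result.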

schlunma commented 6 years ago

Hi Lee,

I think I found the problem: When I load the dataset directly by iris.load_cube(...), the cube's name is called Dissolve Oxygen Concentration, which is not the standard_name of the cube, but the long_name.

Since the standard_name is set correctly in the original file (at least ncdump tells me that), this is an issue of iris. I don't know why it is not able to load the standard_name.

The ESMValTool uses the iris.load_raw(files, constraints=constraints) function to load the cubes, but constraints only includes the standard_name. A fix for this problem could be to extend the constraints in the load function with the long_name or even the var_name of the nc file. Is this a solution @bouweandela?
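
The idea of extending the constraint could be sketched roughly as follows. This is only an illustration: `matches_any_name` is a hypothetical helper, the cube is a `SimpleNamespace` stand-in rather than a real iris cube, and the var_name `o2` is assumed from the file name. With iris available, a predicate of this shape could presumably be passed through the `cube_func` argument of `iris.Constraint`.

```python
from types import SimpleNamespace


def matches_any_name(cube, wanted_names):
    """True if the cube's standard_name, long_name or var_name is in `wanted_names`."""
    candidates = (cube.standard_name, cube.long_name, cube.var_name)
    return any(name in wanted_names for name in candidates if name is not None)


# Hypothetical stand-in for the o2 cube; long_name as quoted in the comment
# above, var_name 'o2' assumed from the file name.
o2_cube = SimpleNamespace(
    standard_name=None,
    long_name="Dissolve Oxygen Concentration",
    var_name="o2",
)

# Accept either the CMOR table's standard name or the short variable name.
wanted = {"mole_concentration_of_molecular_oxygen_in_sea_water", "o2"}
print(matches_any_name(o2_cube, wanted))
```

Matching on var_name as a last resort keeps the filter strict enough to exclude unrelated variables in the same file, which is the reason the standard_name constraint exists in the first place.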

valeriupredoi commented 6 years ago

extract_strict(constraint=name) works fine with the long_name even if the standard_name is messed up (as in the case of derived variables that have an attribute of invalid_standard_name; of course one needs to set long_name) so if one does a load() followed by extract_strict() this will be prevented. Having said this, I will have a look at the actual files right now @ledm

valeriupredoi commented 6 years ago

ok so I have just looked at the o2 file: that one does not have a standard_name (instead it has an invalid_standard_name in the attributes, which denotes an operation on the original variable):

iris.load_raw('/badc/cmip5/data/cmip5/output1/MOHC/HadGEM2-ES/historical/mon/ocnBgchem/Omon/r1i1p1/latest/o2/o2_Omon_HadGEM2-ES_historical_r1i1p1_195912-200512.nc', constraints=iris.Constraint(name='mole_concentration_of_molecular_oxygen_in_sea_water'))

will of course return an empty list because that name exists neither as standard name nor as long name; however if one does the load:

d = iris.load(files)

followed by assignment of long name and extraction:

d[0].long_name = 'mole_concentration_of_molecular_oxygen_in_sea_water'

 d.extract_strict(constraints=iris.Constraint(name='mole_concentration_of_molecular_oxygen_in_sea_water'))
<iris 'Cube' of mole_concentration_of_molecular_oxygen_in_sea_water / (mol m-3) (time: 553; latitude: 216; longitude: 360)>

works fine

valeriupredoi commented 6 years ago

I suggest we add this case at loading point - @bouweandela ?

ledm commented 6 years ago

Just a quick comment, the correct standard name is mole_concentration_of_dissolved_molecular_oxygen_in_sea_water, whereas mole_concentration_of_molecular_oxygen_in_sea_water is not a valid standard name.

Is there any way that I can add these into a fix_file fixes command?

ledm commented 6 years ago

So, it looks like the problem here was that the CMOR table in ESMValTool uses the incorrect standard name.

As you can see in the file https://github.com/ESMValGroup/ESMValTool/blob/version2_development/esmvaltool/cmor/tables/cmip5/Tables/CMIP5_Oyr, the ESMValTool CMOR table uses the non-CF standard name, mole_concentration_of_molecular_oxygen_in_sea_water instead of the CF-compliant standard name mole_concentration_of_dissolved_molecular_oxygen_in_sea_water. Replacing the wrong one with the right name results in everything working as expected!
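
A check like the following could catch this kind of table error early. It is a rough sketch under stated assumptions: `TABLE_EXCERPT` is an illustrative excerpt written in the style of a CMIP5 CMOR table (not copied from the real file), `VALID_CF_NAMES` is a tiny hand-written subset of the CF standard name list, and `invalid_standard_names` is a hypothetical helper.

```python
import re

# Illustrative excerpt in the style of a CMIP5 CMOR table; not the real file.
TABLE_EXCERPT = """\
variable_entry:    chl
standard_name:     mass_concentration_of_phytoplankton_expressed_as_chlorophyll_in_sea_water
units:             kg m-3
variable_entry:    o2
standard_name:     mole_concentration_of_molecular_oxygen_in_sea_water
units:             mol m-3
"""

# Small hand-written subset of CF standard names, for this example only.
VALID_CF_NAMES = {
    "mass_concentration_of_phytoplankton_expressed_as_chlorophyll_in_sea_water",
    "mole_concentration_of_dissolved_molecular_oxygen_in_sea_water",
}


def invalid_standard_names(table_text, valid_names):
    """Map each variable_entry to its standard_name when that name is not valid."""
    problems = {}
    current = None
    for line in table_text.splitlines():
        entry = re.match(r"variable_entry:\s*(\S+)", line)
        if entry:
            # Remember which variable the following attributes belong to.
            current = entry.group(1)
            continue
        name = re.match(r"standard_name:\s*(\S+)", line)
        if name and current and name.group(1) not in valid_names:
            problems[current] = name.group(1)
    return problems


print(invalid_standard_names(TABLE_EXCERPT, VALID_CF_NAMES))
```

In practice the set of valid names would have to come from the full CF standard name table rather than a hand-written subset.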

schlunma commented 6 years ago

That is weird. The file at DKRZ has a standard_name which is set to mole_concentration_of_molecular_oxygen_in_sea_water and still cannot be processed. Anyway, glad that you found a solution 👍

valeriupredoi commented 6 years ago

@ledm @schlunma the standard name has been changed to the correct one mentioned by Lee for CMIP6_Omon/Oyr - I reckon we should do the same for CMIP5, what say you @jvegasbsc ?

valeriupredoi commented 6 years ago

OK guys so @ledm would be very happy if we sorted this out sooner than later so what I am proposing is to change the standard name in CMIP5 CMOR files to the one that is in CMIP6 files and add a check on the existence of standard_name right before we load like:

if not cube.standard_name:
    long_name = name

so that the load can be done even if that standard name does not exist in cube. What say yous?

schlunma commented 6 years ago

If the standard_name changed in CMIP6 we should keep the CMIP5 version and change wrong standard_names in fix files. But extending the constraint to somehow include the long_name sounds very good :+1:

bouweandela commented 6 years ago

The reason I added the standard_name constraint is that in some cases you end up loading other stuff than the required variable into cubes. As you say, the standard name can be fixed with fix_file and then things work fine, so why would we need additional or different constraints?

valeriupredoi commented 6 years ago

@bouweandela absolutely, ma man -- the constraint is most useful, but it is a bit too constrainy :grin: - case at hand is this o2 file - it doesn't have a standard_name so the load_raw by constraint fails noqa which is not cool. Also note that fix_file works only after load point, the load point is the 0th point and o2 does not go past it, when it should (I mean, how'd that file passed the ESGF tests is beyond me but it's there now). Anyway, I had to deal with some real nasty OBS files for a new UKESMer today and I put this in _io.py (the OBS files are actually cubeLists per single monthly time point with no standard_name and weird long_name):

def load_cubes(files, filename, metadata, constraints=None, callback=None):
    """Load iris cubes from files"""
    logger.debug("Loading:\n%s", "\n".join(files))
    cubes = iris.load_raw(files, constraints=constraints, callback=callback)
    if not cubes:
        cubes = []
        for file_i in files:
            cube_list = iris.load(file_i)
            for cube in cube_list:
                if cube.long_name == 'NOAA Climate Data Record (CDR) of Monthly GPCP Satellite-Gauge Combined Precipitation':
                    interest = cube
            interest.long_name = constraints
            # ad-hoc fixes; these should go into the cmor/fixes
            interest.coord('longitude').long_name = 'longitude'
            interest.coord('latitude').long_name = 'latitude'
            interest.coord('longitude').var_name = 'lon'
            interest.coord('latitude').var_name = 'lat'
            iris.util.promote_aux_coord_to_dim_coord(interest, 'longitude')
            iris.util.promote_aux_coord_to_dim_coord(interest, 'latitude')
            interest.coord('latitude').points = interest.coord('latitude').points[::-1]
            interest.data[0] = interest.data[0][::-1]
            interest.units = 'kg m-2 s-1'
            interest.standard_name = 'precipitation_flux'
            interest.var_name = 'precipitation_flux'
            # end fixes
            cubes.append(cube_list.extract_strict(constraints=iris.Constraint(interest.long_name)))
    iris.util.unify_time_units(cubes)
    if not cubes:
        raise Exception('Can not load cubes from {0}'.format(files))

    for cube in cubes:
        cube.attributes['_filename'] = filename
        cube.attributes['metadata'] = yaml.safe_dump(metadata)
        # TODO add block below when using iris 2.0
        # always set fillvalue to 1e+20
        # if np.ma.is_masked(cube.data):
        #     np.ma.set_fill_value(cube.data, GLOBAL_FILL_VALUE)

    return cubes

of course, those metadata and data fixes will go into cmor/_fixes, but if there is only the load_raw with standard_name constraint, these types of files will never be loaded.

valeriupredoi commented 6 years ago

also note that the concatenation in _io is very basic at the moment and will fail (thanks, iris!) if even certain metadata arguments (that are unimportant to the data operations themselves) differ between cubes, but that's a different story
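
One way around the metadata problem is to drop the attributes that differ between cubes before concatenating. The sketch below shows the idea on plain dictionaries, assuming each dictionary plays the role of one cube's `attributes`; the helper names and the attribute values are hypothetical. Newer iris releases ship a helper along these lines (`iris.util.equalise_attributes`).

```python
_MISSING = object()  # sentinel for keys that are absent from a dictionary


def differing_attribute_keys(attribute_dicts):
    """Return keys that are missing from some dict or whose values are not all equal."""
    all_keys = set().union(*attribute_dicts)
    differing = set()
    for key in all_keys:
        values = [attrs.get(key, _MISSING) for attrs in attribute_dicts]
        if any(value != values[0] for value in values[1:]):
            differing.add(key)
    return differing


def drop_keys(attribute_dicts, keys):
    """Return copies of the dictionaries without the given keys."""
    return [
        {key: value for key, value in attrs.items() if key not in keys}
        for attrs in attribute_dicts
    ]


# Hypothetical attributes of two files that should be concatenated in time.
first = {"experiment": "historical", "creation_date": "2011-09-30"}
second = {
    "experiment": "historical",
    "creation_date": "2011-10-02",
    "history": "edited",
}

differing = differing_attribute_keys([first, second])
cleaned = drop_keys([first, second], differing)
print(sorted(differing))
print(cleaned[0] == cleaned[1])
```

Only attributes that are irrelevant to the data should be dropped this way; differences in units or coordinates point to a real incompatibility.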

schlunma commented 6 years ago

Actually, fix_file is executed before the loading step, so that should work. It does at least in the case of inmcm4's nbp:

https://github.com/ESMValGroup/ESMValTool/blob/dfedafd28dfad253c569e10aebdd9e18f6ea3739/esmvaltool/cmor/_fixes/CMIP5/inmcm4.py#L58

valeriupredoi commented 6 years ago

That's cool - I wasn't aware we can apply fixes that go before the load point (the callback is actually working nicely then!). Perfect then, problem solved for this o2 :grin: here's an idea - we can use these fixes to fix OBS files too? Like that crap set of files whose fixes I posted in the code snippet

schlunma commented 6 years ago

Yeah, I already did it with some OBS file outside of the tool, worked fine! :+1:

valeriupredoi commented 6 years ago

sweet caboose, that is useful! could you point me to that one, man, pls? You guys have a good weekend in the meantime, I needs beer :beers:

schlunma commented 6 years ago

It was an OBS file where the standard_name was wrong, unfortunately I did the fix just for testing purposes on a local branch which is gone now. However, just put a file with the name of your observations here ESMValTool/esmvaltool/cmor/_fixes/OBS/ and create a class with the name of the variable inside it with a function fix_file like that:

https://github.com/ESMValGroup/ESMValTool/blob/dfedafd28dfad253c569e10aebdd9e18f6ea3739/esmvaltool/cmor/_fixes/CMIP5/inmcm4.py#L58

Enjoy your weekend, man! :beers:

bouweandela commented 6 years ago

Related to #606

mattiarighi commented 5 years ago

@ledm is this solved?

ledm commented 5 years ago

Yes, I think so.