Closed ledm closed 5 years ago
Hi Lee,
I had the same problem several times. In my cases, the nc file either had no standard_name
or it differed from the one given in the CMOR tables. An example for a fix is nbp for inmcm4.
Hi @schlunma,
thanks! That was very helpful and got me over that bug, and into another! Even with the standard_name and long_name set, I'm still seeing the same problem. Are there other cube properties that could be missing from these files?
Lee
Hmm... for me fixing the standard_name always helped. I don't know of any other attributes; I thought the constraint only contains the standard_name of the cube. Can you post the output of the tool and print the constraint?
Hi,
the output of the command cubes = iris.load_raw(files, constraints=constraints, callback=callback) (as in the first snippet above) is below. Note that I have added several print statements to help me track what's happening.
cubes = iris.load_raw(files, constraints=constraints, callback=callback)
...:
iris.__init__.load_raw load_raw
iris.__init__.load_raw ['o2_Oyr_HadGEM2-ES_historical_r1i1p1_2000-2005.nc'] mole_concentration_of_molecular_oxygen_in_sea_water <function concatenate_callback at 0x7f5c28f16d90>
iris.__init__._load_collection
iris.__init__._load_collection cubes: <generator object _generate_cubes at 0x7f5c28ec3d58>
_CubeFilter: Constraint(name='mole_concentration_of_molecular_oxygen_in_sea_water')
_CubeFilter: cubes: < No cubes >
_CubeFilterCollection init
_CubeFilterCollection, cubes: <generator object _generate_cubes at 0x7f5c28ec3d58> pairs: [<iris.cube._CubeFilter object at 0x7f5c28f5ea58>] collection: <iris.cube._CubeFilterCollection object at 0x7f5c28f5e3c8>
iris.__init__._generate_cubes
iris.__init__._generate_cubes, scheme file <itertools._grouper object at 0x7f5c28f5eb70>
/users/modellers/ledm/workspace/ESMValToolTest/miniconda/minicondaLee/envs/esmvaltool/lib/python3.6/site-packages/iris/fileformats/cf.py:1143: IrisDeprecation: NetCDF default loading behaviour currently does not expose variables which define reference surfaces for dimensionless vertical coordinates as independent Cubes. This behaviour is deprecated in favour of automatic promotion to Cubes. To switch to the new behaviour, set iris.FUTURE.netcdf_promote to True.
warn_deprecated(msg)
/users/modellers/ledm/workspace/ESMValToolTest/miniconda/minicondaLee/envs/esmvaltool/lib/python3.6/site-packages/iris/fileformats/_pyke_rules/compiled_krb/fc_rules_cf_fc.py:1817: FutureWarning: Conversion of the second argument of issubdtype from `str` to `str` is deprecated. In future, it will be treated as `np.str_ == np.dtype(str).type`.
if np.issubdtype(cf_var.dtype, np.str):
/users/modellers/ledm/workspace/ESMValToolTest/miniconda/minicondaLee/envs/esmvaltool/lib/python3.6/site-packages/iris/fileformats/_pyke_rules/compiled_krb/fc_rules_cf_fc.py:1817: FutureWarning: Conversion of the second argument of issubdtype from `str` to `str` is deprecated. In future, it will be treated as `np.str_ == np.dtype(str).type`.
if np.issubdtype(cf_var.dtype, np.str):
_CubeFilterCollection, cube: mole_concentration_of_dissolved_molecular_oxygen_in_sea_water / (mol m-3) (time: 6; depth: 40; latitude: 216; longitude: 360)
Dimension coordinates:
time x - - -
depth - x - -
latitude - - x -
longitude - - - x
Attributes:
Conventions: CF-1.5
NCO: "4.5.3"
_filename: /users/modellers/ledm/workspace/ESMValToolTest/KIT_data/o2_Oyr_HadGEM2...
associated_files: baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile: gridspec_ocnBgchem_fx_HadGEM2-ES_historical_r0i0p0.nc...
branch_time: 0.0
cmor_version: 2.7.1
contact: chris.d.jones@metoffice.gov.uk, john.hughes@metoffice.gov.uk
experiment: historical
experiment_id: historical
forcing: GHG, Oz, SA, LU, Sl, Vl, BC, OC, (GHG = CO2, N2O, CH4, CFCs)
frequency: yr
initialization_method: 1
institute_id: MOHC
institution: Met Office Hadley Centre, Fitzroy Road, Exeter, Devon, EX1 3PB, UK, (h...
invalid_standard_name: mole_concentration_of_molecular_oxygen_in_sea_water
mo_runid: ajhoh
model_id: HadGEM2-ES
modeling_realm: ocnBgchem
original_name: mo: m02s00i115/1000.
parent_experiment: pre-industrial control
parent_experiment_id: piControl
parent_experiment_rip: r1i1p1
physics_version: 1
product: output
project_id: CMIP5
realization: 1
references: Bellouin N. et al, (2007) Improved representation of aerosols for HadGEM2....
source: HadGEM2-ES (2009) atmosphere: HadGAM2 (N96L38); ocean: HadGOM2 (lat: 1.0-0.3...
table_id: Table Oyr (27 April 2011) a816306750f284585dc77210f193f7bb
title: HadGEM2-ES model output prepared for CMIP5 historical
Cell methods:
mean: time
mean where sea: area
_CubeFilterCollection: add_cube
_CubeFilter: add
iris.__init__._load_collection result: <iris.cube._CubeFilterCollection object at 0x7f5c28f5e3c8>
<class 'iris.cube._CubeFilterCollection'>
<bound method _CubeFilterCollection.cubes of <iris.cube._CubeFilterCollection object at 0x7f5c28f5e3c8>>
_CubeFilterCollection: cubes
_CubeFilterCollection: cubes... iterating. 0
iris.__init__.load_raw result of load collection < No cubes >
As you may be able to see, it seems to load the raw cube just fine, but iris refuses to return the cube as part of a cube filter collection.
As I told @valeriupredoi, I suspect that the real underlying issue here is not that the oxygen files are the problem, but rather that ESMValTool (or possibly iris) is failing silently without any way to figure out what went wrong.
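A minimal sketch of what "failing loudly" could look like, assuming the names of the cubes found in a file are available as plain strings. The helper `select_by_name` and the exception `NoMatchingCubeError` are hypothetical and are not part of iris or ESMValTool; the point is only that an empty result can be turned into an error that lists what was actually in the file.

```python
import difflib


class NoMatchingCubeError(ValueError):
    """Raised when no loaded cube matches the requested name (hypothetical)."""


def select_by_name(available_names, wanted):
    """Return the names equal to `wanted`, or raise a descriptive error.

    `available_names` stands in for the names of the cubes found in a file.
    """
    matches = [name for name in available_names if name == wanted]
    if matches:
        return matches
    # Suggest near misses so an outdated or misspelled name is easy to spot.
    hints = difflib.get_close_matches(wanted, available_names, n=3, cutoff=0.6)
    raise NoMatchingCubeError(
        "No cube named {!r}; file contains {}; close matches: {}".format(
            wanted, sorted(available_names), hints))


found = ['mole_concentration_of_dissolved_molecular_oxygen_in_sea_water']
try:
    select_by_name(found, 'mole_concentration_of_molecular_oxygen_in_sea_water')
except NoMatchingCubeError as error:
    message = str(error)
    print(message)
```

With an error like this, the mismatch between the requested name and the name in the file would be visible in the first traceback rather than after adding print statements to iris.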
Hi Lee,
I think I found the problem: when I load the dataset directly with iris.load_cube(...), the cube's name is Dissolve Oxygen Concentration, which is not the standard_name of the cube but the long_name.
Since the standard_name is set correctly in the original file (at least ncdump tells me that), this is an iris issue. I don't know why it is not able to load the standard_name.
The ESMValTool uses the iris.load_raw(files, constraints=constraints) function to load the cubes, but constraints only includes the standard_name. A fix for this problem could be to extend the constraints in the load function with the long_name or even the var_name of the nc file. Is this a solution, @bouweandela?
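The difference between the strict name match and a looser one can be sketched without iris. The `CubeNames` class below is a stand-in for a cube's three name attributes, and `matches_any_name` is a hypothetical helper; the fallback order in `name()` follows what iris documents for `Cube.name()`, but treat that as an assumption to check against the iris version in use. The `var_name` value `o2` is taken from the file name and is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CubeNames:
    """Stand-in for the three name attributes of an iris cube."""
    standard_name: Optional[str] = None
    long_name: Optional[str] = None
    var_name: Optional[str] = None

    def name(self):
        # Assumed fallback order: standard_name, long_name, var_name.
        return (self.standard_name or self.long_name
                or self.var_name or 'unknown')


def matches_any_name(cube, wanted):
    """Looser check: accept a match on any of the three names."""
    if wanted is None:
        return False
    return wanted in (cube.standard_name, cube.long_name, cube.var_name)


# A cube without a standard_name, as reported for the o2 file in this thread.
o2 = CubeNames(long_name='Dissolve Oxygen Concentration', var_name='o2')

wanted = 'mole_concentration_of_molecular_oxygen_in_sea_water'
strict = o2.name() == wanted         # name() falls back to the long_name
loose = matches_any_name(o2, 'o2')   # var_name still identifies the variable
print(strict, loose)
```

The trade-off raised later in the thread still applies: a looser match is more forgiving of bad metadata, but it is also more likely to pick up cubes other than the requested variable.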
extract_strict(constraint=name) works fine with the long_name even if the standard_name is messed up (as in the case of derived variables that have an invalid_standard_name attribute; of course one needs to set long_name), so if one does a load() followed by extract_strict() this will be prevented. Having said this, I will have a look at the actual files right now @ledm
ok so I have just looked at the o2 file: that one does not have a standard_name (instead it has an invalid_standard_name in the attributes, which denotes an operation on the original variable):
iris.load_raw('/badc/cmip5/data/cmip5/output1/MOHC/HadGEM2-ES/historical/mon/ocnBgchem/Omon/r1i1p1/latest/o2/o2_Omon_HadGEM2-ES_historical_r1i1p1_195912-200512.nc', constraints=iris.Constraint(name='mole_concentration_of_molecular_oxygen_in_sea_water'))
will of course return an empty list because that name exists neither as a standard name nor as a long name; however, if one does the load:
d=load(cubes)
followed by assignment of long name and extraction:
d[0].long_name = 'mole_concentration_of_molecular_oxygen_in_sea_water'
d.extract_strict(constraints=iris.Constraint(name='mole_concentration_of_molecular_oxygen_in_sea_water'))
<iris 'Cube' of mole_concentration_of_molecular_oxygen_in_sea_water / (mol m-3) (time: 553; latitude: 216; longitude: 360)>
works fine
I suggest we add this case at loading point - @bouweandela ?
Just a quick comment: the correct standard name is mole_concentration_of_dissolved_molecular_oxygen_in_sea_water, whereas mole_concentration_of_molecular_oxygen_in_sea_water is not a valid standard name.
Is there any way that I can add these into a fix_file fixes command?
So, it looks like the problem here was that the CMOR table in ESMValTool uses the incorrect standard name.
As you can see in the file
https://github.com/ESMValGroup/ESMValTool/blob/version2_development/esmvaltool/cmor/tables/cmip5/Tables/CMIP5_Oyr, the ESMValTool CMOR table uses the non-CF standard name mole_concentration_of_molecular_oxygen_in_sea_water instead of the CF-compliant standard name mole_concentration_of_dissolved_molecular_oxygen_in_sea_water. Replacing the wrong one with the right name results in everything working as expected!
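A mismatch like this could in principle be caught by checking the table's standard names against the CF standard name list. The sketch below is illustrative only: `TABLE_EXCERPT` is an abridged, hand-written excerpt in the style of a CMOR table entry (not copied from the real file), `CF_STANDARD_NAMES` is a one-entry stand-in for the full CF table, and `invalid_standard_names` is a hypothetical helper.

```python
import re

# Abridged, illustrative excerpt in the style of a CMOR table entry (assumption).
TABLE_EXCERPT = """
variable_entry:    o2
standard_name:     mole_concentration_of_molecular_oxygen_in_sea_water
units:             mol m-3
"""

# Tiny stand-in for the CF standard name table (the real one is much larger).
CF_STANDARD_NAMES = {
    'mole_concentration_of_dissolved_molecular_oxygen_in_sea_water',
}


def invalid_standard_names(table_text, valid_names):
    """Return (variable, standard_name) pairs whose name is not in valid_names."""
    problems = []
    variable = None
    for line in table_text.splitlines():
        entry = re.match(r'\s*variable_entry:\s*(\S+)', line)
        if entry:
            # Remember which variable the following attributes belong to.
            variable = entry.group(1)
            continue
        std = re.match(r'\s*standard_name:\s*(\S+)', line)
        if std and std.group(1) not in valid_names:
            problems.append((variable, std.group(1)))
    return problems


print(invalid_standard_names(TABLE_EXCERPT, CF_STANDARD_NAMES))
```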
That is weird. The file at DKRZ has a standard_name which is set to mole_concentration_of_molecular_oxygen_in_sea_water and still cannot be processed. Anyway, glad that you found a solution 👍
@ledm @schlunma the standard name has been changed to the correct one mentioned by Lee for CMIP6_Omon/Oyr - I reckon we should do the same for CMIP5, what say you @jvegasbsc ?
OK guys so @ledm would be very happy if we sorted this out sooner rather than later, so what I am proposing is to change the standard name in the CMIP5 CMOR files to the one that is in the CMIP6 files and add a check on the existence of standard_name right before we load, like:
if not cube.standard_name:
    long_name = name
so that the load can be done even if that standard name does not exist in cube. What say yous?
If the standard_name changed in CMIP6, we should keep the CMIP5 version and change wrong standard_names in fix files. But extending the constraint to somehow include the long_name sounds very good :+1:
The reason I added the standard_name constraint is that in some cases you end up loading other stuff than the required variable into cubes. As you say, the standard name can be fixed with fix_file and then things work fine, so why would we need additional or different constraints?
@bouweandela absolutely, ma man -- the constraint is most useful, but it is a bit too constrainy :grin: - case at hand is this o2 file - it doesn't have a standard_name so the load_raw by constraint fails noqa which is not cool. Also note that fix_file works only after load point, the load point is the 0th point and o2 does not go past it, when it should (I mean, how'd that file passed the ESGF tests is beyond me but it's there now). Anyway, I had to deal with some real nasty OBS files for a new UKESMer today and I put this in _io.py (the OBS files are actually cubeLists per single monthly time point with no standard_name and weird long_name):
def load_cubes(files, filename, metadata, constraints=None, callback=None):
    """Load iris cubes from files"""
    logger.debug("Loading:\n%s", "\n".join(files))
    cubes = iris.load_raw(files, constraints=constraints, callback=callback)
    if not cubes:
        cubes = []
        for file_i in files:
            cube_list = iris.load(file_i)
            for cube in cube_list:
                if cube.long_name == 'NOAA Climate Data Record (CDR) of Monthly GPCP Satellite-Gauge Combined Precipitation':
                    interest = cube
                    interest.long_name = constraints
                    # ad-hoc fixes; these should go into the cmor/fixes
                    interest.coord('longitude').long_name = 'longitude'
                    interest.coord('latitude').long_name = 'latitude'
                    interest.coord('longitude').var_name = 'lon'
                    interest.coord('latitude').var_name = 'lat'
                    iris.util.promote_aux_coord_to_dim_coord(interest, 'longitude')
                    iris.util.promote_aux_coord_to_dim_coord(interest, 'latitude')
                    interest.coord('latitude').points = interest.coord('latitude').points[::-1]
                    interest.data[0] = interest.data[0][::-1]
                    interest.units = 'kg m-2 s-1'
                    interest.standard_name = 'precipitation_flux'
                    interest.var_name = 'precipitation_flux'
                    # end fixes
            cubes.append(cube_list.extract_strict(constraints=iris.Constraint(interest.long_name)))
        iris.util.unify_time_units(cubes)
    if not cubes:
        raise Exception('Can not load cubes from {0}'.format(files))
    for cube in cubes:
        cube.attributes['_filename'] = filename
        cube.attributes['metadata'] = yaml.safe_dump(metadata)
        # TODO add block below when using iris 2.0
        # always set fillvalue to 1e+20
        # if np.ma.is_masked(cube.data):
        #     np.ma.set_fill_value(cube.data, GLOBAL_FILL_VALUE)
    return cubes
of course, those metadata and data fixes will go into cmor/_fixes, but if there is only the load_raw with standard_name constraint, these types of files will never be loaded.
also note that the concatenation in _io is very basic at the moment and will fail (thanks, iris!) if even certain metadata arguments (that are unimportant to the data operations themselves) differ between cubes, but that's a different story
Actually, fix_file is executed before the loading step, so that should work. It does at least in the case of inmcm4's nbp:
That's cool - I wasn't aware we can apply fixes that go before the load point (the callback is actually working nicely then!). Perfect then, problem solved for this o2 :grin: here's an idea - we can use these fixes to fix OBS files too? Like that crap set of files I posted the fixes in the code snippet
Yeah, I already did it with some OBS file outside of the tool, worked fine! :+1:
sweet caboose, that is useful! could you point me to that one, man, pls? You guys have a good weekend in the meantime, I needs beer :beers:
It was an OBS file where the standard_name was wrong; unfortunately I did the fix just for testing purposes on a local branch which is gone now. However, just put a file with the name of your observations here ESMValTool/esmvaltool/cmor/_fixes/OBS/ and create a class with the name of the variable inside it with a function fix_file like that:
Enjoy your weekend, man! :beers:
Related to #606
@ledm is this solved?
Yes, I think so.
I'm looking at marine oxygen in CMIP5 data using the current version 2 development branch. ESMValTool is not able to load any CMIP5 marine oxygen netCDFs and I can't fully understand what's going on.
So far, I've found that
I'm able to reproduce the problem with the following snippet:
produces
[]
However, if I run the equivalent over another field (for instance marine chlorophyll), everything seems to work fine:
produces a description of the file loaded:
Can anyone help me figure out why this command would work for Chlorophyll but not for oxygen? Perhaps the automatically loaded constraints are not appropriate? Why would ESMValTool want to concatenate a single file - is that a problem?
Thanks for the help,
cheers!
Lee