Closed bascrezee closed 5 years ago
Since alb is a custom variable, you need to read it from the custom table.
The above error has been solved, thanks.
I ran into another error. I am quite sure that the 'standard_name' in CMOR_alb.dat is supposed to be left empty, but it raises an error. However, changing it to some random other valid standard name does not remove the error, so it seems not fully related.
2019-05-06 15:24:49,085 INFO esmvaltool.utils.cmorizers.obs.cmorize_obs_Duveiller2018,89 CMORizing var alb from file /net/exo/landclim/PROJECTS/C3S/datadir/rawobsdir/Tier2/Duveiller2018/albedo_IGBPgen.nc
Traceback (most recent call last):
File "/net/exo/landclim/crezees/conda/envs/esmvaltool-public/bin/cmorize_obs", line 11, in <module>
load_entry_point('ESMValTool', 'console_scripts', 'cmorize_obs')()
File "/home/crezees/ESMValTool/esmvaltool/utils/cmorizers/obs/cmorize_obs.py", line 201, in execute_cmorize
_cmor_reformat(config_user, obs_list)
File "/home/crezees/ESMValTool/esmvaltool/utils/cmorizers/obs/cmorize_obs.py", line 260, in _cmor_reformat
module_root + dataset)
File "/home/crezees/ESMValTool/esmvaltool/utils/cmorizers/obs/cmorize_obs.py", line 122, in _run_pyt_script
py_cmor.cmorization(in_dir, out_dir)
File "/home/crezees/ESMValTool/esmvaltool/utils/cmorizers/obs/cmorize_obs_Duveiller2018.py", line 103, in cmorization
extract_variable(var_info, raw_info, out_dir, glob_attrs)
File "/home/crezees/ESMValTool/esmvaltool/utils/cmorizers/obs/cmorize_obs_Duveiller2018.py", line 63, in extract_variable
_fix_var_metadata(cube, var_info)
File "/home/crezees/ESMValTool/esmvaltool/utils/cmorizers/obs/utilities.py", line 43, in _fix_var_metadata
cube.standard_name = var_info.standard_name
File "/net/exo/landclim/crezees/conda/envs/esmvaltool-public/lib/python3.6/site-packages/iris/_cube_coord_common.py", line 128, in standard_name
raise ValueError('%r is not a valid standard_name' % name)
ValueError: '' is not a valid standard_name
@tomaslovato or @valeriupredoi can you help?
@bascrezee At first I would say that it is a problem with the standard_name definition...
I saw that the branch version2_development_cmorize_duveiller2018 exists.
If it is yours or connected to this issue, could you please upload both Duveiller2018.yml and cmorize_obs_Duveiller2018.py there, so it will be much easier to reproduce the error!
I just staged the files and pushed them. Thanks for looking into this!
The error is the standard iris error for standard names that are not valid under the CF conventions :grin:
Here is an example of a custom CMOR table for a variable that has no standard name, because a made-up name would break the CF conventions and trigger the iris error above:
!----------------------------------
! Variable attributes:
!----------------------------------
standard_name:
units: 1
cell_methods: time: mean
The problem here is that the custom CMOR table will not contain any entry for standard_name, since it's a derived variable, so the cmorizer will always fail at the line cube.standard_name = var_info.standard_name. So we need to plug in a special case in the cmorizer utilities that accounts for derived variables. That's not going to be easy, because the purpose of the cmorizer is to make CMOR-compliant data that also adheres to CF standards. Is there any way you can grab the rsds and rsus datasets, so that alb can be derived internally in ESMValTool?
Thanks Valeriu, I think I kind of get what you mean.
What you suggest as a solution, is not a solution here, since this observational dataset has no rsds or rsus. There is just values of (difference in) albedo.
In that case, put a check in utilities.py, e.g.
    if var_info.standard_name == '':
        cube.standard_name = None
    else:
        cube.standard_name = var_info.standard_name
That will save the cube without errors, and it will also be fine when running it through ESMValTool, since standard_name is None anyway from the custom table.
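A minimal sketch of that guard, written against a stand-in class rather than a real iris cube so that it runs without iris installed. `FakeCube` and `fix_standard_name` are hypothetical names. The stand-in only mimics the behaviour seen in the traceback above (an empty string is rejected, None is accepted); real iris also validates non-empty names against the CF list, which this sketch does not do.

```python
from types import SimpleNamespace


class FakeCube:
    """Hypothetical stand-in for a cube: rejects '' as standard_name."""

    def __init__(self):
        self._standard_name = None

    @property
    def standard_name(self):
        return self._standard_name

    @standard_name.setter
    def standard_name(self, name):
        # Same message as in the traceback; only the empty string is checked here.
        if name == '':
            raise ValueError('%r is not a valid standard_name' % name)
        self._standard_name = name


def fix_standard_name(cube, var_info):
    """Map an empty standard_name from a custom table to None."""
    if var_info.standard_name == '':
        cube.standard_name = None
    else:
        cube.standard_name = var_info.standard_name


cube = FakeCube()
fix_standard_name(cube, SimpleNamespace(standard_name=''))
print(cube.standard_name)
```

The else branch matters: without it, the unconditional assignment that follows in utilities.py would still receive the empty string.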
What you suggest as a solution, is not a solution here, since this observational dataset has no rsds or rsus. There is just values of (difference in) albedo.
That's correct. Derived variables are designed for models only, in order to compare with a variable which is only available in the OBS.
Solution works :) I'll keep the issue open until I have finished the CMORization :)
I have now arrived at handling the 'time' axis. This is a somewhat special case, since it is a climatological dataset. How should I deal with this within ESMValTool? (See ncdump below). There are CF conventions describing what NetCDF files with climatological statistics should look like; however, since the original dataset does not adhere to these conventions, getting there would be quite involved... Any guidance?
Here is the ncdump:
netcdf albedo_IGBPgen {
dimensions:
lon = 360 ;
lat = 180 ;
mon = 12 ;
iTr = 6 ;
variables:
double lon(lon) ;
lon:units = "degreesE" ;
lon:long_name = "Longitude" ;
double lat(lat) ;
lat:units = "degreesN" ;
lat:long_name = "Latitude" ;
int mon(mon) ;
mon:units = "months" ;
mon:long_name = "Month" ;
double iTr(iTr) ;
iTr:long_name = "Vegetation transition code" ;
float Delta_albedo(iTr, mon, lat, lon) ;
Delta_albedo:_FillValue = NaNf ;
Delta_albedo:long_name = "Difference in surface albedo for a given vegetation cover transition" ;
float SD_Delta_albedo(iTr, mon, lat, lon) ;
SD_Delta_albedo:_FillValue = NaNf ;
SD_Delta_albedo:long_name = "St.Dev. on the diff. in surface albedo for a given vegetation cover transition" ;
float N_Delta_albedo(iTr, mon, lat, lon) ;
N_Delta_albedo:units = "samples" ;
N_Delta_albedo:_FillValue = NaNf ;
N_Delta_albedo:long_name = "Number of samples from which the aggregated estimate is made" ;
}
Climatological data are not officially supported yet by Iris (https://github.com/SciTools/iris/issues/2904). Soon it will be possible to vote for this functionality in Iris (https://github.com/SciTools/iris/issues/3307). I now wonder if it makes sense to CMORize this dataset at this moment. Is it possible to simply read and plot this dataset in a custom diagnostic without running the CMORizing script? @mattiarighi Thanks for your help :)
@ledm has cmorized some climatological data from the WOA dataset; you can try to use his script as an example.
@bascrezee Actually, you can define the time axis of the dataset using time instead of mon, by setting the correct reference year for the climatology, as done for the WOA data. This makes even more sense since the climatology is representative of a certain period, and it is better to have that period explicitly associated with the data.
You can add a custom variable for the reference year, similarly to WOA:
https://github.com/ESMValGroup/ESMValTool/blob/0b4ef0e7b1f124897a75981b0c82e47153742068/esmvaltool/utils/cmorizers/obs/cmor_config/WOA.yml#L40-L43
and then read it within the cmorization function of your cmorizer script using CFG['custom']['years'], and finally set the time values on the cube, e.g. within extract_variable.
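The time values for such a reference year could be computed as in the sketch below, using only the standard library. It assumes a time origin of "days since 1950-01-01" and mid-month time points. Both are illustrative choices, and `monthly_time_points` is a hypothetical helper, not part of ESMValTool. The year 2010 is only an example value.

```python
from datetime import datetime

# Assumed time origin, corresponding to units "days since 1950-01-01".
REFERENCE = datetime(1950, 1, 1)


def monthly_time_points(year):
    """Return 12 mid-month time points (days since REFERENCE) for one year."""
    points = []
    for month in range(1, 13):
        start = datetime(year, month, 1)
        # First day of the following month (rolls over to January of year + 1).
        end = datetime(year + (month == 12), month % 12 + 1, 1)
        mid = start + (end - start) / 2
        points.append((mid - REFERENCE).total_seconds() / 86400.0)
    return points


points = monthly_time_points(2010)
print(len(points))
```

The resulting list can replace the 1..12 month index of the original file, together with the matching units string.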
Sounds like a good approach. The original data contains a monthly climatology over 4 years (2008-2012). Is my understanding correct, that with the approach you suggest, 4 files will be written, one for each year? Each file will hold exactly the same data values. Since the data is not too big, this is a fine workaround.
I now run into another error:
File "/home/crezees/ESMValTool/esmvaltool/utils/cmorizers/obs/utilities.py", line 131, in save_variable
dates = reftime.num2date(cube_time.points[[0, -1]])
File "/net/exo/landclim/crezees/conda/envs/esmvaltool-public/lib/python3.6/site-packages/cf_units/__init__.py", line 1988, in num2date
cdf_utime = self.utime()
File "/net/exo/landclim/crezees/conda/envs/esmvaltool-public/lib/python3.6/site-packages/cf_units/__init__.py", line 1902, in utime
raise ValueError(emsg.format(interval))
ValueError: Time units with interval of "months", "years" (or singular of these) cannot be processed, got 'months'.
It has been reported before (https://github.com/ESMValGroup/ESMValTool/issues/516). For @schlunma it did work when using Iris v2.2.0, but not for me. I use cf_units v2.0.2. Any ideas what might go wrong here? (I will run the ESMValTests just to be sure that my installation is completely fine; I will keep you updated.)
Update: The tests are running fine...
@bascrezee since you have a monthly climatology you need to set only one reference year, in this case I would suggest to set 2010 (middle of climatological period). Only one file has to be generated.
Note that source should point to the exact download path of the data, so
https://github.com/ESMValGroup/ESMValTool/blob/0b4ef0e7b1f124897a75981b0c82e47153742068/esmvaltool/utils/cmorizers/obs/cmor_config/Duveiller2018.yml#L9
should instead report the Nature download link https://ndownloader.figshare.com/files/9969496
or the full path of the Amazon S3 archive https://s3-eu-west-1.amazonaws.com/pstorage-npg-968563215/9969496/albedo_IGBPgen.nc
Ok, thanks. But the start and end of the period should be included somehow as well, to describe the data correctly. I guess adding them to the global attributes makes sense?
Or in the filename?
The time in the filename so far matches the data content, so in this case the final CMORized file name should contain 201001-201012. It may be a good idea to add the climatological period to the global attributes.
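A small sketch of how both pieces of information could be kept together: the reference year goes into the filename date range, and the true climatological period goes into global attributes. `climatology_metadata` is a hypothetical helper, and the attribute names `climatology_start` and `climatology_end` are a naming assumption, not an ESMValTool requirement.

```python
def climatology_metadata(ref_year, clim_start_year, clim_end_year):
    """Build the filename date range and global attributes for a climatology."""
    # Date range of the single reference year, formatted as YYYYMM-YYYYMM.
    date_range = '{0}01-{0}12'.format(ref_year)
    # Actual period covered by the climatology, stored as global attributes.
    attrs = {
        'climatology_start': '{}-01-01T00:00:00Z'.format(clim_start_year),
        'climatology_end': '{}-12-31T23:59:59Z'.format(clim_end_year),
    }
    return date_range, attrs


date_range, attrs = climatology_metadata(2010, 2008, 2012)
print(date_range)
```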
@bascrezee To solve the time issue that you reported, it would probably be better to use a callback function when iris loads the data, to set the cube reference time and units.
Thanks. This callback works fine indeed. It now ran through :)
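For reference, a sketch of what such a callback might look like. It is not the code that was actually used. iris calls a load callback with the arguments (cube, field, filename). Everything else here is an assumption: that the month coordinate has var_name 'mon' with values 1..12, that 2010 is the reference year, that the time origin is 1950-01-01, and that the 15th of each month is an acceptable time point.

```python
from datetime import datetime

REF_YEAR = 2010                   # assumed reference year of the climatology
REFERENCE = datetime(1950, 1, 1)  # assumed time origin


def set_reference_time(cube, field, filename):
    """Load callback: turn the month index into a proper time coordinate."""
    coord = cube.coord(var_name='mon')
    # Month numbers are assumed to run from 1 to 12.
    months = [int(month) for month in coord.points]
    coord.points = [
        (datetime(REF_YEAR, month, 15) - REFERENCE).days for month in months
    ]
    coord.units = 'days since 1950-01-01 00:00:00'
    coord.standard_name = 'time'
    coord.var_name = 'time'
    coord.long_name = 'time'
```

It would be passed to the loader as `iris.load(path, callback=set_reference_time)`. The callback modifies the cube in place, so it does not need to return anything.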
Now I am checking the file with recipe_check_obs.yml:
# ESMValTool
# recipe_check_obs.yml
---
documentation:
  description: |
    Test recipe for OBS, no preprocessor or diagnostics are applied,
    just to check correct reading of the CMORized data.
  authors:
    - righ_ma

preprocessors:
  nopp:
    extract_levels: false
    regrid: false
    mask_fillvalues: false
    multi_model_statistics: false

diagnostics:
  Duveiller2018:
    description: Duveiller2018
    variables:
      albDiff:
        preproc: nopp
        mip: Amon
    additional_datasets:
      - {dataset: Duveiller2018, project: OBS, tier: 2, version: v2018, start_year: 2010, end_year: 2010, frequency: mon}
    scripts: null
But I run into the following error. I do not fully understand the error message. It does not find the dataset key, but it is specified in the recipe.
File "/home/crezees/ESMValTool/esmvaltool/_data_finder.py", line 117, in _replace_tags
"your recipe entry".format(tag, variable))
KeyError: "Dataset key type must be specified for {'preproc': 'nopp', 'mip': 'Amon', 'variable_group': 'albDiff', 'short_name': 'albDiff', 'diagnostic': 'Duveiller2018', 'preprocessor': 'default', 'dataset': 'Duveiller2018', 'project': 'OBS', 'tier': 2, 'version': 'v2018', 'start_year': 2010, 'end_year': 2010, 'frequency': 'mon', 'recipe_dataset_index': 0, 'cmor_table': 'OBS', 'standard_name': '', 'long_name': 'Difference in surface albedo for a given vegetation cover transition', 'units': '1', 'modeling_realm': ['atmos']}, check your recipe entry"
Branch: https://github.com/ESMValGroup/ESMValTool/tree/version2_development_cmorize_duveiller2018
The missing key is not dataset but type. If you look at the source code for the error:
raise KeyError("Dataset key {} must be specified for {}, check "
"your recipe entry".format(tag, variable))
(look at it next time :grin: )
Type can be e.g. type: reanalysis, but that depends on your data, dunno that :beer:
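To make the message easier to read, here is a simplified sketch of how a tag-replacement step can produce it. It is not ESMValTool's actual `_replace_tags`. The `[tag]` template syntax, the `TEMPLATE` string and the value 'clim' are assumptions for illustration; only the error message format is taken from the source snippet above.

```python
import re

# Hypothetical file name template; the [tag] placeholder syntax is an assumption.
TEMPLATE = 'OBS_[dataset]_[type]_[version]_[mip]_[short_name]_*.nc'


def replace_tags(path, variable):
    """Fill every [tag] in path from the variable dict, or raise KeyError."""
    for tag in re.findall(r'\[([^\]]*)\]', path):
        if tag not in variable:
            # The first {} is the missing tag name, the second the whole dict.
            raise KeyError("Dataset key {} must be specified for {}, check "
                           "your recipe entry".format(tag, variable))
        path = path.replace('[' + tag + ']', str(variable[tag]))
    return path


variable = {'dataset': 'Duveiller2018', 'version': 'v2018',
            'mip': 'Amon', 'short_name': 'albDiff'}
try:
    replace_tags(TEMPLATE, variable)
except KeyError as exc:
    print(exc)

# With the missing key supplied, the template resolves.
print(replace_tags(TEMPLATE, dict(variable, type='clim')))
```

Read this way, "Dataset key type must be specified" names `type` as the missing key. The word "Dataset" is just part of the fixed message text.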
Oops.. :stuck_out_tongue_closed_eyes: Yes, I will look at the source code next time.
My dataset has a non-standard dimension called vegetation_transition_code. So I added this to the file CMOR_albDiff.dat:
dimensions: longitude latitude time vegetation_transition_code
But I run into the following error:
File "/home/crezees/ESMValTool/esmvaltool/cmor/table.py", line 648, in _read_table_file
table[value] = self._read_variable(value, None)
File "/home/crezees/ESMValTool/esmvaltool/cmor/table.py", line 520, in _read_variable
var.coordinates[dim] = self.coords[dim]
KeyError: 'vegetation_transition_code'
It seems as if I still need to define this dimension somewhere. Maybe @jvegasbsc can help me, since I noted that CMOR_clisccp.dat includes a non-standard dimension named tau.
@bascrezee You need to add the information about your new axis vegetation_transition_code in CMOR_coordinates.dat, following the structure of the already available non-standard dimension.
Interestingly, for all custom variable definitions we leave the standard_name blank, but not in the CMOR_coordinates.dat file. Do you have any idea why? @jvegasbsc
At least in the case of the derived variables I created, the reason was that the standard name in the variable definition had to be in the list in the iris std_names.py; otherwise you get an error. The easiest way to avoid that is to leave the standard name blank in the derived variable file.
I picked up this work again today. After moving around some files due to the split into tool/core, I got back to the stage where I was. The script runs through, but the CMOR checker is not happy yet.
esmvalcore.cmor.check.CMORCheckError: There were errors in variable albDiff:
iTr: standard_name should be , not None
time: Frequency mon does not match input data
albDiff: does not match coordinate rank
in cube:
Difference in surface albedo for a given vegetation cover transition / (1) (Vegetation transition code: 6; time: 12; latitude: 180; longitude: 360)
Dimension coordinates:
Vegetation transition code x - - -
time - x - -
latitude - - x -
longitude - - - x
Attributes:
Conventions: CF-1.5
climatology_end: 2012-12-31T23:59:59Z
climatology_start: 2008-01-01T00:00:00Z
comment:
host: exo
mip: Amon
modeling_realm: clim
project_id: custom
reference: Duveiller, G., J. Hooker, A. Cescatti, Scientific Data 5, 180014 (2018...
source: https://ndownloader.figshare.com/files/9969496
source_file: /net/exo/landclim/PROJECTS/C3S/datadir/obsdir/Tier2/Duveiller2018/OBS_...
tier: 2
title: Duveiller2018 data reformatted for ESMValTool v2.0a2
user: crezees
version: v2018
I hope to tackle them one-by-one.
iTr: standard_name should be , not None
In CMOR_coordinates.dat I left the standard name blank, as usual for custom variables.
time: Frequency mon does not match input data
Is it possible that the CMOR checker does not know how to handle climatological data? See also the discussion above.
albDiff: does not match coordinate rank
I hope this goes away as soon as iTr has been fixed; maybe it is related to that one.
Branches:
landvariables [core repository; for custom CMIP definitions]
version2_development_cmorize_duveiller2018 [public repository; cmorize scripts]
Any help is appreciated !
Please ask @jvegasbsc for CMOR related issues
@jvegasbsc any thoughts on this?
I tried two options of fixing the custom coordinate, but both fail, see the comments in the code below. Is it possible that the CMOR checker fails in parsing correctly a custom defined CMOR coordinate? I think I am the first one adding a coordinate that does not have a valid standard name.
for cube in cubes:
    if cube.var_name == rawvar:
        for cubecoord in cube.coords():
            if cubecoord.var_name == 'iTr':
                # cubecoord.standard_name = None  # CMOR checker raises: iTr: standard_name should be , not None
                cubecoord.standard_name = ''  # this script raises: ValueError: '' is not a valid standard_name
I can trace the error back to being raised in l. 104 (permalink does not embed because it's a different repository?), so it must be one of the checks before that line that fails.
Update: I decided to extract a certain vegetation cover transition code, after which this is not a dimension coordinate any more. This is a fine workaround for my case. But it might still be good to check if the CMOR checker allows for custom 'non-valid standard name' coordinate names.
Update: Time has been solved as well. CMORization done. Thanks for the support, especially to @tomaslovato. I will submit a PR early next week.
I am building on cmorize_obs_Landschuetzer2016.py to cmorize another obs dataset. This is my yml file:
Content of Duveiller2018.yml
I follow very closely the script by @tomaslovato . Also the variable 'alb' is defined in the custom tables. However, I run into the following error:
I tried to further trace down the problem, but at some stage I got lost. See my own attempt of a traceback below, if it helps.
Is it correct that I take CMIP5 as a project ID? Or should it indicate 'custom' since this is a custom variable? Any ideas on what goes wrong here?