ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0
42 stars 38 forks source link

compatibility of ERA5 pressure level variable(s) #1136

Closed fserva closed 3 years ago

fserva commented 3 years ago

Hello all, I would need to process (six) hourly ERA5 data in order to prepare input for the zmnam recipe, which requires daily means geopotential height.

I've tried to process it by using https://github.com/ESMValGroup/ESMValTool/blob/master/esmvaltool/recipes/cmorizers/recipe_era5.yml, however an error is raised when building the cube: ERROR [15512] Failed to run fix_metadata([<iris 'Cube' of geopotential / (m**2 s**-2) (time: 124; pressure_level: 19; latitude: 721; longitude: 1440)>, <iris 'Cube' of geopotential / (m**2 s**-2) (time: 124; pressure_level: 19; latitude: 721; longitude: 1440)>, [...], <iris 'Cube' of geopotential / (m**2 s**-2) (time: 124; pressure_level: 19; latitude: 721; longitude: 1440)>], {'project': 'native6', 'dataset': 'ERA5', 'short_name': 'zg', 'mip': '6hrPlev', 'frequency': '6hr', 'check_level': <CheckLevels.DEFAULT: 3>}), likely due to iris This seems to be an iris problem iris.exceptions.CoordinateNotFoundError: 'Expected to find exactly 1 coordinate, but found none.'

I believe this may be due to the missing standard_names, see the header of the raw files:

dimensions:
    longitude = 1440 ;
    latitude = 721 ;
    level = 19 ;
    time = 124 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
    float latitude(latitude) ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
    int level(level) ;
        level:units = "hPa" ;
        level:long_name = "pressure_level" ;
    int time(time) ;
        time:units = "hours since 1900-01-01 00:00:00.0" ;
        time:long_name = "time" ;
        time:calendar = "gregorian" ;
    short z(time, level, latitude, longitude) ;
        z:scale_factor = 7.61496044931561 ;
        z:add_offset = 244700.965957275 ;
        z:_FillValue = -32767s ;
        z:missing_value = -32767s ;
        z:units = "m**2 s**-2" ;
        z:long_name = "Geopotential" ;
        z:standard_name = "geopotential" ;

Maybe assigning default attributes would be enough? See e.g. https://groups.google.com/g/scitools-iris/c/9IwC2Rr7xm

Please note the discussion started in: https://github.com/ESMValGroup/ESMValTool/issues/1909

If anyone succeeded cmorizing ERA5 pressure level data, any hint would be useful.

FYI @remi-kazeroni @zklaus @bouweandela

bouweandela commented 3 years ago

Could you please post the complete stack trace? Or attach the file run/main_log_debug.txt?

fserva commented 3 years ago

Here it is, thanks. main_log_debug.txt

bouweandela commented 3 years ago

I tried to reproduce this with hourly zg and it looks like the crash is due to the axis attribute being set to the empty string, because loading the variable definition from the CMOR table, it picks up zg with an alevel generic coordinate instead of the usual vertical pressure coordinate. This happens because the variable is not present in the E1hr table, so then it searches the other mip tables and ends up picking the wrong one.

@jvegasbsc Could you look into this? We would need some way to select the correct variable from the CMOR table, i.e. one that has e.g. "dimensions": "longitude latitude plev19 time" instead of "dimensions": "longitude latitude alevel time".

Here is an example recipe:


datasets:
  - {dataset: ERA5, project: native6, tier: 3, type: reanaly, version: 1}

diagnostics:

  timeseries:
    description: zg
    variables:
      zg:
        mip: E1hr
        frequency: 1hrPt
        start_year: 1990
        end_year: 1990
    scripts: null
schlunma commented 3 years ago

This is described in #1029 and a possible solution is proposed in #1032.

bouweandela commented 3 years ago

That looks like a different but related problem. In this case, zg occurs in the CMIP6 CMOR tables with both a generic and a non-generic vertical coordinate, depending on the table. To solve the problem here, it would be sufficient to be able to just specify that you want the version with the non-generic coordinate.

I realized that this is actually already possible, but it's a bit counter intuitive. The following recipe seems to work for me with ERA5 hourly geopotential height on pressure levels. Could you give it a try @fserva? Make sure to replace the frequency with the frequency that you actually need.

datasets:
  - {dataset: ERA5, project: native6, tier: 3, type: reanaly, version: 1}

diagnostics:

  timeseries:
    description: test zg
    variables:
      zg:
        mip: Amon
        frequency: 1hrPt
        start_year: 1990
        end_year: 1990
    scripts: null
fserva commented 3 years ago

Thanks @bouweandela, I've tried your recipe and I needed to do a change adding a documentation section, otherwise it stops immediately.

When using the data I have, I got an error on the time axis ValueError: invalid day number provided in cftime.DatetimeGregorian(2000, 6, 31, 0, 0, 0, 0), which is not present when using cdo (but it makes sense since 31 June does not exist). This seems caused by _fix_monthly_time_coord.

Suggestions or other issues I should consider? Thanks

fserva commented 3 years ago

I've tried various mip and frequency (in my case probably 6hrPt is the correct one?), but that did not help

bouweandela commented 3 years ago

It looks like that happens because we were only aware of hourly and monthly ERA5 data when we wrote the fix file. It sees that the timestep is more than an hour, so it wrongly assumes that it is monthly. How can I download the six hourly data?

in my case probably 6hrPt is the correct one?

Yes, I think so.

fserva commented 3 years ago

Yes, good point. In fact the highest frequency you can get is hourly, but through the CDS API and the web form one can select specific hours. In this case I took 6-hourly to save some space, but of course I can start from the hourly.

Maybe not a problem, but I see that above you mentioned plev19, while ERA5 provides 37 levels as far as I recall (and also in this case, I sub-selected some levels of interest).

bouweandela commented 3 years ago

That the pressure levels are different than in the CMOR tables is not a problem, this is almost always the case for non-CMIP data, so long as the coordinate is sufficiently similar it should work.

If you are working on some institutional machine (e.g. Jasmin), you might be able to use ERA5 data that is already available instead of downloading it yourself, see also ESMValGroup/ESMValTool#2183.

fserva commented 3 years ago

Thanks Bouwe, sorry for the long time to respond.

Now I have access also to the space esmeval on JASMIN; would you suggest to use the default install (https://docs.esmvaltool.org/projects/esmvalcore/en/latest/quickstart/install.html#pre-installed-versions-on-hpc-clusters) or would it be better to make a new local one (since I understand there are changes in the core packages https://github.com/ESMValGroup/ESMValTool/issues/2198)?

fserva commented 3 years ago

Hello @bouweandela from what I can see in /gws/nopw/j04/esmeval/obsdata-v2/Tier3/ERA5/ there is only precipitation. I do not have access to Mistral, so I guess I will simply download the files in their native format for testing.

bouweandela commented 3 years ago

Apparently there is not enough space on the group workspace on Jasmin, but there is ERA5 data available in another directory, see here for some recent discussion on the topic: https://github.com/ESMValGroup/ESMValCore/issues/1246.

fserva commented 3 years ago

Ok, I see.

In the Jasmin disks there are surface variables and some model level variables under /badc/ecmwf-era5/data/oper/an_ml/, but the geopotential there is the surface orography short z(time, latitude, longitude). If I well recall it is possible to derive geopotential on pressure levels but that is maybe be too complex.

I have some hourly data for geopotential on pressure levels and I am planning to use those.

zklaus commented 3 years ago

It may also be possible to augment the data available on jasmin by asking the right people there. For that, it will be very useful to consult the ERA5 documentation to specify exactly which data one is interested in.

fserva commented 3 years ago

I made some test with https://github.com/ESMValGroup/ESMValCore/issues/1136#issuecomment-850216172

The run finished correctly, even if no file was produced (I guess it is normal). There were some warnings

WARNING There were warnings in variable zg:
plev: does not contain 100000.0 Pa
 plev: does not contain 70000.0 Pa
 plev: does not contain 25000.0 Pa

related to the fact that units of data downloaded from the CDS are hPa. Maybe they can be changed manually in advance, or there is a better procedure?

In the original files I have level = 10, 30, 50, 100, 500, 850 ;

bouweandela commented 3 years ago

Files are produced, but deleted after a successful run. You can set remove_preproc_dir: false if you want to keep the preprocessed files.

The warning is not about the units, but it tells you that some pressure levels that are mandatory for the zg variable's plev coordinate in the CMOR table are not present, this is only to be expected because this is not CMIP data.

fserva commented 3 years ago

Thanks Bouwe, your tip worked and the file is produced; levels are converted to Pa as for CMOR standard. I was trying to obtain the daily means, so I took some definition from the cmorizer:

datasets:
  - {dataset: ERA5, project: native6, tier: 3, type: reanaly, version: 1}

preprocessors:
  daily_mean:
    daily_statistics:
      operator: mean

diagnostics:
  timeseries:
    description: test zg
    variables:
      zg:
        preprocessor: daily_mean
        mip: day
        frequency: 1hrPt
        start_year: 2006
        end_year: 2006
    scripts: null

Update: This worked (note I did not include add_one_day!

Note that mip: day, using mip: E1hr (table for hourly extra variables) failed.

bouweandela commented 3 years ago

Note that mip: day, using mip: E1hr (table for hourly extra variables) failed.

It probably doesn't matter too much what mip you choose, but to signify your intent I think mip: E1hr would make most sense, because the variable specifies the input data. What was the error message?

fserva commented 3 years ago

Yes, I also think so. It's iris complaining about the coordinates (not new likely)...

2021-08-06 14:24:56,052 UTC [499434] ERROR   Failed to run fix_metadata([<iris 'Cube' of geopotential / (m**2 s**-2) (time: 8760; air_pressure: 6; latitude: 181; longitude: 360)>], {'preprocessor': 'daily_mean', 'mip': 'E1hr', 'frequency': '1hrPt', 'start_year': 2006, 'end_year': 2006, 'variable_group': 'zg', 'short_name': 'zg', 'diagnostic': 'timeseries', 'dataset': 'ERA5', 'project': 'native6', 'tier': 3, 'type': 'reanaly', 'version': 1, 'recipe_dataset_index': 0, 'alias': 'ERA5', 'original_short_name': 'zg', 'standard_name': 'geopotential_height', 'long_name': 'Geopotential Height', 'units': 'm', 'modeling_realm': ['atmos'], 'filename': '/work/users/serva/esmvaltool_output/recipe_cmor_era5_20210806_142455/preproc/timeseries/zg/native6_ERA5_reanaly_1_E1hr_zg_2006-2006.nc', 'check_level': <CheckLevels.DEFAULT: 3>})
2021-08-06 14:24:56,980 UTC [499434] INFO    Maximum memory used (estimate): 0.0 GB
2021-08-06 14:24:56,981 UTC [499434] INFO    Sampled every second. It may be inaccurate if short but high spikes in memory consumption occur.
2021-08-06 14:24:56,981 UTC [499434] ERROR   Program terminated abnormally, see stack trace below for more information:
Traceback (most recent call last):
  File "/work/users/serva/miniconda3/envs/esmval202107/lib/python3.9/site-packages/esmvalcore/_main.py", line 433, in run
    fire.Fire(ESMValTool())
  File "/work/users/serva/miniconda3/envs/esmval202107/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/work/users/serva/miniconda3/envs/esmval202107/lib/python3.9/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/work/users/serva/miniconda3/envs/esmval202107/lib/python3.9/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/work/users/serva/miniconda3/envs/esmval202107/lib/python3.9/site-packages/esmvalcore/_main.py", line 410, in run
    process_recipe(recipe_file=recipe, config_user=cfg)
  File "/work/users/serva/miniconda3/envs/esmval202107/lib/python3.9/site-packages/esmvalcore/_main.py", line 104, in process_recipe
    recipe.run()
  File "/work/users/serva/miniconda3/envs/esmval202107/lib/python3.9/site-packages/esmvalcore/_recipe.py", line 1434, in run
    self.tasks.run(max_parallel_tasks=self._cfg['max_parallel_tasks'])
  File "/work/users/serva/miniconda3/envs/esmval202107/lib/python3.9/site-packages/esmvalcore/_task.py", line 674, in run
    self._run_sequential()
  File "/work/users/serva/miniconda3/envs/esmval202107/lib/python3.9/site-packages/esmvalcore/_task.py", line 685, in _run_sequential
    task.run()
  File "/work/users/serva/miniconda3/envs/esmval202107/lib/python3.9/site-packages/esmvalcore/_task.py", line 252, in run
    self.output_files = self._run(input_files)
  File "/work/users/serva/miniconda3/envs/esmval202107/lib/python3.9/site-packages/esmvalcore/preprocessor/__init__.py", line 482, in _run
    product.apply(step, self.debug)
  File "/work/users/serva/miniconda3/envs/esmval202107/lib/python3.9/site-packages/esmvalcore/preprocessor/__init__.py", line 351, in apply
    self.cubes = preprocess(self.cubes, step, **self.settings[step])
  File "/work/users/serva/miniconda3/envs/esmval202107/lib/python3.9/site-packages/esmvalcore/preprocessor/__init__.py", line 295, in preprocess
    result.append(_run_preproc_function(function, items, settings))
  File "/work/users/serva/miniconda3/envs/esmval202107/lib/python3.9/site-packages/esmvalcore/preprocessor/__init__.py", line 281, in _run_preproc_function
    return function(items, **kwargs)
  File "/work/users/serva/miniconda3/envs/esmval202107/lib/python3.9/site-packages/esmvalcore/cmor/fix.py", line 114, in fix_metadata
    cube_list = fix.fix_metadata(cube_list)
  File "/work/users/serva/miniconda3/envs/esmval202107/lib/python3.9/site-packages/esmvalcore/cmor/_fixes/native6/era5.py", line 377, in fix_metadata
    cube = self._fix_coordinates(cube)
  File "/work/users/serva/miniconda3/envs/esmval202107/lib/python3.9/site-packages/esmvalcore/cmor/_fixes/native6/era5.py", line 329, in _fix_coordinates
    coord = cube.coord(axis=axis)
  File "/work/users/serva/miniconda3/envs/esmval202107/lib/python3.9/site-packages/iris/cube.py", line 1829, in coord
    raise iris.exceptions.CoordinateNotFoundError(msg)
iris.exceptions.CoordinateNotFoundError: 'Expected to find exactly 1  coordinate, but found none.'
bouweandela commented 3 years ago

Ok, yes, I forgot about that. This is the problem from the top post and it can indeed be solved by picking a table that does contain zg variable with the right coordinates https://github.com/ESMValGroup/ESMValCore/issues/1136#issuecomment-850216172.

fserva commented 3 years ago

So the mip: day is a good choice in this context. For my needs the problem is then solved, I'll let you close the issue in case. Thanks all for the tips!