ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0
42 stars 38 forks source link

Dataset problem: CORDEX missing bounds #1673

Open thomascrocker opened 2 years ago

thomascrocker commented 2 years ago

Describe the dataset issue PR #184 (from 3 years ago) attempts to resolve a number of seperate issues with CORDEX models, as a result it's a fairly big PR and merging isn't straightforward. Instead, as suggested here https://github.com/ESMValGroup/ESMValCore/pull/184#issuecomment-1156401084 I'm creating issues addressing some of the problems individually which should be a more appropriate approach to fixing them. (Via smaller more focused PRs)

This first one addresses the fact that many of the CORDEX models on ESGF are missing bounds data on either one or both of their native grid coordinates or multidimensional regular latitude and longitude coordinates. This lack of bounds means that regridding is not possible. It also means that calculating area weighted statistics is not possible without an appropriate areacella fx file. However, even if these are present, using them in ESMValTool is non-trivial unless #1081 is resolved, perhaps via #1082

See example simple recipe for regridding:

documentation:
  title: Demo recipe 

  description:
    What if I want to regrid a regional model file?

  authors:
    - predoi_valeriu

datasets:
# Rotated pole model
- {institute: SMHI, driver: MOHC-HadGEM2-ES, dataset: RCA4, project: CORDEX, ensemble: r1i1p1, mip: mon, rcm_version: v1, domain: EUR-11}

preprocessors:
  preproc:
    regrid:
      target_grid:
        start_latitude: 50
        end_latitude: 60
        start_longitude: 0
        end_longitude: 10
        step_latitude: 1
        step_longitude: 1
      scheme: linear

diagnostics:
  temp:
    description: Regrid some data
    variables:
      tas:
        start_year: 1981
        end_year: 2000
        exp: historical
        preprocessor: preproc
    scripts: null

Results in error due to lack of bounds:

File "/opt/scitools/conda/environments/esmvaltool-2.5.0/lib/python3.9/site-packages/esmvalcore/preprocessor/_regrid_esmpy.py", line 112, in is_lon_circular
    seam = (lon.bounds[1:-1, -1, (1, 2)]
TypeError: 'NoneType' object is not subscriptable

ncdump:

$ ncdump -h /project/champ/data/cordex/output/EUR-11/SMHI/MOHC-HadGEM2-ES/historical/r1i1p1/RCA4/v1/mon/tas/v20131026/tas_EUR-11_MOHC-HadGEM2-ES_historical_r1i1p1_SMHI-RCA4_v1_mon_200101-200512.nc
netcdf tas_EUR-11_MOHC-HadGEM2-ES_historical_r1i1p1_SMHI-RCA4_v1_mon_200101-200512 {
dimensions:
    time = UNLIMITED ; // (60 currently)
    bnds = 2 ;
    rlon = 424 ;
    rlat = 412 ;
variables:
    double time(time) ;
        time:axis = "T" ;
        time:bounds = "time_bnds" ;
        time:calendar = "360_day" ;
        time:long_name = "time" ;
        time:standard_name = "time" ;
        time:units = "days since 1949-12-01 00:00:00" ;
    double lon(rlat, rlon) ;
        lon:standard_name = "longitude" ;
        lon:long_name = "longitude" ;
        lon:units = "degrees_east" ;
    double lat(rlat, rlon) ;
        lat:standard_name = "latitude" ;
        lat:long_name = "latitude" ;
        lat:units = "degrees_north" ;
    double time_bnds(time, bnds) ;
        time_bnds:units = "days since 1949-12-01 00:00:00" ;
        time_bnds:calendar = "360_day" ;
    float tas(time, rlat, rlon) ;
        tas:standard_name = "air_temperature" ;
        tas:long_name = "Near-Surface Air Temperature" ;
        tas:units = "K" ;
        tas:coordinates = "lon lat height" ;
        tas:_FillValue = 1.e+20f ;
        tas:missing_value = 1.e+20f ;
        tas:cell_methods = "time: mean" ;
        tas:grid_mapping = "rotated_pole" ;
    double rlon(rlon) ;
        rlon:standard_name = "grid_longitude" ;
        rlon:long_name = "longitude in rotated pole grid" ;
        rlon:units = "degrees" ;
        rlon:axis = "X" ;
    double rlat(rlat) ;
        rlat:standard_name = "grid_latitude" ;
        rlat:long_name = "latitude in rotated pole grid" ;
        rlat:units = "degrees" ;
        rlat:axis = "Y" ;
    char rotated_pole ;
        rotated_pole:grid_mapping_name = "rotated_latitude_longitude" ;
        rotated_pole:grid_north_pole_latitude = 39.25 ;
        rotated_pole:grid_north_pole_longitude = -162. ;
    double height ;
        height:axis = "Z" ;
        height:long_name = "height" ;
        height:positive = "up" ;
        height:standard_name = "height" ;
        height:units = "m" ;

// global attributes:
        :Conventions = "CF-1.4" ;
        :institution = "Swedish Meteorological and Hydrological Institute, Rossby Centre" ;
        :contact = "rossby.cordex@smhi.se" ;
        :creation_date = "2013-07-03-T19:58:14Z" ;
        :experiment = "historical" ;
        :experiment_id = "historical" ;
        :driving_experiment = "MOHC-HadGEM2-ES, historical, r1i1p1" ;
        :driving_model_id = "MOHC-HadGEM2-ES" ;
        :driving_model_ensemble_member = "r1i1p1" ;
        :driving_experiment_name = "historical" ;
        :frequency = "mon" ;
        :institute_id = "SMHI" ;
        :model_id = "SMHI-RCA4" ;
        :rcm_version_id = "v1" ;
        :project_id = "CORDEX" ;
        :CORDEX_domain = "EUR-11" ;
        :product = "output" ;
        :references = "http://www.smhi.se/en/Research/Research-departments/climate-research-rossby-centre" ;
        :tracking_id = "246cfe5b-078e-46e9-8526-f64413c381c3" ;
        :rossby_comment = "201131: CORDEX Europe 0.11 deg | RCA4 v1 | MOHC-HadGEM2-ES | r1i1p1 | historical | L40" ;
        :rossby_run_id = "201131" ;
        :rossby_grib_path = "/nobackup/rossby15/rossby/joint_exp/cordex/201131/raw/" ;
}
sloosvel commented 2 years ago

FYI @pepcos , EUR-11 is just the region we need haha

sloosvel commented 2 years ago

As such I'm not sure if the most appropriate fix is individual CMOR modules for each of the modules that uses duplicate code, it might be good to introduce a standard check / fix for this issue for all files from the CORDEX project

I don't think there will be an issue with duplicated code, it is quite common to import "base fixes" between models and variables that share the same issue (icon fixes do it, common fixes for cmip6 are kept in a separate file and are then called in several individual models, different resolutions of the same model share fixes as well)

So creating a directory for cordex fixes, adding a file for common procedures that may be needed by multiple models, such as adding bounds, and then import those fixes when you detect that a model/variable needs them, would be a good approach.

nhsavage commented 2 years ago

So creating a directory for cordex fixes, adding a file for common procedures that may be needed by multiple models, such as adding bounds, and then import those fixes when you detect that a model/variable needs them, would be a good approach.

So in summary you think we should:

  1. make a new folder _fixes/cordex as done on #184
  2. create a module common_cordex.py in that folder for fixes which will be used across CORDEX models (or perhaps just add to common.py in the folder above) - most of this code can be reused from #184
  3. for each model we want to use write a module model_name.py in the CORDEX folder and also add some tests for this to ensure that test coverage doesn't fall as a result of adding the code

This would mean adding a lot of extra fixes as there are a lot of models used in CORDEX (59) and almost all have problems. Of course some are more important than others. So for ALADIN64 there are only 6 files while for RegCM4-7 there are 14,780

I am concerned that if we go down this route, using 1 ticket per model then it will take a long time to get to a reasonable coverage of the archive - even if we prioritise.

sloosvel commented 2 years ago

I mean it's the same as in CMIP6, there are even more models and most of them have issues (some more complex than others, some are common, etc). At the end the fixing for CMIP6 happens a bit on the go, people fix what they need and as time goes by the pool of fixed models increases. No one aims at fixing all CMIP6 in one sitting. I am not as familiar with CORDEX data and its issues, but for CMIP6 it's impossible to guess what will be wrong so it's difficult to avoid the per model fix approach.

pepcos commented 2 years ago

This branch, which was never merged, might have some fixes for CORDEX EUR-11 interesting to you: https://github.com/ESMValGroup/ESMValCore/tree/working_cordex_2.2/esmvalcore/cmor/_fixes/cordex

sloosvel commented 2 years ago

https://cordex.org/wp-content/uploads/2012/11/CORDEX-domain-description_231015.pdf http://is-enes-data.github.io/cordex_archive_specifications.pdf

sloosvel commented 2 years ago

I think the root of the issue is that the CMOR checker does not check the rotated coordinates because they are defined in the grids table, which is not read. The bounds for these coordinates are not guessed in the checker because the variables definition does consider lat/lon to be the coordinates And then the guessing of the bounds cannot happen for the geographic coordinates because guess_bounds does not work with 2D coordinates.

With that I think that the project level fix in #184, should be reworked a little bit, to make it work both as a "checker" and a set of fixes. The idea would be to first read the CORDEX_grids table in order to load the CMOR information on the rotated coordinates

def fix_metadata 
        table = CMOR_TABLES['CORDEX']
        cmor_folder = table._cmor_folder
        table._load_table(os.path.join(cmor_folder, 'CORDEX_grids'))
...

For what I understand, CORDEX data can be outputed in x/y (+ 2D lat/lon as aux coordinates), rlat/rlon (+ 2D lat/lon as aux coordinates) or directly in a regular grid ( 1D lat/lon as dim coordinates), so the checking/fixing could be split in that way:

...
        if isinstance(coord_sys, iris.coord_systems.LambertConformal):
            self.fix_lambert_conformal(cube, table)

        elif isinstance(coord_sys, iris.coord_systems.RotatedGeogCS):
            self.fix_rotated(cube, table)

        elif isinstance(coord_sys, iris.coord_systems.GeogCS):
            # coordinates will be fixed during the CMOR checks, because they are regular 1D lat/lon
            return cubes

        else:
            emsg = 'Unknown coordinate system, got {!r}.'
            raise NotImplementedError(
                emsg.format(type(coord_sys)))
...

And then the fixes for each one of the coordinate systems should include a check/fix on the metadata that is loaded from the CMOR information:

    def fix_grid_metadata(coord, cmor_info):
        if coord.standard_name != cmor_info.standard_name:
            coord.standard_name = cmor_info.standard_name
        if coord.long_name != cmor_info.long_name:
            coord.long_name = cmor_info.long_name
        if coord.units != cmor_info.units:
            coord.convert_units(cf_units.Unit(coord.units))
        if not coord.has_bounds():
            guess_bounds(coord)

As well as a routine to guess the bounds for the lat/lon coordinates from the bounds of either rlat/rlon and x/y (based on what is done in #184):

    def guess_bounds_from_lambert(self, cube):
    ...
        ccrs_global = global_coord_sys.as_cartopy_crs()
        xyz = ccrs_global.transform_points(
        grid_coord_sys.as_cartopy_crs(), glon, glat)
        lon_bnds = xyz[:, :, 0]
        lat_bnds = xyz[:, :, 1]
   ... # + reshaping of the bounds

    def guess_bounds_from_rotated(self, cube):
    ....
        grid_coord_sys = x_coord.coord_system
        lon_bnds, lat_bnds = iris.analysis.cartography.unrotate_pole(
            glon, glat,
            grid_coord_sys.grid_north_pole_longitude,
            grid_coord_sys.grid_north_pole_latitude
            )
    ... # + reshaping of the bounds

184 also contains some further fixes in the Lambert Conformal coordinate system that could be added in the fix_lambert_conformal routine. It also contains a routine to guess lat/lon from the native coordinates in case they do not exist, but for what I have seen this may not be needed because the only files I have run into that do not contain lat/lon are and old version of a dataset, that got updated so that the coordinates are included. But I may be wrong.

Additionally, in the CMOR checks, the coordinates get their (max/min) values and monotonicity checked. Maybe it would be worth it to include such checks in the project level fix, that would check for each one of the domains the values of the coordinates (since the definition of the span of the domains is available https://cordex.org/wp-content/uploads/2012/11/CORDEX-domain-description_231015.pdf). Other general fixes related to the grid could also be included in the project level fix.

And for other type of fixes, I think it would be easier to fix them as they are found at a domain/driver/dataset level.