Open thomascrocker opened 2 years ago
FYI @pepcos , EUR-11 is just the region we need haha
As such I'm not sure if the most appropriate fix is individual CMOR modules for each of the modules that uses duplicate code, it might be good to introduce a standard check / fix for this issue for all files from the CORDEX project
I don't think there will be an issue with duplicated code, it is quite common to import "base fixes" between models and variables that share the same issue (icon fixes do it, common fixes for cmip6 are kept in a separate file and are then called in several individual models, different resolutions of the same model share fixes as well)
So creating a directory for cordex fixes, adding a file for common procedures that may be needed by multiple models, such as adding bounds, and then import those fixes when you detect that a model/variable needs them, would be a good approach.
So creating a directory for cordex fixes, adding a file for common procedures that may be needed by multiple models, such as adding bounds, and then import those fixes when you detect that a model/variable needs them, would be a good approach.
So in summary you think we should:
This would mean adding a lot of extra fixes as there are a lot of models used in CORDEX (59) and almost all have problems. Of course some are more important than others. So for ALADIN64 there are only 6 files while for RegCM4-7 there are 14,780
I am concerned that if we go down this route, using 1 ticket per model then it will take a long time to get to a reasonable coverage of the archive - even if we prioritise.
I mean it's the same as in CMIP6, there are even more models and most of them have issues (some more complex than others, some are common, etc). At the end the fixing for CMIP6 happens a bit on the go, people fix what they need and as time goes by the pool of fixed models increases. No one aims at fixing all CMIP6 in one sitting. I am not as familiar with CORDEX data and its issues, but for CMIP6 it's impossible to guess what will be wrong so it's difficult to avoid the per model fix approach.
This branch, which was never merged, might have some fixes for CORDEX EUR-11 interesting to you: https://github.com/ESMValGroup/ESMValCore/tree/working_cordex_2.2/esmvalcore/cmor/_fixes/cordex
I think the root of the issue is that the CMOR checker does not check the rotated coordinates because they are defined in the grids
table, which is not read. The bounds for these coordinates are not guessed in the checker because the variables definition does consider lat/lon to be the coordinates And then the guessing of the bounds cannot happen for the geographic coordinates because guess_bounds does not work with 2D coordinates.
With that I think that the project level fix in #184, should be reworked a little bit, to make it work both as a "checker" and a set of fixes. The idea would be to first read the CORDEX_grids
table in order to load the CMOR information on the rotated coordinates
def fix_metadata
table = CMOR_TABLES['CORDEX']
cmor_folder = table._cmor_folder
table._load_table(os.path.join(cmor_folder, 'CORDEX_grids'))
...
For what I understand, CORDEX data can be outputed in x/y (+ 2D lat/lon as aux coordinates), rlat/rlon (+ 2D lat/lon as aux coordinates) or directly in a regular grid ( 1D lat/lon as dim coordinates), so the checking/fixing could be split in that way:
...
if isinstance(coord_sys, iris.coord_systems.LambertConformal):
self.fix_lambert_conformal(cube, table)
elif isinstance(coord_sys, iris.coord_systems.RotatedGeogCS):
self.fix_rotated(cube, table)
elif isinstance(coord_sys, iris.coord_systems.GeogCS):
# coordinates will be fixed during the CMOR checks, because they are regular 1D lat/lon
return cubes
else:
emsg = 'Unknown coordinate system, got {!r}.'
raise NotImplementedError(
emsg.format(type(coord_sys)))
...
And then the fixes for each one of the coordinate systems should include a check/fix on the metadata that is loaded from the CMOR information:
def fix_grid_metadata(coord, cmor_info):
if coord.standard_name != cmor_info.standard_name:
coord.standard_name = cmor_info.standard_name
if coord.long_name != cmor_info.long_name:
coord.long_name = cmor_info.long_name
if coord.units != cmor_info.units:
coord.convert_units(cf_units.Unit(coord.units))
if not coord.has_bounds():
guess_bounds(coord)
As well as a routine to guess the bounds for the lat/lon coordinates from the bounds of either rlat/rlon and x/y (based on what is done in #184):
def guess_bounds_from_lambert(self, cube):
...
ccrs_global = global_coord_sys.as_cartopy_crs()
xyz = ccrs_global.transform_points(
grid_coord_sys.as_cartopy_crs(), glon, glat)
lon_bnds = xyz[:, :, 0]
lat_bnds = xyz[:, :, 1]
... # + reshaping of the bounds
def guess_bounds_from_rotated(self, cube):
....
grid_coord_sys = x_coord.coord_system
lon_bnds, lat_bnds = iris.analysis.cartography.unrotate_pole(
glon, glat,
grid_coord_sys.grid_north_pole_longitude,
grid_coord_sys.grid_north_pole_latitude
)
... # + reshaping of the bounds
fix_lambert_conformal
routine. It also contains a routine to guess lat/lon from the native coordinates in case they do not exist, but for what I have seen this may not be needed because the only files I have run into that do not contain lat/lon are and old version of a dataset, that got updated so that the coordinates are included. But I may be wrong.Additionally, in the CMOR checks, the coordinates get their (max/min) values and monotonicity checked. Maybe it would be worth it to include such checks in the project level fix, that would check for each one of the domains the values of the coordinates (since the definition of the span of the domains is available https://cordex.org/wp-content/uploads/2012/11/CORDEX-domain-description_231015.pdf). Other general fixes related to the grid could also be included in the project level fix.
And for other type of fixes, I think it would be easier to fix them as they are found at a domain/driver/dataset level.
Describe the dataset issue PR #184 (from 3 years ago) attempts to resolve a number of seperate issues with CORDEX models, as a result it's a fairly big PR and merging isn't straightforward. Instead, as suggested here https://github.com/ESMValGroup/ESMValCore/pull/184#issuecomment-1156401084 I'm creating issues addressing some of the problems individually which should be a more appropriate approach to fixing them. (Via smaller more focused PRs)
This first one addresses the fact that many of the CORDEX models on ESGF are missing bounds data on either one or both of their native grid coordinates or multidimensional regular latitude and longitude coordinates. This lack of bounds means that regridding is not possible. It also means that calculating area weighted statistics is not possible without an appropriate areacella fx file. However, even if these are present, using them in ESMValTool is non-trivial unless #1081 is resolved, perhaps via #1082
Data file has been changed by you in any way (if you answer yes the issue will be void and closed, we are not supporting custom-made datasets that present problems, it is your resposability to fix those problems): NO
Project (CMIP5/CMIP6/OBS/obs4MIPs/etc): CORDEX
Full dataset description (fill out as many as you know, please): tas:
{institute: SMHI, driver: MOHC-HadGEM2-ES, dataset: RCA4, project: CORDEX, ensemble: r1i1p1, mip: mon, rcm_version: v1, domain: EUR-11}
Note that this is only one model that has this issue. However, there are multiple CORDEX data files on ESGF from a variety of different models / domains that have this issue. As such I'm not sure if the most appropriate fix is individual CMOR modules for each of the modules that uses duplicate code, it might be good to introduce a standard check / fix for this issue for all files from the CORDEX project (as PR #184 attempts to do)Problems encountered (please describe what the actual problems are: e.g. wrong standard name, issue with dimensions): Regridding fails due to missing bounds. Area weighted statistics not possible without FX files.
Pointer to existing copy of data on ESGF node (it would be very useful if you could provide a physical fullpath to the file(s) that are causing the problem, e.g. on CEDA Jasmin or DKRZ): E.g. From ESGF: http://esg-dn1.nsc.liu.se/thredds/fileServer/esg_dataroot1/cordexdata/cordex/output/EUR-11/SMHI/MOHC-HadGEM2-ES/historical/r1i1p1/SMHI-RCA4/v1/mon/tas/v20131026/tas_EUR-11_MOHC-HadGEM2-ES_historical_r1i1p1_SMHI-RCA4_v1_mon_200101-200512.nc
Other notes and mentions: @sloosvel @nhsavage
See example simple recipe for regridding:
Results in error due to lack of bounds:
ncdump: