Closed thomascrocker closed 3 years ago
From stepping into the souce code of _area.py
using pdb I worked out the problem with the area extraction. My data was defining the longitude coord in the range -180 to +180 rather than 0 to 360 as the code clearly expects. I can fix this by adapting my data preparation stage to ensure longitude is stored in the 0 to 360 range.
Am about to take a closer look at the regridding now to see if I can work out what is going wrong there.
Hi @thomascrocker , Thanks for reaching out! For some datasets, that 0-360 thing is automatically fixed in the preprocessor. We might want to look into issuing a warning and adding a fix for your data. In the meantime, did you have any luck after changing your longitude coordinate?
@valeriupredoi you've been looking quite a bit into the regridding lately. Any thoughts?
In the meantime, did you have any luck after changing your longitude coordinate?
Thanks for the reply. I added some nco commands to my data preparation code (which needs to run anyway to convert the data from pp format to CF compliant netCDF format) so that any longitude points (and bounds) < 0 have 360 added to them, forcing longitude into the range 0 to 360. After doing that area extraction worked as expected.
I then spent some time stepping through the source code in _regid_esmpy.py
to try and isolate where the problem was with the regridding. It looks like the issue is with the ESMF.Regrid
object itself.
https://github.com/ESMValGroup/ESMValCore/blob/6594471a0128dca05b56a7e5317525ddb16b8567/esmvalcore/preprocessor/_regrid_esmpy.py#L175
https://github.com/ESMValGroup/ESMValCore/blob/6594471a0128dca05b56a7e5317525ddb16b8567/esmvalcore/preprocessor/_regrid_esmpy.py#L186
Stepping through the code, at this point in execution, (line 186) the data in src_field
is fine, but the data returned in regr_field
is all zeros.
I've never used ESMPY before, so may well have missed something obvious in the setup of the src_field
and dst_field
due to my inexperience with it. (I normally use IRIS when working with regular grids, and CDO if I have to use an irregular one).
One thing that might be significant is the co-ordinate system of this dataset. Since it's from a regional model the main dim coords for the variable (rlon and rlat) are defined on a rotated grid, (i.e. with a different north pole location) and the longitude and latitude coords are aux coords that span both these dimensions. Due to the fact that the rotated pole is in a different location, the rotated longitudes of this data span the meridian, and therefore include values greater than 360.
One last thing I noticed when working with the debugger was that in construction of the target grid, if the target grid is being specified from a string specification (e.g. "2.5.x2.5") then the code assumes that the co-ordinate system of the target grid should match that of the source grid. https://github.com/ESMValGroup/ESMValCore/blob/6594471a0128dca05b56a7e5317525ddb16b8567/esmvalcore/preprocessor/_regrid.py#L317 In my use case this behaviour is undesirable, since the source grid has a rotated pole co-ordinate system, but the grid specified ("2.5x2.5") is a global grid which one would assume to be on a regular unrotated co-ordinate system. This is probably relevant to this issue https://github.com/ESMValGroup/ESMValCore/issues/493 To get around this in case it was an issue I tried specifying explicitly the grid from a GCM to regrid to in my recipe (HadGEM2-ES in this case) but I still saw the same behaviour of the regridded data all being set to zero,
Hope this information is helpful. I can upload a sample file if that would help? The one I've been using is around 100 mb in size, just let me know where to upload to.
@thomascrocker cheers for raising this and thanks for the very detailed debugging and problem-tracing! One suggestion would be to run with --check_level=strict
so you can see if there are any CMOR-related issues with the data - the code will stop and raise where the problem occurs; also, are you running on Jasmin? If so, can you pls point me to the data and the config/recipe you running? Cheers :beer:
@valeriupredoi I tried with --check_level=strict
but the data passed all the checks.
I'm not running on JASMIN, I have a local install. My recipe file (and debug output) are attached in the zip in my original post. I've also attached my config files here: esmvaltool_config.zip I did have JASMIN credentials but they expired a couple of months ago since I've not used it for some time. I've reapplied for access.
Do you have access to the MASS service on JASMIN? My processed RCM data is stored in MASS. If I understand MASS permissions correctly (there's a non zero chance I don't....) it should be read accessible to any MASS user.
See:
moose:/adhoc/users/thomas.crocker/cmor/u-bx459/CEFAS_Antigua/output/CEFAS_Antigua/MOHC/MOHC-HadGEM2-ES/historical/r1i1p1/MOHC-HadREM3-GA7-05/v1/v1/pr/pr_CEFAS_Antigua_MOHC-HadGEM2-ES_historical_r1i1p1_MOHC-HadREM3-GA7-05_v1_mon_197912-198812.nc
One other thing. I've been attempting to regrid the data from my high resolution (~12km grid) RCM to a coarser GCM scale grid. This uses the RCM as the source grid, and GCM as the destination. Since ESMValTool examines the lat and lon coords in the source grid and sees they are irregular it opts for ESMpy to handle regridding. Earlier I tried the other way around, i.e. regridding GCM data from the GCM grid onto the RCM grid. This time, ESMValTool sees that the lat and lon of the source grid are regularly spaced, and uses iris for the regridding. However, this also fails, in the case of the area_weighted scheme with the message:
ValueError: The horizontal grid coordinates of both the source and grid cubes must have the same coordinate system.
As far as I'm aware this is a known limitation of iris. I.e. area weighted regridding is not currently supported for differing coordinate systems. With the linear scheme there is a different error:
ValueError: The rectilinear grid coordinates of the given cube and target grid must either both have coordinate systems or both have no coordinate system but with matching coordinate metadata.
Looking at the data, the CMIP5 GCM data does not have a coordinate system specified. My RCM data does have the coordinate system specified on the rotated coords (rlon and rlat), but not on the standard lat and lon coords. In any case I really need area weighted regridding for my analysis anyway. Thanks
I've not gotten any further with this yet, since I've been focussing on a couple of other bits of work. However in the meantime I have got my JASMIN access sorted again. @valeriupredoi Could you let me know how to access the installations there? I can then transfer my data to JASMIN and we'll be singing from the same hymn sheet at least.
friendly ping @valeriupredoi
blast! this fell off the radar big way, sorry. @thomascrocker on JASMIN just do module load esmvaltool
(on any of the sci nodes) and you'll have the latest esmvaltool, I'll have a closer look at the issue today, need lunch now :beer:
@thomascrocker - right - am afraid without the actual data I can't do much; it would be great if you could put the data (or a sample of it) on a shared group workspace, eg esmeval where we store the OBS data - could you maybe do that and let me know pls (you'll have to apply for membership of that gws but I'll grant it right away), after that I will attempt at running a test :beer: Quick question - is this a dataset that will eventually be on ESGF? If so, we need to talk - the official cmorization is done via CDDS and that's what you guys should be running :+1:
Hi @valeriupredoi thanks very much for looking into this. I've applied for access to the esmval workspace so will upload some data once it's approved.
On the ESGF question. No it won't be ending up there (at least not in the short term). The budget for the project is very limited and hosting on ESGF isn't part of it. However, I have been doing my cmorization using an adapted version of the CDDS suites that are being used internally to prep data from other projects for upload to ESGF. There has been a lot of interest in data from the project from others so it's highly likely it will be used in further research in the future, hopefully that might result in an official hosting on ESGF at some point.
@thomascrocker just approved your membership for esmeval :+1: If you could tell me where you put the data that'd be awesome! About CDDS - good stuff, let me know if you want to use the Jasmin installation, I am managing the project on Jasmin :beer:
OK, I've dropped some monthly data in:
/group_workspaces/jasmin4/esmeval/gh_issue772/mi-ba795
I've also uploaded the custom CMOR tables that I've been using (these are the ones that were used by the suite I used to CMORize my data) into:
/group_workspaces/jasmin4/esmeval/gh_issue772/cmor_tables
Finally, here is the entry I used in my config-developer.yml file for reading the data in recipes:
Antigua_NAP_raw:
input_dir:
default: '{dataset}'
input_file: '{short_name}_CEFAS_Antigua_{driver}_{exp}_r1i1p1_MOHC-HadREM3-GA7-05_v1_{mip}*.nc'
output_file: '{short_name}_{dataset}_{exp}_{mip}'
cmor_path: '/project/ciid/projects/Antigua/CMOR_tables/cordex'
cmor_type: 'CMIP6'
cmor_default_table_prefix: 'CMIP6_'
Hopefully that should be sufficient for you to read in the data and attempt to regrid it to a regular lat/lon grid. Let me know if you have any issues or questions. Cheers
great! cheers, mate - I'll have a stab at testing it tomorrow :+1: :beer:
@thomascrocker here's what I found out: I tried running a recipe (which I post below), and immediately run into this CMOR checker error:
esmvalcore.cmor.check.CMORCheckError: There were errors in variable pr:
Coordinate longitude has var name longitude instead of lon
Coordinate latitude has var name latitude instead of lat
in cube:
precipitation_flux / (kg m-2 s-1) (time: 120; grid_latitude: 310; grid_longitude: 656)
Dimension coordinates:
time x - -
grid_latitude - x -
grid_longitude - - x
Auxiliary coordinates:
latitude - x x
longitude - x x
Attributes:
CORDEX_domain: CEFAS_Antigua
Conventions: CF-1.4
c3s_disclaimer: This data has been produced in the context of the C3S_34b_Lot1 and Lot...
comment: at surface; includes both liquid and solid phases from all types of clouds...
contact: thomas.crocker@metoffice.gov.uk
driving_experiment: MOHC-HadGEM2-ES,historical,r1i1p1
driving_experiment_name: historical
driving_model_ensemble_member: r1i1p1
driving_model_id: MOHC-HadGEM2-ES
experiment: Historical run using HadGEM2-ES as a driving model
experiment_id: historical
frequency: mon
institute_id: MOHC
institution: MetOffice, Hadley Centre, UK
model_id: MOHC-HadREM3-GA7-05
original_name: mo: (stash: m01s05i216, lbproc: 128)
product: output
project_id: CEFAS_Antigua
rcm_version_id: v1
references: https://www.metoffice.gov.uk/climate-guide/science/science-behind-climate-change/hadley;...
source_file: /group_workspaces/jasmin4/esmeval/gh_issue772/mi-ba795/pr_CEFAS_Antigu...
Cell methods:
mean: time
Note that this is a not such a serious issue since if one runs with --check_level relaxed
the warning will pop:
2020-11-11 14:24:02,380 UTC [20873] WARNING There were warnings in variable pr:
Coordinate longitude has var name longitude instead of lon
Coordinate latitude has var name latitude instead of lat
but the code will not crash as it's supposed to do when there are serious issues. Note that this should be fixed since the default setting for cmor checks does crash and spits out this error, and we want data to ideally be compliant to the default level. But anyway, we can run with relaxed for now...
OK, now the recipe:
---
documentation:
description: |
Check 0 values from regrid
authors:
- predoi_valeriu
preprocessors:
regridp:
regrid:
target_grid: 1x1
scheme: linear
diagnostics:
BerkeleyEarth:
description: Antigua check
variables:
pr:
preprocessor: regridp
additional_datasets:
- {dataset: MOHC-HadREM3-GA7-05, domain: CEFAS_Antigua, project: CORDEX, driver: MOHC-HadGEM2-ES, mip: mon, exp: historical, ensemble: r1i1p1, rcm_version: v1,
start_year: 2000, end_year: 2008}
scripts: null
you can use this recipe and use the CORDEX project right out the box, not needing to add projects to the config-developer file nor paths to custom cmor dirs; see the config-user:
rootpath:
CORDEX: /group_workspaces/jasmin4/esmeval/gh_issue772/mi-ba795
drs:
CORDEX: default
and it'll work right away! Also note that your custom cmor tables are almost identical to the ones in esmvaltool (CORDEX ones), bar a couple more vars in your tables, but not affecting the current tests.
Now, on to the issue at hand - why the regridded data is 0: here's what I found:
_regrid_esmpy.py
:
def regridder(src):
"""Regrid 2d for irregular grids."""
res = get_empty_data(dst_rep.shape, src.dtype)
data = src.data
if np.ma.is_masked(data):
data = data.data
src_field.data[...] = data.T
print("source data", np.mean(src_field.data))
regr_field = field_regridder(src_field, dst_field)
print("regridder field data", np.mean(regr_field.data))
res.data[...] = regr_field.data[...].T
res.mask[...] = dst_mask
return res
the source data is O(1e-5) and the regr data is O(1e-300) - so it means that the ESMF regridder:
field_regridder = ESMF.Regrid(src_mask_values=np.array([1]),
dst_mask_values=np.array([1]),
**regridding_arguments)
is spitting out garbage; note that this is what I got for a regrid on a 1x1 degree grid, increasing the target grid to 10x10 or 20x20 etc does not improve the supertiny result numbers, nor does multiplying the source data by any given of orders of magnitudes. It is as if the ESMF regridder hits an overflow and returns garbage instead of actual values (1.3e-322 for each data point). I am not familiar with ESMF though, but @zklaus is and maybe he can help? :beer:
also just noticed that some of my findings have already been listed by @thomascrocker in an earlier comment - meh, good thing we reached the same conclusion :grin:
It might be a good idea to send @zklaus an email if you want a reaction from him, he hasn't been very actively reading the notifications from the ESMValTool project recently..
let's see if it works: Klauuuussssss @zklaus @zklaus @zklaus :grin: Yeah, I'll send him an email :mailbox:
@valeriupredoi thanks for taking a look and good to see that we both came to the same conclusion at least. I also encountered the issue with the co-ordinate names, but I think I got around it using my custom CMOR tables. Good to know that I can use the standard CORDEX project settings though provided I just change the lat and lon naming in my use of the CDDS suite. Let's hope someone can shed some light on the ESMF behaviour
I will be emailing Klaus in 2min
Alright, had a look, here is what I found:
@thomascrocker is completely right that the treatment of coordinate systems in _regrid.regrid
is problematic. Basically it just says to interpret the latitudes and longitudes created in the stock cube (often 1x1 or similar) according to the coord_system of the data cube. This kind-of worked up to now mostly because everyone is using geodetic cs anyway, but is quite dangerous when we are mixing in other coordinate systems, like the rotated one here and probably some projections.
The main problem is that the regridder doesn't check whether source and target cover the same domain. If the target is not covered by the source, we need to employ mask handling, even if neither source nor target are masked.
good stuff @zklaus :+1: - what do you recommend - implementing a check on the CS and subsequent treatment depending on it or this is something that will work out of the box in iris 3+ - also do you think this is worth plopping on to SciTools GitHub? :beer:
The coordinate system issue has to be fixed regardless of this regridding problem, I think. It is a bit hairy though for two reasons: One is that a general approach with cartopy.crs.CRS
and iris.coord_systems.CoordSystem
can be involved. Perhaps some changes to iris (e.g. to complement the CoordSystem.as_cartopy_crs
function, a CoordSystem.from_cartopy_crs
classmethod could be added) would be most sensible.
The regridding should be taken care of with the new regridding project. But perhaps an interims fix is sensible?
@thomascrocker, @valeriupredoi, I put up PR #865 that should fix the regridding issue (at least I was able to run your recipe and data with it. Could you give it a test run?
@thomascrocker, @valeriupredoi, I put up PR #865 that should fix the regridding issue (at least I was able to run your recipe and data with it. Could you give it a test run?
Thanks for developing a fix so quickly..
Looks like there is a problem though, I found that while the regridding appears to work correctly, it has the side effect of doing something to the latitude and longitude co-ordinates such that Iris can't read them..
This is what a print(cube) in python returns on the regrid cube produced by the processor
precipitation_flux / (kg m-2 s-1) (time: 360; -- : 72; -- : 144)
Dimension coordinates:
time x - -
However, it looks like the coords are still there in the netcdf file
$ ncdump -h 06_regrid.nc
netcdf \06_regrid {
dimensions:
time = 360 ;
lat = 72 ;
lon = 144 ;
bnds = 2 ;
variables:
float pr(time, lat, lon) ;
pr:_FillValue = 1.e+20f ;
pr:standard_name = "precipitation_flux" ;
pr:long_name = "Precipitation" ;
pr:units = "kg m-2 s-1" ;
pr:cell_methods = "time: mean" ;
pr:grid_mapping = "rotated_latitude_longitude" ;
int rotated_latitude_longitude ;
rotated_latitude_longitude:grid_mapping_name = "rotated_latitude_longitude" ;
rotated_latitude_longitude:grid_north_pole_latitude = 68.5 ;
rotated_latitude_longitude:grid_north_pole_longitude = 112.5 ;
rotated_latitude_longitude:north_pole_grid_longitude = 0. ;
double time(time) ;
time:axis = "T" ;
time:bounds = "time_bnds" ;
time:units = "days since 1850-1-1 00:00:00" ;
time:standard_name = "time" ;
time:long_name = "time" ;
time:calendar = "360_day" ;
double time_bnds(time, bnds) ;
double lat(lat) ;
lat:axis = "Y" ;
lat:bounds = "lat_bnds" ;
lat:units = "degrees_north" ;
lat:standard_name = "latitude" ;
double lat_bnds(lat, bnds) ;
double lon(lon) ;
lon:axis = "X" ;
lon:bounds = "lon_bnds" ;
lon:units = "degrees_east" ;
lon:standard_name = "longitude" ;
double lon_bnds(lon, bnds) ;
...
Could it be that the grid_mapping attribute needs removing from the variable?
Could it be that the grid_mapping attribute needs removing from the variable?
Yes, that does seem to do the trick. WIll have a look how it snuck through...
Turns out this is a resurfacing of #479.
Hello, I'm attempting to use ESMValTool with data from the Met Office HadGEM3-RA regional model. ESMValTool_issue_files.zip
I have successfully run my model output through an in house CMORization / mip_convert suite and added project lines to read the files to a custom
config-developer.yml
file (using the same CMOR tables as for CORDEX). My simple recipe and diagnostic runs without any errors, I've also not noticed any relevant warnings.However, something is going wrong with the regridding steps (and if I remove regridding steps from the recipe, I get problems with area extraction, as I'll explain later).
I set
save_intermediary_cubes
to true in myconfig_user.yml
in order to examine the state of the data as it moves through the pre processor. Regardless of regridding scheme chosen, the data created immediately after the regridding step (in05_regrid.nc
) has all been set to zero, across every single grid point. (The data prior to this step is still intact)I wondered if this is perhaps related to this open pull request https://github.com/ESMValGroup/ESMValCore/pull/185 (If so, is the code under review likely to fix my problem?)
I also tried removing the regridding steps from my recipe entirely. In this case although again my recipe ran without any warnings or errors, the output was not as expected and I noticed something goes wrong at the point that the region extraction is carried out. This time the data immediately after the region extraction has all been set to 1e20 (masking?). Again, the preprocessed data immediately prior to this step is fine.
Any help appreciated, I'm hoping to use ESMValTool to perform analysis of some downscaled climate projections for the Caribbean that we have recently completed.
Thanks