ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0
42 stars 38 forks source link

ESMF returns garbage data after CORDEX regridding #772

Closed thomascrocker closed 3 years ago

thomascrocker commented 4 years ago

Hello, I'm attempting to use ESMValTool with data from the Met Office HadGEM3-RA regional model. ESMValTool_issue_files.zip

I have successfully run my model output through an in house CMORization / mip_convert suite and added project lines to read the files to a custom config-developer.yml file (using the same CMOR tables as for CORDEX). My simple recipe and diagnostic runs without any errors, I've also not noticed any relevant warnings.

However, something is going wrong with the regridding steps (and if I remove regridding steps from the recipe, I get problems with area extraction, as I'll explain later).

I set save_intermediary_cubes to true in my config_user.yml in order to examine the state of the data as it moves through the pre processor. Regardless of regridding scheme chosen, the data created immediately after the regridding step (in 05_regrid.nc) has all been set to zero, across every single grid point. (The data prior to this step is still intact)

I wondered if this is perhaps related to this open pull request https://github.com/ESMValGroup/ESMValCore/pull/185 (If so, is the code under review likely to fix my problem?)

I also tried removing the regridding steps from my recipe entirely. In this case although again my recipe ran without any warnings or errors, the output was not as expected and I noticed something goes wrong at the point that the region extraction is carried out. This time the data immediately after the region extraction has all been set to 1e20 (masking?). Again, the preprocessed data immediately prior to this step is fine.

Any help appreciated, I'm hoping to use ESMValTool to perform analysis of some downscaled climate projections for the Caribbean that we have recently completed.

Thanks

thomascrocker commented 4 years ago

From stepping into the souce code of _area.py using pdb I worked out the problem with the area extraction. My data was defining the longitude coord in the range -180 to +180 rather than 0 to 360 as the code clearly expects. I can fix this by adapting my data preparation stage to ensure longitude is stored in the 0 to 360 range. Am about to take a closer look at the regridding now to see if I can work out what is going wrong there.

Peter9192 commented 4 years ago

Hi @thomascrocker , Thanks for reaching out! For some datasets, that 0-360 thing is automatically fixed in the preprocessor. We might want to look into issuing a warning and adding a fix for your data. In the meantime, did you have any luck after changing your longitude coordinate?

@valeriupredoi you've been looking quite a bit into the regridding lately. Any thoughts?

thomascrocker commented 4 years ago

In the meantime, did you have any luck after changing your longitude coordinate?

Thanks for the reply. I added some nco commands to my data preparation code (which needs to run anyway to convert the data from pp format to CF compliant netCDF format) so that any longitude points (and bounds) < 0 have 360 added to them, forcing longitude into the range 0 to 360. After doing that area extraction worked as expected.

I then spent some time stepping through the source code in _regid_esmpy.py to try and isolate where the problem was with the regridding. It looks like the issue is with the ESMF.Regrid object itself. https://github.com/ESMValGroup/ESMValCore/blob/6594471a0128dca05b56a7e5317525ddb16b8567/esmvalcore/preprocessor/_regrid_esmpy.py#L175 https://github.com/ESMValGroup/ESMValCore/blob/6594471a0128dca05b56a7e5317525ddb16b8567/esmvalcore/preprocessor/_regrid_esmpy.py#L186 Stepping through the code, at this point in execution, (line 186) the data in src_field is fine, but the data returned in regr_field is all zeros. I've never used ESMPY before, so may well have missed something obvious in the setup of the src_field and dst_field due to my inexperience with it. (I normally use IRIS when working with regular grids, and CDO if I have to use an irregular one).

One thing that might be significant is the co-ordinate system of this dataset. Since it's from a regional model the main dim coords for the variable (rlon and rlat) are defined on a rotated grid, (i.e. with a different north pole location) and the longitude and latitude coords are aux coords that span both these dimensions. Due to the fact that the rotated pole is in a different location, the rotated longitudes of this data span the meridian, and therefore include values greater than 360.

One last thing I noticed when working with the debugger was that in construction of the target grid, if the target grid is being specified from a string specification (e.g. "2.5.x2.5") then the code assumes that the co-ordinate system of the target grid should match that of the source grid. https://github.com/ESMValGroup/ESMValCore/blob/6594471a0128dca05b56a7e5317525ddb16b8567/esmvalcore/preprocessor/_regrid.py#L317 In my use case this behaviour is undesirable, since the source grid has a rotated pole co-ordinate system, but the grid specified ("2.5x2.5") is a global grid which one would assume to be on a regular unrotated co-ordinate system. This is probably relevant to this issue https://github.com/ESMValGroup/ESMValCore/issues/493 To get around this in case it was an issue I tried specifying explicitly the grid from a GCM to regrid to in my recipe (HadGEM2-ES in this case) but I still saw the same behaviour of the regridded data all being set to zero,

Hope this information is helpful. I can upload a sample file if that would help? The one I've been using is around 100 mb in size, just let me know where to upload to.

valeriupredoi commented 4 years ago

@thomascrocker cheers for raising this and thanks for the very detailed debugging and problem-tracing! One suggestion would be to run with --check_level=strict so you can see if there are any CMOR-related issues with the data - the code will stop and raise where the problem occurs; also, are you running on Jasmin? If so, can you pls point me to the data and the config/recipe you running? Cheers :beer:

thomascrocker commented 4 years ago

@valeriupredoi I tried with --check_level=strict but the data passed all the checks.

I'm not running on JASMIN, I have a local install. My recipe file (and debug output) are attached in the zip in my original post. I've also attached my config files here: esmvaltool_config.zip I did have JASMIN credentials but they expired a couple of months ago since I've not used it for some time. I've reapplied for access.

Do you have access to the MASS service on JASMIN? My processed RCM data is stored in MASS. If I understand MASS permissions correctly (there's a non zero chance I don't....) it should be read accessible to any MASS user. See: moose:/adhoc/users/thomas.crocker/cmor/u-bx459/CEFAS_Antigua/output/CEFAS_Antigua/MOHC/MOHC-HadGEM2-ES/historical/r1i1p1/MOHC-HadREM3-GA7-05/v1/v1/pr/pr_CEFAS_Antigua_MOHC-HadGEM2-ES_historical_r1i1p1_MOHC-HadREM3-GA7-05_v1_mon_197912-198812.nc

One other thing. I've been attempting to regrid the data from my high resolution (~12km grid) RCM to a coarser GCM scale grid. This uses the RCM as the source grid, and GCM as the destination. Since ESMValTool examines the lat and lon coords in the source grid and sees they are irregular it opts for ESMpy to handle regridding. Earlier I tried the other way around, i.e. regridding GCM data from the GCM grid onto the RCM grid. This time, ESMValTool sees that the lat and lon of the source grid are regularly spaced, and uses iris for the regridding. However, this also fails, in the case of the area_weighted scheme with the message:

ValueError: The horizontal grid coordinates of both the source and grid cubes must have the same coordinate system.

As far as I'm aware this is a known limitation of iris. I.e. area weighted regridding is not currently supported for differing coordinate systems. With the linear scheme there is a different error:

ValueError: The rectilinear grid coordinates of the given cube and target grid must either both have coordinate systems or both have no coordinate system but with matching coordinate metadata.

Looking at the data, the CMIP5 GCM data does not have a coordinate system specified. My RCM data does have the coordinate system specified on the rotated coords (rlon and rlat), but not on the standard lat and lon coords. In any case I really need area weighted regridding for my analysis anyway. Thanks

thomascrocker commented 4 years ago

I've not gotten any further with this yet, since I've been focussing on a couple of other bits of work. However in the meantime I have got my JASMIN access sorted again. @valeriupredoi Could you let me know how to access the installations there? I can then transfer my data to JASMIN and we'll be singing from the same hymn sheet at least.

bouweandela commented 4 years ago

friendly ping @valeriupredoi

valeriupredoi commented 4 years ago

blast! this fell off the radar big way, sorry. @thomascrocker on JASMIN just do module load esmvaltool (on any of the sci nodes) and you'll have the latest esmvaltool, I'll have a closer look at the issue today, need lunch now :beer:

valeriupredoi commented 4 years ago

@thomascrocker - right - am afraid without the actual data I can't do much; it would be great if you could put the data (or a sample of it) on a shared group workspace, eg esmeval where we store the OBS data - could you maybe do that and let me know pls (you'll have to apply for membership of that gws but I'll grant it right away), after that I will attempt at running a test :beer: Quick question - is this a dataset that will eventually be on ESGF? If so, we need to talk - the official cmorization is done via CDDS and that's what you guys should be running :+1:

thomascrocker commented 4 years ago

Hi @valeriupredoi thanks very much for looking into this. I've applied for access to the esmval workspace so will upload some data once it's approved.

On the ESGF question. No it won't be ending up there (at least not in the short term). The budget for the project is very limited and hosting on ESGF isn't part of it. However, I have been doing my cmorization using an adapted version of the CDDS suites that are being used internally to prep data from other projects for upload to ESGF. There has been a lot of interest in data from the project from others so it's highly likely it will be used in further research in the future, hopefully that might result in an official hosting on ESGF at some point.

valeriupredoi commented 4 years ago

@thomascrocker just approved your membership for esmeval :+1: If you could tell me where you put the data that'd be awesome! About CDDS - good stuff, let me know if you want to use the Jasmin installation, I am managing the project on Jasmin :beer:

thomascrocker commented 4 years ago

OK, I've dropped some monthly data in: /group_workspaces/jasmin4/esmeval/gh_issue772/mi-ba795

I've also uploaded the custom CMOR tables that I've been using (these are the ones that were used by the suite I used to CMORize my data) into:

/group_workspaces/jasmin4/esmeval/gh_issue772/cmor_tables

Finally, here is the entry I used in my config-developer.yml file for reading the data in recipes:

Antigua_NAP_raw:
  input_dir:
    default: '{dataset}'
  input_file: '{short_name}_CEFAS_Antigua_{driver}_{exp}_r1i1p1_MOHC-HadREM3-GA7-05_v1_{mip}*.nc'
  output_file: '{short_name}_{dataset}_{exp}_{mip}'
  cmor_path: '/project/ciid/projects/Antigua/CMOR_tables/cordex'
  cmor_type: 'CMIP6'
  cmor_default_table_prefix: 'CMIP6_'

Hopefully that should be sufficient for you to read in the data and attempt to regrid it to a regular lat/lon grid. Let me know if you have any issues or questions. Cheers

valeriupredoi commented 4 years ago

great! cheers, mate - I'll have a stab at testing it tomorrow :+1: :beer:

valeriupredoi commented 4 years ago

@thomascrocker here's what I found out: I tried running a recipe (which I post below), and immediately run into this CMOR checker error:

esmvalcore.cmor.check.CMORCheckError: There were errors in variable pr:
Coordinate longitude has var name longitude instead of lon
 Coordinate latitude has var name latitude instead of lat
in cube:
precipitation_flux / (kg m-2 s-1)   (time: 120; grid_latitude: 310; grid_longitude: 656)
     Dimension coordinates:
          time                           x                   -                    -
          grid_latitude                  -                   x                    -
          grid_longitude                 -                   -                    x
     Auxiliary coordinates:
          latitude                       -                   x                    x
          longitude                      -                   x                    x
     Attributes:
          CORDEX_domain: CEFAS_Antigua
          Conventions: CF-1.4
          c3s_disclaimer: This data has been produced in the context of the C3S_34b_Lot1 and Lot...
          comment: at surface; includes both liquid and solid phases from all types of clouds...
          contact: thomas.crocker@metoffice.gov.uk
          driving_experiment: MOHC-HadGEM2-ES,historical,r1i1p1
          driving_experiment_name: historical
          driving_model_ensemble_member: r1i1p1
          driving_model_id: MOHC-HadGEM2-ES
          experiment: Historical run using HadGEM2-ES as a driving model
          experiment_id: historical
          frequency: mon
          institute_id: MOHC
          institution: MetOffice, Hadley Centre, UK
          model_id: MOHC-HadREM3-GA7-05
          original_name: mo: (stash: m01s05i216, lbproc: 128)
          product: output
          project_id: CEFAS_Antigua
          rcm_version_id: v1
          references: https://www.metoffice.gov.uk/climate-guide/science/science-behind-climate-change/hadley;...
          source_file: /group_workspaces/jasmin4/esmeval/gh_issue772/mi-ba795/pr_CEFAS_Antigu...
     Cell methods:
          mean: time

Note that this is a not such a serious issue since if one runs with --check_level relaxed the warning will pop:

2020-11-11 14:24:02,380 UTC [20873] WARNING There were warnings in variable pr:
Coordinate longitude has var name longitude instead of lon
 Coordinate latitude has var name latitude instead of lat

but the code will not crash as it's supposed to do when there are serious issues. Note that this should be fixed since the default setting for cmor checks does crash and spits out this error, and we want data to ideally be compliant to the default level. But anyway, we can run with relaxed for now...

OK, now the recipe:

---
documentation:
  description: |
    Check 0 values from regrid

  authors:
    - predoi_valeriu

preprocessors:
  regridp:
    regrid:
      target_grid: 1x1
      scheme: linear

diagnostics:
  BerkeleyEarth:
    description: Antigua check
    variables:
      pr:
        preprocessor: regridp
    additional_datasets:
      - {dataset: MOHC-HadREM3-GA7-05, domain: CEFAS_Antigua, project: CORDEX, driver: MOHC-HadGEM2-ES, mip: mon, exp: historical, ensemble: r1i1p1, rcm_version: v1,
         start_year: 2000, end_year: 2008}
    scripts: null

you can use this recipe and use the CORDEX project right out the box, not needing to add projects to the config-developer file nor paths to custom cmor dirs; see the config-user:

rootpath:
  CORDEX: /group_workspaces/jasmin4/esmeval/gh_issue772/mi-ba795
drs:
  CORDEX: default

and it'll work right away! Also note that your custom cmor tables are almost identical to the ones in esmvaltool (CORDEX ones), bar a couple more vars in your tables, but not affecting the current tests.

Now, on to the issue at hand - why the regridded data is 0: here's what I found:

valeriupredoi commented 4 years ago

also just noticed that some of my findings have already been listed by @thomascrocker in an earlier comment - meh, good thing we reached the same conclusion :grin:

bouweandela commented 4 years ago

It might be a good idea to send @zklaus an email if you want a reaction from him, he hasn't been very actively reading the notifications from the ESMValTool project recently..

valeriupredoi commented 4 years ago

let's see if it works: Klauuuussssss @zklaus @zklaus @zklaus :grin: Yeah, I'll send him an email :mailbox:

thomascrocker commented 4 years ago

@valeriupredoi thanks for taking a look and good to see that we both came to the same conclusion at least. I also encountered the issue with the co-ordinate names, but I think I got around it using my custom CMOR tables. Good to know that I can use the standard CORDEX project settings though provided I just change the lat and lon naming in my use of the CDDS suite. Let's hope someone can shed some light on the ESMF behaviour

valeriupredoi commented 4 years ago

I will be emailing Klaus in 2min

zklaus commented 3 years ago

Alright, had a look, here is what I found:

valeriupredoi commented 3 years ago

good stuff @zklaus :+1: - what do you recommend - implementing a check on the CS and subsequent treatment depending on it or this is something that will work out of the box in iris 3+ - also do you think this is worth plopping on to SciTools GitHub? :beer:

zklaus commented 3 years ago

The coordinate system issue has to be fixed regardless of this regridding problem, I think. It is a bit hairy though for two reasons: One is that a general approach with cartopy.crs.CRS and iris.coord_systems.CoordSystem can be involved. Perhaps some changes to iris (e.g. to complement the CoordSystem.as_cartopy_crs function, a CoordSystem.from_cartopy_crs classmethod could be added) would be most sensible.

The regridding should be taken care of with the new regridding project. But perhaps an interims fix is sensible?

zklaus commented 3 years ago

@thomascrocker, @valeriupredoi, I put up PR #865 that should fix the regridding issue (at least I was able to run your recipe and data with it. Could you give it a test run?

thomascrocker commented 3 years ago

@thomascrocker, @valeriupredoi, I put up PR #865 that should fix the regridding issue (at least I was able to run your recipe and data with it. Could you give it a test run?

Thanks for developing a fix so quickly..

Looks like there is a problem though, I found that while the regridding appears to work correctly, it has the side effect of doing something to the latitude and longitude co-ordinates such that Iris can't read them..

This is what a print(cube) in python returns on the regrid cube produced by the processor

precipitation_flux / (kg m-2 s-1)   (time: 360; -- : 72; -- : 144)
     Dimension coordinates:
          time                           x         -        -

However, it looks like the coords are still there in the netcdf file

$ ncdump -h 06_regrid.nc 
netcdf \06_regrid {
dimensions:
    time = 360 ;
    lat = 72 ;
    lon = 144 ;
    bnds = 2 ;
variables:
    float pr(time, lat, lon) ;
        pr:_FillValue = 1.e+20f ;
        pr:standard_name = "precipitation_flux" ;
        pr:long_name = "Precipitation" ;
        pr:units = "kg m-2 s-1" ;
        pr:cell_methods = "time: mean" ;
        pr:grid_mapping = "rotated_latitude_longitude" ;
    int rotated_latitude_longitude ;
        rotated_latitude_longitude:grid_mapping_name = "rotated_latitude_longitude" ;
        rotated_latitude_longitude:grid_north_pole_latitude = 68.5 ;
        rotated_latitude_longitude:grid_north_pole_longitude = 112.5 ;
        rotated_latitude_longitude:north_pole_grid_longitude = 0. ;
    double time(time) ;
        time:axis = "T" ;
        time:bounds = "time_bnds" ;
        time:units = "days since 1850-1-1 00:00:00" ;
        time:standard_name = "time" ;
        time:long_name = "time" ;
        time:calendar = "360_day" ;
    double time_bnds(time, bnds) ;
    double lat(lat) ;
        lat:axis = "Y" ;
        lat:bounds = "lat_bnds" ;
        lat:units = "degrees_north" ;
        lat:standard_name = "latitude" ;
    double lat_bnds(lat, bnds) ;
    double lon(lon) ;
        lon:axis = "X" ;
        lon:bounds = "lon_bnds" ;
        lon:units = "degrees_east" ;
        lon:standard_name = "longitude" ;
    double lon_bnds(lon, bnds) ;
...

Could it be that the grid_mapping attribute needs removing from the variable?

zklaus commented 3 years ago

Could it be that the grid_mapping attribute needs removing from the variable?

Yes, that does seem to do the trick. WIll have a look how it snuck through...

zklaus commented 3 years ago

Turns out this is a resurfacing of #479.