ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0
42 stars 38 forks source link

Working with unstructured ocean AWI-CM data #751

Closed koldunovn closed 1 year ago

koldunovn commented 4 years ago

Describe the bug There are several checks that do not allow AWI-CM-1-1-MR unstructured ocean data (thetao and so) to pass the cmorization checks.

Little background

Here is a small description of how unstructured ocean mesh in AWI-CM ocean model FESOM is organised: https://fesom.de/cmip6/work-with-awi-cm-unstructured-data/

In short, lons and lats are 1D arrays of coordinates of points (vertices of triangles). It's like 2D lons and lats for structured models, but not organised in an array. In other words, if you ravel the 2D array of lons and lats from structured meshes, you will get something similar (although in this case still with some structure).

What I did

I would really appreciate advice on how to do it the best way, espetially from @valeriupredoi and @jvegasbsc :)

ncdump -h  /mnt/lustre02/work/ik1017/CMIP6/data/CMIP6/CMIP/AWI/AWI-CM-1-1-MR/historical/r1i1p1f1/Omon/thetao/gn/v20181218/thetao_Omon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_198101-199012.nc
netcdf thetao_Omon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_198101-199012 {
dimensions:
    time = UNLIMITED ; // (120 currently)
    bnds = 2 ;
    ncells = 830305 ;
    depth = 46 ;
    vertices = 16 ;
variables:
    double time(time) ;
        time:standard_name = "time" ;
        time:long_name = "time" ;
        time:bounds = "time_bnds" ;
        time:units = "days since 1981-1-1 00:00:00" ;
        time:calendar = "standard" ;
        time:axis = "T" ;
    double time_bnds(time, bnds) ;
    double depth(depth) ;
        depth:long_name = "depth" ;
        depth:units = "m" ;
        depth:positive = "down" ;
        depth:axis = "Z" ;
    float thetao(time, depth, ncells) ;
        thetao:units = "degC" ;
        thetao:CDI_grid_type = "unstructured" ;
        thetao:_FillValue = 1.e+30f ;
        thetao:missing_value = 1.e+30f ;
        thetao:description = "Diagnostic should be contributed even for models using conservative temperature as prognostic field." ;
        thetao:coordinates = "lat lon" ;
        thetao:standard_name = "sea_water_potential_temperature" ;
        thetao:cell_methods = "area: mean where sea time: mean" ;
        thetao:cell_measures = "area: areacello volume: volcello" ;
    double lat(ncells) ;
        lat:units = "degrees_north" ;
        lat:standard_name = "latitude" ;
        lat:bounds = "lat_bnds" ;
    double lat_bnds(ncells, vertices) ;
    double lon(ncells) ;
        lon:units = "degrees_east" ;
        lon:standard_name = "longitude" ;
        lon:bounds = "lon_bnds" ;
    double lon_bnds(ncells, vertices) ;

// global attributes:
        :frequency = "mon" ;
        :activity_id = "CMIP" ;
        :Conventions = "CF-1.7 CMIP-6.2" ;
        :creation_date = "2018-12-18T12:00:00Z" ;
        :data_specs_version = "01.00.27" ;
        :experiment = "historical" ;
        :experiment_id = "historical" ;
        :forcing_index = 1 ;
        :further_info_url = "https://furtherinfo.es-doc.org/CMIP6.AWI.AWI-CM-1-1-MR.historical.none.r1i1p1f1" ;
        :grid = "FESOM 1.4 (unstructured grid in the horizontal with 830305 wet nodes; 46 levels; top grid cell 0-5 m)" ;
        :grid_label = "gn" ;
        :initialization_index = 1 ;
        :institution = "Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Am Handelshafen 12, 27570 Bremerhaven, Germany" ;
        :institution_id = "AWI" ;
        :license = "CMIP6 model data produced by Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Am Handelshafen 12, 27570 Bremerhaven, Germany is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (https://creativecommons.org/licenses/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file). The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
        :mip_era = "CMIP6" ;
        :nominal_resolution = "25 km" ;
        :physics_index = 1 ;
        :product = "model-output" ;
        :realization_index = 1 ;
        :realm = "ocean" ;
        :source = "AWI-CM-1-1-MR" ;
        :source_id = "AWI-CM-1-1-MR" ;
        :source_type = "AOGCM" ;
        :sub_experiment = "none" ;
        :sub_experiment_id = "none" ;
        :table_id = "Omon" ;
        :tracking_id = "hdl:21.14100/dea7aff3-7af4-424d-a8a1-2b08d6bdb266" ;
        :variable_id = "thetao" ;
        :variant_label = "r1i1p1f1" ;
        :branch_method = "standard" ;
        :branch_time_in_child = 0. ;
        :branch_time_in_parent = 54421. ;
        :parent_activity_id = "CMIP" ;
        :parent_experiment_id = "piControl" ;
        :parent_mip_era = "CMIP6" ;
        :parent_source_id = "AWI-CM-1-1-MR" ;
        :parent_time_units = "days since 2401-1-1" ;
        :parent_variant_label = "r1i1p1f1" ;
}
bouweandela commented 4 years ago

I am also not sure why there are diferen procedures for 1D and 2D data, but maybe I am missing something.

I suspect this may be due to the limited support iris has for 2D lat/lon coordinates.

bouweandela commented 4 years ago

@zklaus Could you have a look at this issue? Do you have any advice?

jvegreg commented 4 years ago

To pass the checker, we need a way to detect that the coordinate is unstructured and not a bad regular one.

There is nothing in the file that directly states that, but a regular grid will only have 2 bounds per cell, not 16 like this case. I think we can skip those checks if bounds second dimension is > 2. Then you will start having trouble with nearly every function that operates on the grid, but let's solve issues one at a time

koldunovn commented 4 years ago

@jvegasbsc Looks like the way I did it doesn't really work, since since coordinates not nessesarelly have to have shape. What would be the right aproach? I can wrapp it in some try/except, or if statement (checking if coordinate has shape), but it's kind of ugly? :)

jvegreg commented 4 years ago

@jvegasbsc Looks like the way I did it doesn't really work, since since coordinates not nessesarelly have to have shape. What would be the right aproach? I can wrapp it in some try/except, or if statement (checking if coordinate has shape), but it's kind of ugly? :)

I would upload a pull request with the required changes

koldunovn commented 4 years ago

My take on it is in # #752 . Do you think it makes sence to try the ifs aproach for the piece of code in check.py ? :)

jvegreg commented 4 years ago

752 does not solve the monotonic check, which is the most complex. I thought of a more elegant check than the bounds one, which is prone to issues with regular latitudes badly written: if lat and lon coordinates share dimensions, we must skip the monotonic check

zklaus commented 3 years ago

Right now, iris really can not deal with this kind of grid. It won't matter if you manage to somehow get it past the checks, since pretty much nothing else will work. However, this is about to change, partly due to the regrid repo linked above (SciTools-incubator/iris-esmf-regrid), which is part of a larger effort to extend iris to unstructured grids driven by the UK's own next-gen model that is built on a cubed sphere.

As for the data itself, it seems to me that the connectivity information is missing, ie which cell is a neighbor to which cell. This is a shortcoming that is due to the CF conventions and CMIP in some sense, where it is generally not perceived as a problem because the structured nature of the data delivers this information implicitly. Nevertheless, in the context of staggered data, where some variables live on the cell centers and others on the edges or faces, this has already been recognized as a weakness and partially been addressed in the gridspec proposal.

For a comprehensive standard for unstructured grids, the most promising candidate at the moment seems to be UGRID. More specifically, FESOM could fit in there as a 3D layered mesh topology with a 2D triangular mesh topology.

Support for unstructured grids in Iris is likely to follow the UGRID standard and introduce some new form of cube to deal with it.

@bjlittle, what do you think about this? Maybe FESOM could be an example to make sure unstructured grid support is sufficiently general?

koldunovn commented 3 years ago

If the checks are passed one at least have an opportunity to work with CMORised files that are returned. Diagnostic then can decide how to handle the data. Preprocessor functions will not work, that's true.

Explicit connectivity information is missing, but:

We can provide data in UGRID format in the future, but this does not solve the problem of FESOM data in CMIP6 the data format is settled already, no way to change it. One can provide mesh descriptions in UGRID on the side and combine data at CMORisation step, but resulting data will not be CMOR complaint.

We will be very much interested to cooperate with iris on working with FESOM meshes. We have some activity already to standardize representation of FESOM mesh as xarray accessor. Next week we will know if there is a funding for TRR181 project, where Veronika will have funding that can be in part used for implementation of unstructured grids to ESMValTool.

zklaus commented 3 years ago

Fair enough. In that case, I suggest we make it clear that the data has not passed some checks, but rather bypassed them. Perhaps one of the lenient checker settings is enough (--check-level=ignore)?

I think the fact that most preprocessor will not work, meaning that most recipes will not work when one adds this dataset, is reason enough not to let it pass the checker in normal mode.

Don't get me wrong, I think it's great that we have this data and I'm sure more unstructured data will come in the future. But perhaps this must be restricted to a separated development branch where we can successively add preprocessors that know how to deal with it until it is almost ready? In that context, I think @jvegasbsc suggestion to flag the detection of an unstructured grid by the second dimension of the bounds is reasonable. I am not aware of other datasets that have only been caught because of this.

koldunovn commented 3 years ago

Fair enough. In that case, I suggest we make it clear that the data has not passed some checks, but rather bypassed them. Perhaps one of the lenient checker settings is enough (--check-level=ignore)?

Can you or @jvegasbsc point me to the place where I should look to maybe try to add this option?

I think the fact that most preprocessor will not work, meaning that most recipes will not work when one adds this dataset, is reason enough not to let it pass the checker in normal mode.

Not that much diagnostics work with ocean data anyway, I guess :)

Don't get me wrong, I think it's great that we have this data and I'm sure more unstructured data will come in the future. But perhaps this must be restricted to a separated development branch where we can successively add preprocessors that know how to deal with it until it is almost ready? In that context, I think @jvegasbsc suggestion to flag the detection of an unstructured grid by the second dimension of the bounds is reasonable. I am not aware of other datasets that have only been caught because of this.

Yes, the MPAS-O (E3SM) already in CMIP6 and ICON-O certainly will be in the next (if there will be the next). There will be atmospheric ones as well. Currently it is still easier to throw unstructured models out of the analysis to make the life easier. This is what we see in publications, despite interpolation to regular grid is quite straightforward (https://fesom.de/cmip6/work-with-awi-cm-unstructured-data/). So we will provide interpolated data next year.

Maybe the temporary solution would be to use cdo for interpolation of FESOM data in ESMValTool by default?

Actually to make many things work for both unstructured and rectangular grids is not that difficult, you just treat everything as unstructured (points and weights in the simplest case, that covers 90% of needs). Since most of the ocean models have strange grids anyway this is quite natural when you get used to it :)

I will try to get back to it soon and come up with the PR that do if lat and lon coordinates share dimensions, we must skip the monotonic check. This will not only affect unstructured grids but curvilineal as well, I guess?

zklaus commented 3 years ago

You can have a look at esmvaltool run --help for a more detailed explanation, but in principle

esmvaltool run --check_level=ignore your_recipe.yml

should do the trick.

remi-kazeroni commented 3 years ago

Sorry for reopening this. I tried reading (no preprocessor, no diagnostics) the file mentioned above /mnt/lustre02/work/ik1017/CMIP6/data/CMIP6/CMIP/AWI/AWI-ESM-1-1-LR/historical/r1i1p1f1/Omon/thetao/gn/v20200212/thetao_Omon_AWI-ESM-1-1-LR_historical_r1i1p1f1_gn_197101-198012.nc using the lastest master branch. And I think that there are still some checks that fail: ValueError: expected 0 or 2 bound values per cell. Detailed log: main_log_debug.txt Is this behaviour expected or was this issue also about skipping that check?

In the long run, I would also be interested in processing such files but I understand that it still requires some work.

schlunma commented 3 years ago

I also found this issue while working with ICON data. A possible fix is given in #1079.

schlunma commented 1 year ago

This should be fixed now after the countless changes necessary for the ICON on-the-fly CMORizer (e.g., #1079). I got a Run was successful for /work/bd0854/DATA/ESMValTool2/CMIP6_DKRZ/CMIP/AWI/AWI-ESM-1-1-LR/historical/r1i1p1f1/Omon/tos/gn/v20200212/tos_Omon_AWI-ESM-1-1-LR_historical_r1i1p1f1_gn_197101-198012.nc :+1:

The file that @remi-kazeroni mentioned failed with an unrelated problem (Generic level coordinate olevel has wrong var_name).

Please re-open if necessary.