I am also not sure why there are different procedures for 1D and 2D data, but maybe I am missing something.
I suspect this may be due to the limited support iris has for 2D lat/lon coordinates.
@zklaus Could you have a look at this issue? Do you have any advice?
To pass the checker, we need a way to detect that the coordinate is unstructured and not a bad regular one.
There is nothing in the file that directly states that, but a regular grid will only have 2 bounds per cell, not 16 like in this case. I think we can skip those checks if the second dimension of the bounds is > 2. Then you will start having trouble with nearly every function that operates on the grid, but let's solve issues one at a time.
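A minimal sketch of this bounds-count heuristic, assuming the bounds are available as a NumPy array (with an iris coordinate that would be coord.bounds, which is None when no bounds are defined). The helper name looks_unstructured is hypothetical, not an existing ESMValCore function. Testing for None explicitly also handles coordinates without bounds, without needing a try/except.

```python
import numpy as np


def looks_unstructured(bounds):
    """Hypothetical check: do 1D lat/lon bounds describe unstructured cells?

    A regular 1D latitude or longitude coordinate has 2 bound values per cell;
    an unstructured mesh stores the vertices of each cell polygon instead
    (16 per cell in the AWI file discussed here).
    """
    # Not every coordinate has bounds (iris gives None for coord.bounds then),
    # so test for that explicitly instead of relying on try/except.
    if bounds is None:
        return False
    bounds = np.asarray(bounds)
    return bounds.ndim == 2 and bounds.shape[1] > 2


print(looks_unstructured(np.zeros((180, 2))))  # regular 1D latitude
print(looks_unstructured(np.zeros((1000, 16))))  # unstructured cells
```

The heuristic is only meant for 1D coordinates: 2D curvilinear bounds have three dimensions and are deliberately not flagged.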
@jvegasbsc Looks like the way I did it doesn't really work, since coordinates do not necessarily have a shape. What would be the right approach? I can wrap it in some try/except, or an if statement (checking whether the coordinate has a shape), but it's kind of ugly? :)
I would upload a pull request with the required changes
My take on it is in #752. Do you think it makes sense to try the if-statement approach for the piece of code in check.py? :)
Right now, iris really cannot deal with this kind of grid. It won't matter if you manage to somehow get it past the checks, since pretty much nothing else will work. However, this is about to change, partly due to the regrid repo linked above (SciTools-incubator/iris-esmf-regrid), which is part of a larger effort to extend iris to unstructured grids driven by the UK's own next-gen model that is built on a cubed sphere.
As for the data itself, it seems to me that the connectivity information is missing, i.e., which cell is a neighbor of which cell. This is a shortcoming that is due to the CF conventions and CMIP in some sense, where it is generally not perceived as a problem because the structured nature of the data delivers this information implicitly. Nevertheless, in the context of staggered data, where some variables live on the cell centers and others on the edges or faces, this has already been recognized as a weakness and has partially been addressed in the gridspec proposal.
For a comprehensive standard for unstructured grids, the most promising candidate at the moment seems to be UGRID. More specifically, FESOM could fit in there as a 3D layered mesh topology with a 2D triangular mesh topology.
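To illustrate the neighbour information that the CMOR files lack, here is a sketch that derives it from a triangle list. It assumes a separately supplied mesh description with 0-based triangle node indices (roughly what UGRID calls face_node_connectivity); the function node_neighbours is hypothetical. Since the FESOM data points are the triangle vertices, two points are treated as neighbours when they share a triangle edge.

```python
import numpy as np


def node_neighbours(triangles, n_nodes):
    """List, for every mesh node, the nodes it shares a triangle edge with.

    triangles: (n_triangles, 3) array of 0-based node indices, assumed to come
    from a separate mesh description (similar in spirit to UGRID's
    face_node_connectivity); it is not contained in the CMOR files themselves.
    """
    neighbours = [set() for _ in range(n_nodes)]
    for a, b, c in np.asarray(triangles, dtype=int):
        # Each triangle contributes its three edges, recorded in both directions.
        for p, q in ((a, b), (b, c), (c, a)):
            neighbours[p].add(int(q))
            neighbours[q].add(int(p))
    return [sorted(s) for s in neighbours]


# Two triangles sharing the edge between nodes 1 and 2.
print(node_neighbours([[0, 1, 2], [2, 1, 3]], n_nodes=4))
```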
Support for unstructured grids in Iris is likely to follow the UGRID standard and introduce some new form of cube to deal with it.
@bjlittle, what do you think about this? Maybe FESOM could be an example to make sure unstructured grid support is sufficiently general?
If the checks are passed, one at least has the opportunity to work with the CMORised files that are returned. The diagnostic can then decide how to handle the data. Preprocessor functions will not work, that's true.
Explicit connectivity information is missing, but:
We can provide data in UGRID format in the future, but this does not solve the problem for FESOM data in CMIP6: the data format is already settled, and there is no way to change it. One can provide mesh descriptions in UGRID on the side and combine them with the data at the CMORisation step, but the resulting data will not be CMOR compliant.
We would be very much interested in cooperating with iris on working with FESOM meshes. We already have some activity to standardize the representation of the FESOM mesh as an xarray accessor. Next week we will know whether there is funding for the TRR181 project, where Veronika will have funding that can in part be used for implementing unstructured grids in ESMValTool.
Fair enough. In that case, I suggest we make it clear that the data has not passed some checks, but rather bypassed them. Perhaps one of the lenient checker settings is enough (--check-level=ignore)?
I think the fact that most preprocessors will not work, meaning that most recipes will not work when one adds this dataset, is reason enough not to let it pass the checker in normal mode.
Don't get me wrong, I think it's great that we have this data and I'm sure more unstructured data will come in the future. But perhaps this must be restricted to a separate development branch where we can successively add preprocessors that know how to deal with it until it is almost ready? In that context, I think @jvegasbsc's suggestion to flag the detection of an unstructured grid by the second dimension of the bounds is reasonable. I am not aware of other datasets that have only been caught because of this.
Fair enough. In that case, I suggest we make it clear that the data has not passed some checks, but rather bypassed them. Perhaps one of the lenient checker settings is enough (--check-level=ignore)?
Can you or @jvegasbsc point me to the place where I should look to maybe try to add this option?
I think the fact that most preprocessors will not work, meaning that most recipes will not work when one adds this dataset, is reason enough not to let it pass the checker in normal mode.
Not that many diagnostics work with ocean data anyway, I guess :)
Don't get me wrong, I think it's great that we have this data and I'm sure more unstructured data will come in the future. But perhaps this must be restricted to a separate development branch where we can successively add preprocessors that know how to deal with it until it is almost ready? In that context, I think @jvegasbsc's suggestion to flag the detection of an unstructured grid by the second dimension of the bounds is reasonable. I am not aware of other datasets that have only been caught because of this.
Yes, MPAS-O (E3SM) is already in CMIP6, and ICON-O certainly will be in the next one (if there is a next one). There will be atmospheric ones as well. Currently it is still easier to throw unstructured models out of the analysis to make life easier. This is what we see in publications, even though interpolation to a regular grid is quite straightforward (https://fesom.de/cmip6/work-with-awi-cm-unstructured-data/). So we will provide interpolated data next year.
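The interpolation to a regular grid can be sketched as a nearest-neighbour remap. This is a brute-force illustration under simple assumptions (1D lon/lat in degrees, one value per point) and not necessarily the procedure described on the FESOM page; the function names are hypothetical. Distances are computed on the unit sphere, so points on either side of the dateline are matched correctly.

```python
import numpy as np


def lonlat_to_xyz(lon, lat):
    """Convert longitude/latitude in degrees to points on the unit sphere."""
    lon = np.deg2rad(np.asarray(lon, dtype=float))
    lat = np.deg2rad(np.asarray(lat, dtype=float))
    return np.stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)], axis=-1
    )


def nearest_to_regular(values, lon, lat, target_lon, target_lat):
    """Nearest-neighbour remap of 1D point data onto a regular lon/lat grid.

    Returns an array of shape (len(target_lat), len(target_lon)).
    """
    src = lonlat_to_xyz(lon, lat)  # (n_points, 3)
    grid_lon, grid_lat = np.meshgrid(target_lon, target_lat)
    dst = lonlat_to_xyz(grid_lon.ravel(), grid_lat.ravel())  # (n_targets, 3)
    # Squared chord distance between every target and every source point.
    # Brute force: memory grows as n_targets * n_points.
    dist2 = ((dst[:, None, :] - src[None, :, :]) ** 2).sum(axis=-1)
    nearest = dist2.argmin(axis=1)
    return np.asarray(values)[nearest].reshape(grid_lat.shape)


# Three made-up mesh points on the equator, remapped to a 1 x 3 target grid.
print(nearest_to_regular([1, 2, 3], [0, 90, 180], [0, 0, 0], [10, 100, 350], [0]))
```

For a full FESOM mesh the distance matrix becomes far too large; a KD-tree over the same Cartesian coordinates (for example scipy.spatial.cKDTree) is the usual replacement.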
Maybe the temporary solution would be to use cdo for interpolation of FESOM data in ESMValTool by default?
Actually, making many things work for both unstructured and rectangular grids is not that difficult: you just treat everything as unstructured (points and weights in the simplest case, which covers 90% of needs). Since most ocean models have strange grids anyway, this is quite natural once you get used to it :)
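A sketch of the "points and weights" idea: once every field is a flat list of values with one weight each, a global mean is computed the same way for a ravelled regular grid and for an unstructured mesh. The function name and the use of NaN for land points are assumptions for illustration; the weights could be cell areas, or cos(latitude) on a regular grid.

```python
import numpy as np


def weighted_mean(values, weights):
    """Weighted mean over points, ignoring missing (NaN) values.

    values and weights can have any shape as long as they match: a 2D field on a
    regular grid and a 1D field on an unstructured mesh are both just flattened.
    """
    values = np.asarray(values, dtype=float).ravel()
    weights = np.asarray(weights, dtype=float).ravel()
    valid = ~np.isnan(values)  # e.g. land points in an ocean field
    return float(np.sum(values[valid] * weights[valid]) / np.sum(weights[valid]))


# Regular grid: 2D field, weights proportional to cos(latitude).
lat = np.array([-45.0, 0.0, 45.0])
field_2d = np.ones((3, 4))
weights_2d = np.broadcast_to(np.cos(np.deg2rad(lat))[:, None], field_2d.shape)
print(weighted_mean(field_2d, weights_2d))

# Unstructured mesh: 1D field with one (made-up) cell area per node.
print(weighted_mean([1.0, np.nan, 3.0], [2.0, 5.0, 2.0]))
```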
I will try to get back to it soon and come up with a PR that does the following: if the lat and lon coordinates share dimensions, we must skip the monotonicity check. This will affect not only unstructured grids but curvilinear ones as well, I guess?
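One way to express the rule "skip the monotonicity check if lat and lon share dimensions" is sketched below. It assumes the dimension tuples come from something like iris's Cube.coord_dims; the function name is hypothetical and this is not the code that ended up in check.py. Regular grids give disjoint tuples, while curvilinear grids (both coordinates 2D on the same dimensions) and unstructured grids (both 1D on the cell dimension) overlap, so the rule covers both cases raised in the question.

```python
def needs_monotonicity_check(lat_dims, lon_dims):
    """Hypothetical rule: run the monotonicity check only on rectilinear grids.

    lat_dims and lon_dims are tuples of the cube dimensions each coordinate
    spans, e.g. as returned by iris's Cube.coord_dims(coord).
    """
    # Regular grid: latitude and longitude span different dimensions, e.g. (2,)
    # and (3,). Curvilinear grid: both span the same two dimensions. Unstructured
    # mesh: both span the single cell dimension. Only the first case has a
    # meaningful ordering to check.
    return not set(lat_dims) & set(lon_dims)


# thetao on a regular grid: (time, lev, lat, lon)
print(needs_monotonicity_check((2,), (3,)))
# thetao on an unstructured mesh: (time, lev, ncells)
print(needs_monotonicity_check((2,), (2,)))
```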
You can have a look at esmvaltool run --help for a more detailed explanation, but in principle
esmvaltool run --check_level=ignore your_recipe.yml
should do the trick.
Sorry for reopening this. I tried reading (no preprocessor, no diagnostics) the file mentioned above
/mnt/lustre02/work/ik1017/CMIP6/data/CMIP6/CMIP/AWI/AWI-ESM-1-1-LR/historical/r1i1p1f1/Omon/thetao/gn/v20200212/thetao_Omon_AWI-ESM-1-1-LR_historical_r1i1p1f1_gn_197101-198012.nc
using the latest master branch, and I think that some checks still fail:
ValueError: expected 0 or 2 bound values per cell
Detailed log:
main_log_debug.txt
Is this behaviour expected or was this issue also about skipping that check?
In the long run, I would also be interested in processing such files but I understand that it still requires some work.
I also found this issue while working with ICON data. A possible fix is given in #1079.
This should be fixed now after the countless changes necessary for the ICON on-the-fly CMORizer (e.g., #1079). I got a "Run was successful" for
/work/bd0854/DATA/ESMValTool2/CMIP6_DKRZ/CMIP/AWI/AWI-ESM-1-1-LR/historical/r1i1p1f1/Omon/tos/gn/v20200212/tos_Omon_AWI-ESM-1-1-LR_historical_r1i1p1f1_gn_197101-198012.nc
:+1:
The file that @remi-kazeroni mentioned failed with an unrelated problem (Generic level coordinate olevel has wrong var_name).
Please re-open if necessary.
Describe the bug
There are several checks that do not allow AWI-CM-1-1-MR unstructured ocean data (thetao and so) to pass the CMORization checks.
Little background
Here is a small description of how the unstructured ocean mesh in the AWI-CM ocean model FESOM is organised: https://fesom.de/cmip6/work-with-awi-cm-unstructured-data/
In short, lons and lats are 1D arrays of the coordinates of points (vertices of triangles). It's like 2D lons and lats for structured models, but not organised in an array. In other words, if you ravel the 2D arrays of lons and lats from structured meshes, you will get something similar (although in that case still with some structure).
What I did
1. Longitudes are not in the 0..360 range. I tried to fix it with the fix file, where I shamelessly copied the code from check.py. In theory, _check_coord_values should do this automatically, but it checks the number of dimensions of the longitude, finds that ndim == 1, and tries to apply the iris intersection function, which does not understand unstructured meshes. I decided to go for the fix file, which unfortunately duplicates the code. Maybe we can have one more if in _check_coord_values that would apply the procedure dedicated to 2D arrays to unstructured lons as well. I am also not sure why there are different procedures for 1D and 2D data, but maybe I am missing something.
2. Check for monotonicity in check.py, _check_coord_monotonicity_and_direction. The coordinates should not be monotonic, but they are 1D, so the check is triggered. Unfortunately, there is no good indication that the coordinates are unstructured (see ncdump below), so the best thing I was able to come up with is to check the shape of the bounds (in unstructured meshes it's > 4). It's a bad call: the tests are failing, since coordinates do not necessarily have a shape :)
I would really appreciate advice on how to do this the best way, especially from @valeriupredoi and @jvegasbsc :)
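The mesh layout and the longitude problem described above can be illustrated with plain NumPy. The sketch ravels a small regular grid into point lists, which is the same layout FESOM uses for its nodes, and wraps longitudes into [0, 360) element by element. The grid values and the helper wrap_longitudes are made up for illustration; this is not the ESMValCore fix-file code. Because each point is wrapped independently and nothing is reordered, the approach does not depend on the coordinate being monotonic.

```python
import numpy as np


def wrap_longitudes(lon):
    """Map longitudes into [0, 360) point by point, keeping their order."""
    return np.mod(np.asarray(lon, dtype=float), 360.0)


# A small regular grid, stored as 2D lon/lat arrays as a structured model would.
lon2d, lat2d = np.meshgrid([-180.0, -90.0, 0.0, 90.0], [-30.0, 30.0])

# Ravelled, they become plain 1D lists of points, the same layout as FESOM's
# node coordinates (which, unlike these, follow no regular ordering).
lon_points = lon2d.ravel()
lat_points = lat2d.ravel()

print(wrap_longitudes(lon_points))
```

Wrapping bounds the same way can split a cell that straddles 0°/360° (bounds of -1 and 1 become 359 and 1), so bounds would need extra care.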