ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0
215 stars 126 forks source link

Linear interpolation may generate missing values close to poles #806

Closed jhardenberg closed 5 years ago

jhardenberg commented 5 years ago

When using the preprocessor to interpolate on a common coarse grid using "linear" interpolation, missing NaN values may be generated. For example, this will occur with interpolation onto a 2.5 degree grid in the Northern Hemisphere (from 0 to 90 deg north). All models with resolution coarser that 1.25 degrees will then not have a grid point north of 88.75 deg and the esmvaltool preprocessor, rather than extrapolating creates a row of NaN values close to the pole. Is this desired behaviour? It will create problems if the diagnostic does not expect NaN values in input. The "area_weighted" method does not have this problem.

See also: https://github.com/ESMValGroup/ESMValTool/pull/735#issuecomment-457856685

mattiarighi commented 5 years ago

By NaN do you mean missing values (marked with _FillValue attribute)?

We have tested the regridding against ESMValToolv1, which used the same regridding method but in a different language (NCL) and the results were identical. So my feeling is that this is a feature of the regridding algorithm when using the linear interpolation.

jhardenberg commented 5 years ago

By NaN do you mean missing values (marked with _FillValue attribute)?

yes

We have tested the regridding against ESMValToolv1, which used the same regridding method but in a different language (NCL) and the results were identical. So my feeling is that this is a feature of the regridding algorithm when using the linear interpolation.

The creation of these missing values is certainly a 'feature' in the sense that "linear" interpolation only interpolates between two points, so when asked to produce values close to the pole on a grid which is finer than the resolution of the original grid (or offset by half a grid point) it produces missing values because it cannot extrapolate. The question is if this is desired/expected behaviour. It certainly is inconvenient for all diagnostics which produce statistics e.g. in the NH regridding on a common grid (example blocking) and which do not expect to find missing values in input.

mattiarighi commented 5 years ago

Missing values in the input data are a pretty common case, so I would expect all diagnostic to be able to deal with them, otherwise they could switch to the area-weighted method which does not have such problem.

Anyway, since this is a feature of the regridding scheme, we cannot change it and we have to deal with it.

bouweandela commented 5 years ago

Technically it would be possible to expose more functionality of iris in the recipe. If this is also scientifically a good choice I cannot comment on though. At the moment, iris supports four regridding schemes, which we also support:

and see also this section of the iris manual.

In addition to that, we also support regridding irregular grids through esmpy and regridding vertically through python-stratify.

We could change the arguments of the regrid preprocessor function to allow also specifying the keyword argument in the recpie, at the moment that is hardcoded, because we support these schemes: https://github.com/ESMValGroup/ESMValTool/blob/ceb51cc747ecc171aae1cc0d3bfca2445609f864/esmvaltool/preprocessor/_regrid.py#L51-L57

mattiarighi commented 5 years ago

Related to #815

jhardenberg commented 5 years ago

I would simply add an additional horizontal interpolation scheme in _regrid.py

 HORIZONTAL_SCHEMES = { 
     'linear': Linear(extrapolation_mode='mask'), 
     'linear_extrapolate': Linear(extrapolation_mode='extrapolate'), 
     'nearest': Nearest(extrapolation_mode='mask'), 

I did so in branch https://github.com/ESMValGroup/ESMValTool/tree/fix_preproc_meridional_grid and it works flawlessly.

mattiarighi commented 5 years ago

That's a good solution. As for this comment, please open a PR with this change. Or you can combine the fix for this and #815 in the same PR.

valeriupredoi commented 5 years ago

the Nearest(extrapolation_mode='mask') interpolator performs the best when it comes to missing data (masked data) as input, that's why chose it as default