SciTools / iris

A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
https://scitools-iris.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License
635 stars 284 forks source link

Regridding with `iris.analysis.UnstructuredNearest` does not preserve `dtype` and masks #4463

Closed schlunma closed 2 years ago

schlunma commented 2 years ago

🐛 Bug Report

Hi guys, I found two issues with regridding using iris.analysis.UnstructuredNearest:

  1. The data type of the input cube is not preserved, the output type is always float64.
  2. The output is not masked, i.e., the output data is a regular array (and not a masked array) that contains values of the fill_value.

How To Reproduce

Minimal working example using this dataset (this is an nc file, but GitHub does not allow nc files): test.nc.txt

Test dataset ``` netcdf test.nc { dimensions: time = 1 ; dim1 = 10 ; bnds_3 = 3 ; variables: float tas(time, dim1) ; tas:long_name = "temperature in 2m" ; tas:units = "K" ; tas:coordinates = "clat clon" ; double time(time) ; time:standard_name = "time" ; time:invalid_units = "day as %Y%m%d.%f" ; float clat(dim1) ; clat:bounds = "clat_bnds" ; clat:units = "degrees_north" ; clat:standard_name = "latitude" ; clat:long_name = "center latitude" ; float clat_bnds(dim1, bnds_3) ; float clon(dim1) ; clon:bounds = "clon_bnds" ; clon:units = "degrees_east" ; clon:standard_name = "longitude" ; clon:long_name = "center longitude" ; float clon_bnds(dim1, bnds_3) ; // global attributes: :CDI = "Climate Data Interface version 1.8.3rc (http://mpimet.mpg.de/cdi)" ; :CDI_grid_type = "unstructured" ; :uuidOfHGrid = "e941b1d0-ab58-11e8-ba4f-bdd1e82b9e6d" ; :Conventions = "CF-1.7" ; data: tas = _, 293.6598, 292.4131, _, _, _, _, _, 291.7204, 291.7908 ; time = 20051001 ; clat = 52.61851, 52.93201, 52.93201, 51.98309, 53.88348, 53.56083, 54.50582, 53.56083, 51.97181, 52.60703 ; clat_bnds = 53.25029, 52.29887, 52.29887, 52.29887, 53.23846, 53.25029, 53.25029, 53.23846, 52.29887, 52.29887, 51.34985, 52.29887, 54.19249, 54.19249, 53.25029, 53.25029, 53.23846, 54.19249, 54.19249, 55.13753, 54.19249, 54.19249, 53.23846, 53.25029, 52.27613, 51.33891, 52.29887, 52.29887, 53.23846, 52.27613 ; clon = 73, 73.91156, 72.08844, 73, 73, 73.92289, 73, 72.07711, 71.22221, 71.20061 ; clon_bnds = 73, 72.10562, 73.89438, 73.89438, 74.83416, 73, 73, 71.16584, 72.10562, 72.10562, 73, 73.89438, 73.94114, 72.05886, 73, 73, 74.83416, 73.94114, 73.94114, 73, 72.05886, 72.05886, 71.16584, 73, 70.31771, 71.25498, 72.10562, 72.10562, 71.16584, 70.31771 ; } ```
# Regrid unstructured grid

import numpy as np
import iris
import iris.coords
import iris.cube
from iris.analysis import UnstructuredNearest

cube = iris.load_cube('test.nc')

scheme = UnstructuredNearest()
target_lat = iris.coords.DimCoord([50.0, 52.0, 54.0], standard_name='latitude', units='degrees')
target_lon = iris.coords.DimCoord([70.0, 71.0, 72.0], standard_name='longitude', units='degrees')
target_grid = iris.cube.Cube(np.full((3, 3), 0.0), dim_coords_and_dims=[(target_lat, 0), (target_lon, 1)])

regridded_cube = cube.regrid(target_grid, scheme)

print("dtype of original cube:", cube.dtype)
print("dtype of regridded cube:", regridded_cube.dtype)
print("")

print("data:")
print(regridded_cube.data)
print("data is masked?", np.ma.is_masked(regridded_cube.data))

print("")
print("Original fill value:", cube.data.fill_value)

gives

dtype of original cube: float32
dtype of regridded cube: float64

data:
[[[2.91720398e+02 2.91720398e+02 2.91720398e+02]
  [2.91720398e+02 2.91720398e+02 2.91720398e+02]
  [9.96920997e+36 9.96920997e+36 9.96920997e+36]]]
data is masked? False

Original fill value: 9.96921e+36

Expected behaviour

The dtype should be preserved, i.e., in the example both dtypes should be float32 and the data should be masked correctly, i.e., it should look like this:

[[[2.91720398e+02 2.91720398e+02 2.91720398e+02]
  [2.91720398e+02 2.91720398e+02 2.91720398e+02]
  [      --             --             --      ]]]

Environment

I know that you are currently refactoring the support of unstructured grids. I'm not entirely sure if this also includes new regridders or not; if this is the case and you are in the middle of rewriting this regridder, please ignore this issue.

Thanks for your help :+1:

bjlittle commented 2 years ago

Hey @schlunma,

Great to hear from you and thanks for taking the time to raise this issue 👍

Note that, if you're interested in regridding with unstructured grids, I'd highly recommend that you take a look at https://github.com/SciTools-incubator/iris-esmf-regrid

We're actively developing this package in conjunction with our current UGRID work, and @stephenworsley (lead developer) would be super keen for feedback... He'd also be on hand to help you with using iris-esmf-regrid and introducing you to that world.

Note that, like all our regridders, it plums straight into iris using the regrid/interpolate scheme pattern... give it a whirl.

It's also available on both conda-forge and pypi 👍

schlunma commented 2 years ago

Thanks for the quick answer @bjlittle!

iris-esmf-regrid looks super interesting, I will definitely have a look at that. One question on this: is there any documentation available for it? I couldn't find it on the GitHub page. Thanks!!

bjlittle commented 2 years ago

No worries, glad to help.

It's a great question, thanks for asking. The simple answer is no, not yet... It's that new! 😉

Although I'm sure @stephenworsley has plans in the pipeline. Why not create an issue on iris-esmf-regrid asking for some rudimentary docs. If you want it, so will others, so let's make sure that happens.

Perhaps @stephenworsley could provide you with some code examples in the meantime.

What do you think @stephenworsley ? What's the best way for @schlunma to spin up and start playing ? 🤔

jonseddon commented 2 years ago

Rudimentary docs would be really useful for me too please and so I've created SciTools-incubator/iris-esmf-regrid#139

bjlittle commented 2 years ago

Note that, at the moment iris-esmf-regrid only supports area weighted regridding from lat/lat to unstructured, and vice versa... Although @stephenworsley is more qualified to comment on that

schlunma commented 2 years ago

Thanks for opening the issue @jonseddon!

Note that, at the moment iris-esmf-regrid only supports area weighted regridding from lat/lat to unstructured, and vice versa... Although @stephenworsley is more qualified to comment on that

That sounds great for a start! I actually have a use case where area-weighted regridding from unstructured grids to lat/lon could come in handy :+1:

stephenworsley commented 2 years ago

So currently iris-esmf-regrid only handles area weighted regridding. This is not applicable in all cases since it requires bounds information to be present on the source and target cubes, though if you do have compatible cubes you should find it useful. With that said, ESMF itself does provide good support for nearest neighbour regridding and I think it would be worthwhile to eventually add a nearest neighbour regridder to iris-esmf-regrid. I imagine this may potentially be able to cover a wider range of types of nearest neighbour regridding than iris currently handles (e.g. distinguishing between nearest-source-to-destination and nearest-destination-to-source regridding) so feel free to add an issue to iris-esmf-regrid if you feel like you need this functionality.

stephenworsley commented 2 years ago

Also, I'm interested to know what you believe the correct handling of masks should be. Comments in the code suggest that the current handling is not correct: https://github.com/SciTools/iris/blob/3638874a63e9bbee72b062fe02a8e500982eb796/lib/iris/analysis/trajectory.py#L413-L415 Though I could think of different behaviours which could claim to be "more correct".

  1. Masking the output when the nearest input is masked
  2. Taking the value of the nearst unmasked input.

I would imagine that the first of these would be easier to implement, especially when additional dimensions are involved. It may also be more appropriate in certain cases, though I could imagine there may also be cases where the second case would be more appropriate. I'd have to do some more research into existing nearest neighbour regridders like ESMF to see which is "standard".

schlunma commented 2 years ago

Hmm...that's a tricky question. For me both options make sense, but I tend to agree more with option 1.

From what I can tell from the code (filling the data before interpolation) I guess you are kind of using option 1 at the moment without applying a mask afterwards but just leaving the fill values as is, right?

stephenworsley commented 2 years ago

That appears to be right as far as I can tell, yes.