JiaweiZhuang / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License

Keep xarray attributes and dtype after regridding #66

Open raspstephan opened 4 years ago

raspstephan commented 4 years ago

Currently, regridding seems to drop the attributes of the original dataset. I assume this happens during xr.apply_ufunc. Is there any reason not to use keep_attrs=True?

Similarly, all data is converted to 64-bit floats, even if the input data is 32-bit. Would it be reasonable to use output_dtypes=[dr_in.dtype] instead of output_dtypes=[float]?

I am happy to create a pull request for this if nothing speaks against these changes.

JiaweiZhuang commented 4 years ago

Thanks for bringing this up. PRs are welcome!

> Is there any reason not to use keep_attrs=True?

The only reason is that xr.apply_ufunc defaults to keep_attrs=False and I just kept the default. To be consistent with xr.apply_ufunc, I would suggest adding an optional keep_attrs kwarg that defaults to False, which you can set to True if needed.

regridder(indata)  # doesn't keep attributes
regridder(indata, keep_attrs=True)  # keeps attributes
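
A minimal sketch of how such a keyword could be forwarded to xr.apply_ufunc. The name regrid_sketch and the doubling operation are hypothetical stand-ins for illustration, not xESMF's actual implementation:

```python
import numpy as np
import xarray as xr


def regrid_sketch(da, keep_attrs=False):
    """Hypothetical stand-in for a regridding call; not xESMF's real code."""
    # The doubling is only a placeholder for applying the regridding weights.
    return xr.apply_ufunc(lambda arr: arr * 2.0, da, keep_attrs=keep_attrs)


indata = xr.DataArray(np.arange(3.0), dims="x", attrs={"units": "K"})

out_default = regrid_sketch(indata)               # attributes are dropped
out_kept = regrid_sketch(indata, keep_attrs=True)  # attributes are kept
```

Passing the flag straight through keeps the wrapper's behavior aligned with xr.apply_ufunc, which is the consistency argument made above.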

Regarding the data type, that's because ESMF stores regridding weights in float64. In numpy, float32 * float64 gives float64. Changing output_dtypes won't actually help in this case. Consider this example:

import numpy as np
import xarray as xr

a = np.array([1, 2, 3], dtype=np.float64)  # stands in for the float64 weights
x = np.array([1, 2, 3], dtype=np.float32)  # float32 input data
out = a * x
out.dtype  # float64
# output_dtypes does not cast the result of the wrapped function
out2 = xr.apply_ufunc(lambda v: a * v, x, output_dtypes=[np.float32])
out2.dtype  # still float64

You can cast regridder.weights to np.float32 using scipy.sparse.coo_matrix.astype(). This is also useful for nearest-neighbor methods, where the weights are just 1.0 and can be cast to integers for regridding categorical variables.
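
A small sketch of the effect of casting the weights, using a hand-built sparse matrix in place of regridder.weights. The matrices and values below are made up for illustration and do not come from a real regridder:

```python
import numpy as np
import scipy.sparse as sp

# Toy 2x3 weight matrix (float64), standing in for regridder.weights.
w64 = sp.coo_matrix(
    (np.array([0.5, 0.5, 1.0]), (np.array([0, 0, 1]), np.array([0, 1, 2]))),
    shape=(2, 3),
)
x32 = np.array([1.0, 3.0, 5.0], dtype=np.float32)

out64 = w64.dot(x32)                     # float64 weights promote the result
out32 = w64.astype(np.float32).dot(x32)  # float32 weights keep float32

# Nearest-neighbor style weights (all 1.0) cast to integers, applied to
# integer category labels.
nn = sp.coo_matrix(
    (np.array([1.0, 1.0]), (np.array([0, 1]), np.array([0, 2]))),
    shape=(2, 3),
)
categories = np.array([3, 7, 5], dtype=np.int32)
out_cat = nn.astype(np.int32).dot(categories)  # result stays an integer type
```

Casting the weights rather than the output means the multiplication itself runs in the lower precision, which is why this works where output_dtypes does not.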

JiaweiZhuang commented 4 years ago

Is it useful to have a method to set weights dtype in the Regridder class? It would be just one line:

def set_dtype(self, dtype):
    self.weights = self.weights.astype(dtype)

raspstephan commented 4 years ago

I created a pull request to implement keep_attrs. The data type is not such a big issue for me, since it's just as easy to convert the data afterwards.