JiaweiZhuang / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License

Masked data is returned as 0.0 after gridding, how can these pixels be identified if zeros exist in input data? #51

Closed JSAnandEOS closed 5 years ago

JSAnandEOS commented 5 years ago

So I'm using the "conservative_normed" algorithm provided in the "masking" branch of xESMF to grid some MODIS GPP data to a lower spatial resolution (fine to coarse). Being a land-only product, the ocean pixels are invalid and so need to be masked. After masking and running xESMF on the data these regions now appear as zeroes, as expected.

My problem is that valid zero values also exist in the input data over regions with no vegetation (e.g. deserts). Therefore, in the resulting array I can't readily tell which pixels are invalid and which contain real data. How do I get around this? Is there any way to output a mask of which pixels contain no real data (i.e. no data was binned at all)? Thanks.

JiaweiZhuang commented 5 years ago

the ocean pixels are invalid and so need to be masked.

in the resulting array I can't readily tell which pixels are invalid, and which contain real data. How do I get around this?

Do you mean that the input data (on source grid) are all NaNs over the ocean region? In that case, the output data will also be NaNs over the ocean, by default. You don't need to apply additional masking. In many cases, "masking" just means "setting NaN to zeros" (https://github.com/JiaweiZhuang/xESMF/issues/22#issuecomment-402320570), which might not be what you actually want.

If your input data do not even cover the ocean region (i.e. a regional grid only over land), but the output grid is global, then the undefined ocean region will have zeros instead of NaNs, by default. To flip this behavior, see https://github.com/JiaweiZhuang/xESMF/issues/15#issuecomment-371646763.
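To make the second case concrete, here is a toy sketch that uses a plain SciPy sparse matrix as a stand-in for the regridding weights. It does not call xESMF, and the 3x4 matrix, the values, and the names `weights`, `src`, and `unmapped` are made up for illustration. A destination cell with no stored weights comes out as 0.0, exactly like a cell that averages real zeros, and counting the stored entries per row gives a boolean mask of those cells.

```python
import numpy as np
import scipy.sparse as sp

# Toy weight matrix: 3 destination cells (rows) x 4 source cells (columns).
# Row 1 has no stored weights, i.e. no source cell contributes to it.
rows = np.array([0, 0, 2, 2])
cols = np.array([0, 1, 2, 3])
vals = np.array([0.5, 0.5, 0.5, 0.5])
weights = sp.coo_matrix((vals, (rows, cols)), shape=(3, 4))

# The first two source cells hold real zeros (e.g. desert pixels).
src = np.array([0.0, 0.0, 3.0, 5.0])

# Regridding is a sparse matrix-vector product.
dst = weights @ src  # cell 0 is a real zero, cell 1 is an "empty" zero

# Number of stored weights per destination cell, read from the CSR row pointer.
unmapped = np.diff(weights.tocsr().indptr) == 0

# Use the mask to mark the empty cells after the fact.
dst_marked = np.where(unmapped, np.nan, dst)
```

Here `dst[0]` and `dst[1]` are both 0.0, but only `unmapped[1]` is True, so `dst_marked` keeps the real zero in cell 0 and turns cell 1 into NaN. Whether the weight matrix built with your masks has empty rows in exactly the cells you consider invalid is an assumption worth checking on your own grids.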

JSAnandEOS commented 5 years ago

Do you mean that the input data (on source grid) are all NaNs over the ocean region?

In addition to the ocean, there are also certain areas where for whatever reason (say, cloud cover) the data is invalid, so these regions have to be removed from the gridding as well. I have currently set these to NaNs as well. These are different to areas where the data is zero (e.g. deserts), because these values are still valid.

You don't need to apply additional masking. In many cases, "masking" just means "setting NaN to zeros" (#22 (comment)), which might not be what you actually want.

I had originally wanted to use conservative gridding with NaNs and zero values, but I encountered the same problem as #22, where large sections of coastal regions were missing in the final gridded dataset, despite having non-zero input data near those regions. The discussion about "conservative_normed" suggested that I needed to do both masking and setting unwanted areas to NaNs in order to deal with both coastal regions and areas with invalid data.

JiaweiZhuang commented 5 years ago

If I understand correctly, then you need to

  1. Use "conservative_normed" with additional masks for NaN values when building the regridder, just as you are doing now.
  2. Then, after building the regridder, apply the trick at https://github.com/JiaweiZhuang/xESMF/issues/15#issuecomment-371646763 so that "real zeros" and "mask-generated zeros" can be distinguished.

Does this produce what you expected?

JSAnandEOS commented 5 years ago

If I understand you correctly, the regridding should be done like so:

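import numpy as np
import scipy.sparse
import xesmf as xe

def add_matrix_NaNs(regridder):
    # Convert the weight matrix to CSR so stored entries can be counted per row
    X = regridder.A
    M = scipy.sparse.csr_matrix(X)
    # Rows with no stored weights are destination cells that received no data
    num_nonzeros = np.diff(M.indptr)
    # A NaN weight makes those cells come out as NaN instead of 0.0
    M[num_nonzeros == 0, 0] = np.nan
    regridder.A = scipy.sparse.coo_matrix(M)
    return regridder

def regrid(ds_in, ds_out, dr_in, method='conservative_normed'):
    regridder = xe.Regridder(ds_in, ds_out, method, periodic=True, reuse_weights=False)
    regridder = add_matrix_NaNs(regridder)
    dr_out = regridder(dr_in)
    regridder.clean_weight_file()  # remove the weight file written to disk
    return dr_out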

Is this correct?

JiaweiZhuang commented 5 years ago

Yes, this should mark undefined regions as NaNs while keeping real zeros untouched. However, it is a very niche edge case, so I am not entirely sure that it is correct. Let me know if it works.
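One way to check the idea on a small case without xESMF is sketched below. The helper name `add_nan_to_empty_rows` and the toy matrix are hypothetical, and the helper rebuilds the matrix in COO form rather than assigning into a CSR matrix as in the snippet above, so it illustrates the same trick without being the same code.

```python
import numpy as np
import scipy.sparse as sp

def add_nan_to_empty_rows(weights):
    """Return a COO copy of `weights` with one NaN entry in every empty row."""
    coo = sp.coo_matrix(weights)
    # Rows whose CSR row pointer does not advance hold no stored weights.
    empty = np.flatnonzero(np.diff(coo.tocsr().indptr) == 0)
    # Append a single NaN weight at column 0 of each empty row.
    row = np.concatenate([coo.row, empty])
    col = np.concatenate([coo.col, np.zeros_like(empty)])
    data = np.concatenate([coo.data.astype(float), np.full(empty.size, np.nan)])
    return sp.coo_matrix((data, (row, col)), shape=coo.shape)

# Toy case: destination cell 1 receives no data, cell 0 averages real zeros.
toy = sp.coo_matrix(
    (np.array([0.5, 0.5, 0.5, 0.5]),
     (np.array([0, 0, 2, 2]), np.array([0, 1, 2, 3]))),
    shape=(3, 4),
)
src = np.array([0.0, 0.0, 3.0, 5.0])

out = add_nan_to_empty_rows(toy) @ src
```

In this toy case `out[0]` stays 0.0 (a real zero), `out[1]` becomes NaN (no data), and `out[2]` is 4.0, because NaN times any finite source value is NaN and rows that already have weights are not touched.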

JSAnandEOS commented 5 years ago

I apologise for the late reply, but I am pleased to report that this solution works. Thanks!

JiaweiZhuang commented 5 years ago

Great! Just note that 0.2.0 deprecates regridder.A in favor of regridder.weights (https://github.com/JiaweiZhuang/xESMF/commit/792e2288f883713ec206c2c837fd3bd6ed345894)
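If the same workaround has to run against both older and newer releases, one option is to look up the attribute name at run time. This is only a sketch: `weight_attr_name` and `patch_weights` are hypothetical helpers, not part of xESMF, the `SimpleNamespace` objects are stand-ins for a regridder, and it assumes the attribute holds a SciPy sparse matrix that can be reassigned, which may not hold in every release.

```python
from types import SimpleNamespace

import scipy.sparse as sp

def weight_attr_name(regridder):
    """Prefer the newer 'weights' attribute, fall back to the older 'A'."""
    return "weights" if hasattr(regridder, "weights") else "A"

def patch_weights(regridder, transform):
    """Replace the regridder's weight matrix with transform(matrix)."""
    name = weight_attr_name(regridder)
    setattr(regridder, name, transform(getattr(regridder, name)))
    return regridder

# Stand-in objects (assumption: a real regridder exposes one of these names).
old_style = SimpleNamespace(A=sp.identity(2, format="coo"))
new_style = SimpleNamespace(weights=sp.identity(2, format="coo"))

# Any matrix-to-matrix function can be passed; doubling is just a placeholder.
double = lambda m: sp.coo_matrix(m * 2)
patch_weights(old_style, double)
patch_weights(new_style, double)
```

In practice the `transform` argument would be the function that inserts NaNs into empty rows, rather than the placeholder used here.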

I'd like to have a simpler option in the main branch to set different mask-handling behavior, so that users can avoid this ad-hoc fix. But given the subtlety of masking, it probably requires more study. I don't have a clear timeline right now.