google-research / neuralgcm

Hybrid ML + physics model of the Earth's atmosphere
https://neuralgcm.readthedocs.io
Apache License 2.0

land-sea mask #77

Open yuliang954 opened 1 month ago

yuliang954 commented 1 month ago

Hi NeuralGCM team,

I wonder how the land-sea mask is specified in NeuralGCM. For example, in the 1.4-deg deterministic model, is it based on the NaN values after regridding the 0.25-deg resolution ERA5 SST using xarray_utils.regrid?

I noticed that, after regridding, the very first inference demo produced NaN values at quite different locations than the current demo does. The current demo has more ocean (i.e., non-NaN) area. Maybe you changed xarray_utils.regrid a little?

Thanks! Yu

shoyer commented 1 month ago

Our trained models used regridding as shown in the current online docs/demo notebook.

The original demo notebook used a mistaken setting (horizontal_interpolation.ConservativeRegridder with skipna=False), which resulted in NaN values on coastlines that then needed to be filled with nearest-neighbor values.
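To illustrate why the skipna setting moves the NaN locations, here is a minimal NumPy sketch of block averaging on a toy 1-D grid. It is not the ConservativeRegridder implementation; the function name `coarsen_mean` and the sample values are hypothetical, chosen only to show the effect on a coastal cell.

```python
import numpy as np

def coarsen_mean(fine, factor, skipna):
    """Average consecutive blocks of `factor` fine cells into one coarse cell.

    Hypothetical helper for illustration only, not a NeuralGCM function.
    """
    blocks = fine.reshape(-1, factor)
    if not skipna:
        # Any NaN in a block makes the whole coarse cell NaN.
        return blocks.mean(axis=1)
    valid = ~np.isnan(blocks)
    counts = valid.sum(axis=1)
    sums = np.where(valid, blocks, 0.0).sum(axis=1)
    # Average over valid cells only; blocks with no valid cells stay NaN.
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)

# Toy fine-resolution SST: three ocean cells, one coastal ocean cell
# followed by land (NaN), then an all-land block.
sst_fine = np.array([290.0, 291.0, 292.0, 293.0, np.nan, np.nan,
                     np.nan, np.nan, np.nan])

strict = coarsen_mean(sst_fine, 3, skipna=False)
lenient = coarsen_mean(sst_fine, 3, skipna=True)
print(strict)
print(lenient)
```

In this sketch the coastal coarse cell is NaN with skipna=False but keeps a value with skipna=True, which is consistent with the current demo showing more non-NaN area.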

I wrote a detailed guide to regridding here. Please take a look and let me know if you still have questions.

yuliang954 commented 1 month ago

This information is very helpful, Shoyer! I have a question about the guide where you mentioned that "NeuralGCM’s surface model also includes a mask that ignores values over land." After regridding the ERA5 data to the targeted Gaussian grid, do the land values or locations that will be ignored correspond to NaNs?

shoyer commented 1 month ago

> I have a question about the guide where you mentioned that "NeuralGCM’s surface model also includes a mask that ignores values over land." After regridding the ERA5 data to the targeted Gaussian grid, do the land values or locations that will be ignored correspond to NaNs?

In principle, the only locations that take the sea surface temperature input into account are those where the land/sea mask is less than 1, so the details of the NaN filling should not matter.

In practice, the model does seem to make slightly different predictions when NaN values are filled in an inconsistent fashion. I don't think we ever quite tracked down why this is the case; possibly there are some inconsistencies between ERA5's land/sea mask and its SST or sea ice fields.
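The "in principle" argument can be sketched as follows. This is only a toy illustration under the assumption that SST is read exactly where the land/sea mask is below 1; it is not NeuralGCM's surface model, and `masked_sst` and the sample arrays are hypothetical.

```python
import numpy as np

# Toy land/sea mask (1 = fully land) and regridded SST with NaN over land.
land_sea_mask = np.array([0.0, 0.3, 1.0, 1.0])
sst = np.array([290.0, 289.0, np.nan, np.nan])

def masked_sst(sst_filled, mask):
    """Keep SST only where the mask is below 1 (hypothetical helper)."""
    reads_sst = mask < 1
    return np.where(reads_sst, sst_filled, 0.0)

# Two different ways of filling the NaN cells.
fill_a = np.where(np.isnan(sst), 0.0, sst)
fill_b = np.where(np.isnan(sst), 289.0, sst)

# If only mask < 1 cells are read, the fill choice cannot change the result.
same = np.array_equal(masked_sst(fill_a, land_sea_mask),
                      masked_sst(fill_b, land_sea_mask))
print(same)
```

The equality holds here only because every NaN cell has a mask value of 1. A cell that is NaN while its mask is below 1 would break it, which is the kind of inconsistency speculated about above.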

yuliang954 commented 1 month ago

I am asking because I am trying to couple NeuralGCM with a statistical SST model, and I need to know the exact ocean grids from which NeuralGCM reads SST information.

So, after regridding the 0.25-degree ERA5 SST data to the targeted Gaussian grid, are the grids with NaN values actually a subset of the land grids? In other words, do the grids with non-NaN values include all ocean grids and some coastal land grids?

If this is the case, I can provide NeuralGCM with SST on all grids with non-NaN values, since NeuralGCM won't read the coastal land grids anyway.

shoyer commented 1 month ago

> So, after regridding the 0.25-degree ERA5 SST data to the targeted Gaussian grid, are the grids with NaN values actually a subset of the land grids? In other words, do the grids with non-NaN values include all ocean grids and some coastal land grids?

Yes, this is correct.
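For anyone coupling an external SST model, this subset relationship can be checked on your own regridded fields before relying on it. The sketch below uses made-up arrays and assumes "land" means a land/sea mask value of 1; the variable names and the constant stand-in for the statistical model output are hypothetical.

```python
import numpy as np

# Toy fields on the target grid (values are illustrative only).
land_sea_mask = np.array([[0.0, 0.2, 1.0],
                          [0.0, 1.0, 1.0]])
regridded_sst = np.array([[290.0, 289.5, 289.0],
                          [291.0, np.nan, np.nan]])

is_nan = np.isnan(regridded_sst)
is_land = land_sea_mask >= 1.0  # assumption: land means mask equal to 1

# NaN cells should all be land cells; the reverse need not hold.
nan_subset_of_land = bool(np.all(is_land[is_nan]))

# Stand-in for the external SST model output on the same grid.
model_sst = np.full(regridded_sst.shape, 300.0)

# Supply SST on every non-NaN cell, leaving NaN cells untouched.
updated_sst = np.where(~is_nan, model_sst, regridded_sst)
print(nan_subset_of_land)
```

Here the cell at row 0, column 2 is land but non-NaN, so it receives a value that, per the discussion above, should not be read.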

yuliang954 commented 1 month ago

Thanks!