larsbuntemeyer opened 1 day ago
Here is the fraction of missing EOBS values per grid cell for the years 1980 to 2020:
import xarray as xr
from dask.distributed import Client
client = Client(dashboard_address="localhost:8787")
store = "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/EOBS-feedstock/eobs-tg-tn-tx-rr-hu-pp.zarr"
ds = xr.open_dataset(store, engine="zarr", chunks={}).sel(time=slice("1980", "2020"))
def sum_nan(da):
    # Fraction of missing (NaN) time steps at each grid cell
    return da.isnull().sum(dim="time") / da.time.size
%time tg_nan = sum_nan(ds.tg).compute()
%time pr_nan = sum_nan(ds.pp).compute()
CPU times: user 4.18 s, sys: 671 ms, total: 4.86 s
Wall time: 27.8 s
CPU times: user 2.33 s, sys: 205 ms, total: 2.54 s
Wall time: 9.47 s
tg_nan.plot()
pr_nan.plot()
There are definitely some regions to take care of when computing monthly or seasonal means.
This is the issue showing up in seasonal means of surface temperature:
There are some known issues with EOBS data that we have to deal with, e.g., some regions have only limited observations and might have to be skipped for monthly and seasonal means. We have some experience with this. Pinging @paindeer, since you have worked a lot with EOBS data to evaluate REMO ERA5 output. I could not find any details about this in https://doi.org/10.5194/gmd-7-1297-2014.
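One possible way to skip the sparse regions when computing seasonal means is sketched below. This is only an illustration on synthetic data, not an established EOBS recipe: the helper name `masked_seasonal_mean` and the `max_missing` threshold of 0.2 are assumptions made up for this example, and the right threshold would need to be agreed on.

```python
import numpy as np
import pandas as pd
import xarray as xr


def masked_seasonal_mean(da, max_missing=0.2):
    """Seasonal mean, set to NaN where too many time steps are missing.

    `max_missing` is a hypothetical threshold (fraction of missing time
    steps per season and grid cell), not a value taken from the EOBS docs.
    """
    # Mean over all time steps that fall into each season (DJF, MAM, JJA, SON)
    mean = da.groupby("time.season").mean(dim="time", skipna=True)
    # Fraction of missing time steps per season and grid cell
    missing = da.isnull().astype(float).groupby("time.season").mean(dim="time")
    # Keep only cells with enough valid data
    return mean.where(missing <= max_missing)


# Synthetic stand-in for ds.tg: two grid cells, two years of daily data
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
values = np.ones((time.size, 2))
values[::2, 1] = np.nan  # second cell: every other day is missing
demo = xr.DataArray(values, coords={"time": time, "x": [0, 1]}, dims=("time", "x"))

result = masked_seasonal_mean(demo, max_missing=0.2)
print(result)
```

With the real data, the same function could be applied to `ds.tg`. About half of the values in the second synthetic cell are missing, so it ends up masked in every season, while the complete first cell keeps its mean.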