remi-kazeroni opened 1 year ago
I started looking into this a few days ago. Here are some insights:

Reading the data with the `cfgrib` engine of xarray works out of the box, and conversion to iris cubes using `DataArray.to_iris()` also seems to work fine after some minor preprocessing. From the code and some tests, it looks like this neither realizes the data nor saves it to disk in any way (which is good!). However, to implement this in ESMValCore, we need to expand `fix_file` (https://github.com/ESMValGroup/ESMValCore/issues/2129).
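For reference, a minimal sketch of that workflow (the file name and variable name are assumptions for illustration):

```python
# Minimal sketch: lazily open native ERA5 GRIB data with the cfgrib
# engine and convert it to an iris cube. The file name "era5_t2m.grib"
# and the variable name "t2m" are assumptions.
import xarray as xr

ds = xr.open_dataset("era5_t2m.grib", engine="cfgrib")  # lazy, nothing loaded yet
cube = ds["t2m"].to_iris()  # the conversion also keeps the data lazy
print(cube)
```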
The raw data is stored on a reduced Gaussian grid (N320), which uses a different number of longitudes for each latitude. Thus, the data is not stored on a regular grid with dimensions (time, latitude, longitude), but rather like an unstructured grid with dimensions (time, spatial_dimension). For example, after conversion to NetCDF, the files look like this:
```
netcdf tas {
dimensions:
    time = 24 ;
    values = 542080 ;
variables:
    int64 time(time) ;
        time:long_name = "initial time of forecast" ;
        time:standard_name = "forecast_reference_time" ;
        time:units = "seconds since 1970-01-01" ;
        time:calendar = "proleptic_gregorian" ;
    double latitude(values) ;
        latitude:_FillValue = NaN ;
        latitude:units = "degrees_north" ;
        latitude:standard_name = "latitude" ;
        latitude:long_name = "latitude" ;
    double longitude(values) ;
        longitude:_FillValue = NaN ;
        longitude:units = "degrees_east" ;
        longitude:standard_name = "longitude" ;
        longitude:long_name = "longitude" ;
    float t2m(time, values) ;
        t2m:_FillValue = NaNf ;
        t2m:GRIB_paramId = 167LL ;
        ...
...
}
```
You can see that the actual variable (`t2m`) just depends on the two dimensions `time` and `values`, where `values` encodes the spatial grid. The question now is: how do we deal with this? I can think of the following options:
- We can pass the data as is to the preprocessing chain and let the user deal with regridding (for this, the `unstructured_nearest` scheme can be used, which works fine [I tested it] but might be inaccurate; see the sketch below).

@ESMValGroup/esmvaltool-coreteam does anyone have experience with regridding data on a reduced Gaussian grid with Python? Any insights/help is much appreciated. Thank you!
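For illustration, a minimal sketch of that scheme used via iris directly (ESMValCore's `unstructured_nearest` scheme is based on iris's `UnstructuredNearest`, as far as I can tell); the variable `cube` and the 1°x1° target grid are assumptions:

```python
# Sketch: nearest-neighbour regridding of the unstructured N320 cube
# onto a regular 1x1 degree grid. `cube` is assumed to be the cube
# obtained from the GRIB data above; the target resolution is arbitrary.
import numpy as np
from iris.analysis import UnstructuredNearest
from iris.coords import DimCoord
from iris.cube import Cube

lat = DimCoord(np.arange(-89.5, 90.0, 1.0),
               standard_name="latitude", units="degrees")
lon = DimCoord(np.arange(0.5, 360.0, 1.0),
               standard_name="longitude", units="degrees")
target = Cube(np.zeros((lat.shape[0], lon.shape[0])),
              dim_coords_and_dims=[(lat, 0), (lon, 1)])

regridded = cube.regrid(target, UnstructuredNearest())
```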
One point I forgot: contrary to the ERA5 documentation, as far as I can tell, all 3D variables on pressure levels are also saved on the reduced Gaussian grid (N320), not as T639 spherical harmonics. For example, temperature is listed as being on the T639 native grid, but the data on Levante is on the N320 grid. No idea if this is a service of DKRZ or an error in the ERA5 documentation.
In contrast, some variables on model levels are in fact reported on the T639 grid.
I think we are mainly interested in data on pressure levels, so we do not have to deal with spherical harmonics (for now).
Just stumbled across this by coincidence! I work a lot with the DKRZ ERA5 data pool and I follow the ECMWF recommendations, e.g., for regridding.
> does anyone have experience with regridding data on a reduced Gaussian grid with Python?
I would also be interested in that, e.g., in a kind of lazy method to do it. It's probably worth mentioning ERA5 on Google Cloud; I got valuable insights from their walkthrough (which includes regridding with scipy)...
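In that spirit, a rough sketch of nearest-neighbour regridding with scipy (all variable names are assumptions, and treating latitude/longitude as flat Euclidean coordinates ignores the longitude wrap-around, so this is only illustrative):

```python
# Rough sketch: nearest-neighbour lookup from the N320 points onto a
# regular 1-degree grid using a KD-tree. `lats`, `lons` (1-D arrays over
# the unstructured dimension) and `data` (time x values) are assumed to
# come from a file like the one shown above.
import numpy as np
from scipy.spatial import cKDTree

tree = cKDTree(np.column_stack([lats, lons]))

# Regular target grid, flattened into query points.
tlat, tlon = np.meshgrid(np.arange(-89.5, 90.0), np.arange(0.5, 360.0),
                         indexing="ij")
_, idx = tree.query(np.column_stack([tlat.ravel(), tlon.ravel()]))

# Pick the nearest source value for every target cell.
regridded = data[:, idx].reshape(data.shape[0], *tlat.shape)
```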
Probably also worth mentioning:
Thanks @larsbuntemeyer for these links, they look super interesting! I will look into that!
> We can pass the data as is to the preprocessing chain and let the user deal with regridding
This would be my recommendation: some users may not want automatic regridding.
From the reviewers of our IS-ENES3 deliverable D9.5:
> Angelika has just (almost) completed retrieving the 1940–2022 hourly time series from the ECMWF MARS tape archive to Levante /pool/data/ERA5. This comprises
> • surface level analysis (49 parameters)
> • surface level forecasts (55 parameters)
> • model level analysis (16 parameters) (retrieval 1940–1958 ongoing)
> • pressure level analysis (16 parameters) (retrieval 1940–1950 ongoing)
> The data are not the 0.25° regridded ERA5 data that users can download from the Copernicus CDS, but the native-resolution ERA5 data (T639/N320) that can only be retrieved from MARS. The now around 1550 TB of data are stored directly on Levante's disk storage, globally accessible via /pool/data/ERA5/E5.
We could think of adding a `drs` to our config file to enable ESMValTool to access this vast collection of ERA5 data. This would be interesting if DKRZ users would like to benefit from having access to that data pool. It could also be an interesting test of how well GRIB files are handled by ESMValTool.

I'm not sure this could completely replace our own local collection of RAWOBS ERA5 data (downloaded from the CDS with `era5cli`). The reason is that retrieving such a huge amount of data is not something our users can do on their own. They would need to keep relying on tools like `era5cli` or `cdsapi` to create their own local pools of ERA5 data on their own machines/clusters. Thus, it might be good to continue testing ESMValTool as done now, with ERA5 in our own RAWOBS, to better reproduce what is done by the majority of our users.

This idea is similar to that of #1246 for Jasmin. See also the DKRZ docs on the ECMWF reanalysis products available locally.