ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0

ERA5 Native6 setup on NCI #3230

Open rbeucher opened 1 year ago

rbeucher commented 1 year ago

Hi All,

We (ACCESS-NRI) are trying to set up the ESMValTool recipes on our system at NCI. Quite a few recipes use the OBS6 ERA5 data which, from what I understand, are ERA5 data CMORised from the native6 raw data you have available on Mistral (#2396).

What confuses me is the names of the variables used in the CMORiser recipes, which are CMIP variable names. How are the native6 raw data organised on Mistral? @remi-kazeroni @alistairsellar How do you use ERA5 on your system?

ls /mnt/lustre02/work/bd0854/DATA/ESMValTool2/RAWOBS/Tier3/ERA5/v1/*
1hr:
pr  tas  zg

1hrPt:
clt  evspsbl  evspsblpot  mrro  prsn  ps  psl  ptype  rlds  rls  rsds  rsdt  rss  tas  tasmax  tasmin  tdps  ts  tsn  uas  vas

fx:
orog

mon:
cl  clivi  clt  evspsbl  evspsblpot  hus  lwp  mrro  orog  pr  prsn  prw  ps  psl  ptype  rlds  rls  rsds  rsdt  rss  ta  tas  tdps  ts  tsn  uas  va  vas  zg

Does native6 actually contain native ERA5 data, or is it an already reformatted version of the native ERA5 files? Are you using symlinks to the native ERA5 files?

The ERA5 collection is huge, as you know, so ideally we would like to use what is already available at NCI:

.
|-- pressure-levels
|   |-- monthly-averaged
|   |-- monthly-averaged-by-hour
|   `-- reanalysis
`-- single-levels
    |-- monthly-averaged
    |-- monthly-averaged-by-hour
    `-- reanalysis

Any help is appreciated!

remi-kazeroni commented 1 year ago

Hi @rbeucher, thanks for your question. Indeed, creating a local pool of ERA5 data is not the easiest part of deploying ESMValTool on a shared machine. Since ESMValTool does not include a downloader for ERA5 data, one needs an external program to get the data. At DKRZ, the pool of ERA5 data (/work/bd0854/DATA/ESMValTool2/RAWOBS/Tier3/ERA5/v1/) was created using both cdsapi and era5cli, afaik. Since I started maintaining this pool, I have only used era5cli. I don't think the data are reformatted during the download. The key to enabling ERA5 data in the tool (via the so-called on-the-fly CMORization) is to put each downloaded file (e.g. era5_v_component_of_wind_1990_monthly.nc) into the right directory tree (e.g. /work/bd0854/DATA/ESMValTool2/RAWOBS/Tier3/ERA5/v1/mon/va). There is no file that does the mapping between CMOR variables and ERA5 ones.
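As a minimal sketch of the layout described above, assuming the `Tier3/ERA5/v1/<frequency>/<short_name>` structure seen in the DKRZ listing, the target location for a downloaded file could be computed like this (the function name `rawobs_target` is hypothetical, not part of ESMValTool):

```python
from pathlib import Path


def rawobs_target(rawobs_root, frequency, short_name, filename,
                  tier=3, dataset="ERA5", version="v1"):
    """Return where a downloaded ERA5 file would sit in a native6-style pool.

    Assumed layout (taken from the DKRZ pool in this thread):
    <rawobs_root>/Tier<tier>/<dataset>/<version>/<frequency>/<short_name>/<filename>
    """
    return (Path(rawobs_root) / f"Tier{tier}" / dataset / version
            / frequency / short_name / filename)


path = rawobs_target("/work/bd0854/DATA/ESMValTool2/RAWOBS", "mon", "va",
                     "era5_v_component_of_wind_1990_monthly.nc")
print(path.as_posix())
# /work/bd0854/DATA/ESMValTool2/RAWOBS/Tier3/ERA5/v1/mon/va/era5_v_component_of_wind_1990_monthly.nc
```

Only the directory carries the CMOR name (`va`); the file keeps its ERA5-style name from the download.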

Some clusters, like DKRZ-Levante or CEDA-Jasmin, provide a much larger collection of ERA5 data, sometimes retrieved from tape (see the discussions in https://github.com/ESMValGroup/ESMValTool/discussions/2183 and https://github.com/ESMValGroup/ESMValCore/issues/1991). We could start to support such large collections in ESMValTool, but we also need to keep in mind that users working on laptops or small clusters will never be able to create a large local collection of ERA5 data. That is why we have, so far, maintained our own collection of ERA5 data at DKRZ. This pool contains all the data required to run every recipe in ESMValTool.

Let me know if you have further doubts or questions on this.

Btw, you might want to add an entry for your institution in the config-user.yml file. This should help your users to configure ESMValTool easily when working on your systems.

rbeucher commented 1 year ago

Thanks @remi-kazeroni . That was my suspicion. I'll work on this and will get back to you.

R

hot007 commented 1 year ago

Hi @remi-kazeroni, regarding "We could start to support such large collections in ESMValTool.", how much effort do you think this would be? We really do not want to keep a second copy of the data at NCI, as we already have a centrally maintained ERA5 replica; ideally we want to be able to CMORise on the fly for ESMValTool if needed. I think NCI would also very much frown upon any additional replication of ERA5 data. Maybe it would be possible for @rbeucher to create a symlink tree with the ESMValTool-compliant directory structure pointing to the ERA5 files?

rbeucher commented 1 year ago

Yes. My idea is to try a symlink tree.

remi-kazeroni commented 1 year ago

It is indeed better to make use of already available ERA5 data instead of replicating them. An alternative to symlink trees could be to add an entry for NCI under native6 in the config-developer.yml file here, reflecting the directory structure of your ERA5 data pool, if that makes sense in your case.
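To my understanding, config-developer.yml describes native6 input directories as templates whose facets in braces are filled in per variable; the DKRZ-style template below is my assumption of roughly what the default looks like, so check the file shipped with your ESMValCore version. The sketch only mimics that substitution with `str.format`. The NCI template and its `product` facet are hypothetical and only illustrate the difficulty raised next: a template keyed on the CMOR `short_name` cannot point at directories named after ERA5 variables.

```python
# Assumed DKRZ-style template; the real one lives in ESMValCore's config-developer.yml.
DKRZ_TEMPLATE = "Tier{tier}/{dataset}/{version}/{frequency}/{short_name}"
# Hypothetical NCI entry; "product" is not a standard ESMValTool facet.
NCI_TEMPLATE = "single-levels/{product}/{short_name}"

facets = {"tier": 3, "dataset": "ERA5", "version": "v1",
          "frequency": "mon", "short_name": "tas",
          "product": "monthly-averaged"}


def expand(template, facets):
    """Fill a directory template with facet values (unused keys are ignored)."""
    return template.format(**facets)


print(expand(DKRZ_TEMPLATE, facets))  # Tier3/ERA5/v1/mon/tas
print(expand(NCI_TEMPLATE, facets))   # single-levels/monthly-averaged/tas
```

The second path ends in the CMOR name `tas`, whereas a pool organised by ERA5 names would not contain such a directory, which is why a template alone does not solve the mapping.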

rbeucher commented 1 year ago

The problem with this is that ERA5 doesn't use CMIP vocabulary, so it's hard to map the variables. Or am I missing something?

BTW, what do you do with derived variables? Do you add a new NetCDF file to the pool after calculating the values?

rbeucher commented 1 year ago

Any idea where I can find a mapping between CMIP variable names and ERA5 variable names?

rbeucher commented 1 year ago

OK I have made some progress. The symlink tree does work and is not too hard to set up. I have mapped 90% of the variables. Still a few issues. I will document the process and share a link for reference.
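The symlink approach can be sketched as follows, assuming the NCI tree shown earlier has one directory per variable under each product folder and that the native6 pool expects `<frequency>/<short_name>` directories of `*.nc` files. `VARIABLE_MAP`, its ERA5 directory names, and both helper functions are hypothetical placeholders to be filled in from the real pool.

```python
from pathlib import Path

# Hypothetical mapping from CMOR short_name to the per-variable directory name
# in the source ERA5 replica; take the real names from your own pool.
VARIABLE_MAP = {"tas": "2t", "pr": "tp"}


def link_variable(source_dir, target_root, frequency, short_name,
                  tier=3, dataset="ERA5", version="v1"):
    """Symlink every *.nc file in source_dir into a native6-style tree.

    Target: <target_root>/Tier<tier>/<dataset>/<version>/<frequency>/<short_name>/
    Returns the links created; anything already present is skipped.
    """
    source_dir = Path(source_dir)
    if not source_dir.is_dir():
        return []  # variable not available in this product
    target_dir = (Path(target_root) / f"Tier{tier}" / dataset / version
                  / frequency / short_name)
    target_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for src in sorted(source_dir.glob("*.nc")):
        link = target_dir / src.name
        if link.is_symlink() or link.exists():
            continue  # leave existing links or files untouched
        link.symlink_to(src.resolve())  # absolute target path
        created.append(link)
    return created


def build_tree(era5_root, target_root, product="monthly-averaged",
               frequency="mon"):
    """Link every mapped single-level variable of one product into one frequency."""
    links = []
    for short_name, era5_dir in VARIABLE_MAP.items():
        source_dir = Path(era5_root) / "single-levels" / product / era5_dir
        links += link_variable(source_dir, target_root, frequency, short_name)
    return links
```

Re-running `build_tree` is harmless because existing links are skipped, so the tree can be refreshed when new years are added to the replica.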

remi-kazeroni commented 1 year ago

> BTW, what do you do with derived variables? Do you add a new NetCDF file to the pool after calculating the values?

If both ERA5 variables needed for derivation are defined in CMOR tables, you could use derive: true in the recipe and derive the variables on-the-fly. Afaik, we don't store derived ERA5 variables in our DKRZ data pool.
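Since on-the-fly derivation only needs the input variables in the pool, a symlinked tree does not have to store derived fields. The hypothetical helper below, assuming the `<frequency>/<short_name>` layout above, reports which inputs are still missing before a recipe with `derive: true` is run. The input lists are illustrative, written from my understanding of the ESMValCore derivation scripts, and should be confirmed for the version you use.

```python
from pathlib import Path

# Illustrative input lists; confirm against the ESMValCore derivation scripts.
DERIVE_INPUTS = {
    "lwcre": ["rlut", "rlutcs"],       # rlutcs - rlut
    "rtnt": ["rsdt", "rsut", "rlut"],  # rsdt - rsut - rlut
}


def missing_inputs(frequency_dir, derived_name):
    """Return the inputs of derived_name that have no *.nc file in the pool.

    frequency_dir is one frequency folder of the pool (e.g. .../ERA5/v1/mon)
    whose subdirectories are named after CMOR short_names.
    """
    pool = Path(frequency_dir)
    missing = []
    for name in DERIVE_INPUTS[derived_name]:
        var_dir = pool / name
        if not (var_dir.is_dir() and any(var_dir.glob("*.nc"))):
            missing.append(name)
    return missing
```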

> Any idea where I can find a mapping between CMIP variable names and ERA5 variable names?

I wish I could answer that... For the ERA5 variables supported in ESMValTool, you can take a look at the example recipe that creates daily data, where the era5_name keys show the mapping.
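Once the mapping is known, it can be used to sort downloaded files into CMOR-named directories. The sketch below is a hypothetical helper, not ESMValTool code: it parses file names modelled on `era5_v_component_of_wind_1990_monthly.nc` from this thread and looks the ERA5 name up in a small hand-written table of CDS variable names. The authoritative mapping remains the `era5_name` keys in the example recipe.

```python
import re

# Small hand-written sample of CMOR short_name -> ERA5 (CDS) variable name.
ERA5_NAMES = {
    "tas": "2m_temperature",
    "pr": "total_precipitation",
    "psl": "mean_sea_level_pressure",
    "uas": "10m_u_component_of_wind",
    "vas": "10m_v_component_of_wind",
    "va": "v_component_of_wind",
}
CMOR_NAMES = {era5: cmor for cmor, era5 in ERA5_NAMES.items()}

# Assumed pattern: era5_<era5_name>_<year>_<frequency>.nc
FILENAME = re.compile(
    r"^era5_(?P<era5_name>.+)_(?P<year>\d{4})_(?P<freq>[a-z]+)\.nc$")


def cmor_name_for(filename):
    """Return the CMOR short_name for an ERA5 file name, or None if unknown."""
    match = FILENAME.match(filename)
    if match is None:
        return None
    return CMOR_NAMES.get(match.group("era5_name"))
```

For example, `cmor_name_for("era5_v_component_of_wind_1990_monthly.nc")` gives `"va"`, the directory name used in the DKRZ pool.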

> I will document the process and share a link for reference.

That'd be great, thanks!

rbeucher commented 1 year ago

Thank you @remi-kazeroni. Yes, I found the ERA5 example recipe very useful for doing the mapping.