I've collected a fair amount of data from ERA5 at Ouranos. The extracts made with the CDSAPI tool don't follow CF-Conventions, but I have some treatment scripts that can help get them closer to it. This is worth planning out in greater detail.
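For context, the extractions were made with CDSAPI requests along these lines. This is a minimal sketch: the dataset name is the standard CDS one, but the variable, year, and output file name are illustrative rather than the exact scripts used.

import cdsapi

c = cdsapi.Client()  # credentials are read from ~/.cdsapirc
c.retrieve(
    "reanalysis-era5-single-levels",
    {
        "product_type": "reanalysis",
        "variable": "total_precipitation",
        "year": "2005",
        "month": [f"{m:02d}" for m in range(1, 13)],
        "day": [f"{d:02d}" for d in range(1, 32)],
        "time": [f"{h:02d}:00" for h in range(24)],
        "format": "netcdf",
    },
    "pr_era5_reanalysis_hourly_2005.nc",
)

The resulting NetCDF comes out of ECMWF's grib_to_netcdf conversion (visible in the history attribute below), which is where the non-CF metadata originates.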
@richardarsenault Could you include here what you need in terms of variables, frequency, spatial coverage, etc.
@Zeitsperre Are the original files themselves CF-compliant? Or is the extraction process to blame for removing metadata?
Precip, Tmax and Tmin, ideally daily time step, coverage all of North America. Longest possible time series also (i.e. 1979-most recent).
@huard The extraction process is to blame for removing most of the metadata. The way that the data extraction is set up, I don't think one can download source files in any case.
@richardarsenault I have all of these extracted and they are currently stored in a somewhat CF-like format. If we can navigate data access rights, I can place a copy on our prod server with relative ease.
edit: Actually, I have them at the hourly time-step for that time period. I imagine that's also fine.
@Zeitsperre Is this something worth a bug report to the CDS Toolbox folks? I'm concerned that existing ERA-5 users will access our repo, find that the files are different from those they're using, and end up confused. Any suggestions to make the whole thing as transparent as possible?
Have you agreed on anything with Nathalie/Agnès regarding data access rights and credits?
It's easier to show what I mean when I say that the metadata does not follow conventions:
netcdf pr_era5_reanalysis_hourly_2005 {
dimensions:
    longitude = 1440 ;
    latitude = 721 ;
    time = 8760 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
    float latitude(latitude) ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
    int time(time) ;
        time:units = "hours since 1900-01-01 00:00:00.0" ;
        time:long_name = "time" ;
        time:calendar = "gregorian" ;
    short tp(time, latitude, longitude) ;
        tp:scale_factor = 1.15288460309059e-06 ;
        tp:add_offset = 0.0377754169048664 ;
        tp:_FillValue = -32767s ;
        tp:missing_value = -32767s ;
        tp:units = "m" ;
        tp:long_name = "Total precipitation" ;

// global attributes:
        :Conventions = "CF-1.6" ;
        :history = "2019-08-27 07:22:14 GMT by grib_to_netcdf-2.10.0: /opt/ecmwf/eccodes/bin/grib_to_netcdf -o /cache/data5/adaptor.mars.internal-1566889333.6351957-31393-16-fe468c59-e6fb-4107-921d-a0af244b62d0.nc /cache/tmp/fe468c59-e6fb-4107-921d-a0af244b62d0-adaptor.mars.internal-1566889333.6359959-31393-6-tmp.grib" ;
}
I'd like to try subsetting this and sending a sample to an online CF-Checker, but even the lat and lon dimensions don't follow what we normally have in most NetCDFs. Opening a PR for this on xclim right now.
Hi @Zeitsperre! Do you mean that you can't run RAVEN with it or you have trouble with something else? The header looks pretty good to me.
It runs perfectly fine; the issue here is that the files don't follow "typical" CF-compliant nomenclature. For example, the "latitude" and "longitude" dimensions are almost always written "lat" and "lon". Also, the units for total precipitation don't follow convention (the variable name should be 'pr', with standard name 'precipitation_amount', units of 'kg m-2', and appropriate cell methods).
The issues I'm directly having are more related to xclim: the lat/lon dim names don't permit me to use our toolset to even spatially subset the data (this is due to an oversight on my part).
The issues I can see are that these inconsistencies are going to be edge cases that need to be handled within RAVEN's individual processes.
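To make that concrete, here is a hedged sketch of the kind of cleanup meant above, using xarray: rename the dimensions to lat/lon and convert "tp" (metres of water) to a 'pr' variable in kg m-2. The file names are illustrative.

import xarray as xr

ds = xr.open_dataset("pr_era5_reanalysis_hourly_2005.nc")

# Rename to the conventional short dimension and variable names.
ds = ds.rename({"latitude": "lat", "longitude": "lon", "tp": "pr"})

# 1 m of water over 1 m^2 weighs 1000 kg, so m -> kg m-2 is a factor of 1000.
ds["pr"] = ds["pr"] * 1000.0
ds["pr"].attrs.update(
    standard_name="precipitation_amount",
    long_name="Total precipitation",
    units="kg m-2",
)

ds.to_netcdf("pr_era5_cf_2005.nc")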
I see. When we start including CaSPAr data you will likely run into similar issues. Below is the header of a classic CaSPAr NetCDF file (HRDPA = CaPA 2.5k). Do you think you will be able to handle/subset these?
netcdf \2020010806 {
dimensions:
    time = UNLIMITED ; // (1 currently)
    rlon = 2540 ;
    rlat = 1290 ;
variables:
    int time(time) ;
        time:long_name = "time" ;
        time:units = "hours since 2020-01-08 06:00:00" ;
        time:calendar = "gregorian" ;
        time:standard_name = "time" ;
        time:axis = "T" ;
    float rlon(rlon) ;
        rlon:long_name = "longitude in rotated pole grid" ;
        rlon:units = "degrees" ;
        rlon:eccc_grid_definition = "grtyp: E, ig1: 1430, ig2: 500, ig3: 56000, ig4: 44000" ;
        rlon:standard_name = "grid_longitude" ;
        rlon:axis = "X" ;
    float rotated_pole ;
        rotated_pole:long_name = "coordinates of the rotated North Pole" ;
        rotated_pole:grid_mapping_name = "rotated_latitude_longitude" ;
        rotated_pole:earth_radius = 6371220.f ;
        rotated_pole:grid_north_pole_latitude = 36.08852f ;
        rotated_pole:grid_north_pole_longitude = 65.30515f ;
        rotated_pole:north_pole_grid_longitude = 0.f ;
        rotated_pole:longitude_of_prime_meridian = 0.f ;
    float rlat(rlat) ;
        rlat:long_name = "latitude in rotated pole grid" ;
        rlat:units = "degrees" ;
        rlat:eccc_grid_definition = "grtyp: E, ig1: 1430, ig2: 500, ig3: 56000, ig4: 44000" ;
        rlat:standard_name = "grid_latitude" ;
        rlat:axis = "Y" ;
    float lon(rlat, rlon) ;
        lon:long_name = "longitude" ;
        lon:units = "degrees_east" ;
        lon:eccc_grid_definition = "grtyp: Z, ig1: 39561, ig2: 41085, ig3: 1, ig4: 0" ;
        lon:standard_name = "longitude" ;
    float lat(rlat, rlon) ;
        lat:long_name = "latitude" ;
        lat:units = "degrees_north" ;
        lat:eccc_grid_definition = "grtyp: Z, ig1: 39561, ig2: 41085, ig3: 1, ig4: 0" ;
        lat:standard_name = "latitude" ;
    float CaPA_fine_A_PR_SFC(time, rlat, rlon) ;
        CaPA_fine_A_PR_SFC:long_name = "Analysis: Quantity of precipitation" ;
        CaPA_fine_A_PR_SFC:units = "m" ;
        CaPA_fine_A_PR_SFC:grid_mapping = "rotated_pole" ;
        CaPA_fine_A_PR_SFC:coordinates = "lon lat" ;
    float CaPA_fine_A_CFIA_SFC(time, rlat, rlon) ;
        CaPA_fine_A_CFIA_SFC:long_name = "Analysis: Confidence Index of Analysis CAPA" ;
        CaPA_fine_A_CFIA_SFC:units = "1" ;
        CaPA_fine_A_CFIA_SFC:grid_mapping = "rotated_pole" ;
        CaPA_fine_A_CFIA_SFC:coordinates = "lon lat" ;

// global attributes:
        :product = "CaPA_fine" ;
        :Conventions = "CF-1.6" ;
        :Remarks = "Variable names are following the convention <Product>_<Type:A=Analysis,P=Prediction>_<ECCC name>_<Level/Tile/Category>. Variables with level \'10000\' are at surface level. The height [m] of variables with level \'0XXXX\' needs to be inferrred using the corresponding fields of geopotential height (GZ_0XXXX-GZ_10000). The variables UUC, VVC, UVC, and WDC are not modelled but inferred from UU and VV for convenience of the users. Precipitation (PR) is reported as 6-hr accumulations for CaPA_fine and CaPA_coarse. Precipitation (PR) are accumulations since beginning of the forecast for GEPS, GDPS, REPS, RDPS, HRDPS, and CaLDAS." ;
        :License = "These data are provided by the Canadian Surface Prediction Archive CaSPar. You should have received a copy of the License agreement with the data. Otherwise you can find them under http://caspar-data.ca/doc/caspar_license.txt or email caspar.data@uwaterloo.ca." ;
}
Yes. I'm certain that so long as the dimensions make reference to the spatial transformation (i.e. rotated pole with lat and lon), they should be understood. The only issues I can see are the variable names. I would need a small sample to see exactly what constitutes a CF-Convention error.
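For what it's worth, spatially subsetting such a rotated-pole file can be done against the 2-D lat/lon auxiliary coordinates rather than rlat/rlon. A sketch with xarray (the bounding box is illustrative, and whether longitudes run -180..180 or 0..360 would need checking against the actual file):

import xarray as xr

ds = xr.open_dataset("2020010806.nc")

# Mask on the 2-D lat/lon coordinates, then drop all-empty rows/columns.
box = ds.where(
    (ds.lat >= 45) & (ds.lat <= 50) & (ds.lon >= -80) & (ds.lon <= -70),
    drop=True,
)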
There you go:
(Sample shared via a file-transfer link; tracking number: XR5HWF5WGA9EQ8ZA)
Note that Raven itself will be able to understand the files even if the variable names are not standard. There is a list of "alternative" variable names.
Exactly! Also, the dimension names are not hard-coded in RAVEN. But I understand that you need to standardize them for the spatial cropping.
The data looks good, and with David's information we needn't worry about the variable names. I'm nearly done with the workarounds necessary to support ERA5 data in xclim. After that, I'll look into what exactly needs changing for it in RAVEN.
Just ran some ERA5 data processed via the NetCDF extraction (CDSAPI) through the CEDA CF-Checker. This is what I got back:
CHECKING NetCDF FILE: /group_workspaces/jasmin4/ceda_wps/production/cache/uploads/test.nc
=====================
Using CF Checker Version 3.0.5
Checking against CF Version CF-1.6
Using Standard Name Table Version 70 (2019-12-10T14:47:41Z)
Using Area Type Table Version 9 (07 August 2018)
------------------
Checking variable: longitude
------------------
INFO: attribute _FillValue is being used in a non-standard way
------------------
Checking variable: latitude
------------------
INFO: attribute _FillValue is being used in a non-standard way
------------------
Checking variable: time
------------------
------------------
Checking variable: tp
------------------
ERROR: Attribute missing_value of incorrect type
ERRORS detected: 1
WARNINGS given: 0
INFORMATION messages: 2
The variable name is obviously not standard and the _FillValue is wrong on all coordinates, but it's not as bad as I imagined (the global attributes are bare, but that's fine under the conventions). If we want to try running it in RAVEN and seeing where/if it causes processes to fail, that would be a start. I can start by moving what we have on hand onto the server at the very least (North America: U-component of wind, V-component of wind, Snowfall; Globally: 2m temperature, Total precipitation; all variables at the hourly time frequency).
We can later fix the metadata attributes and variable names using an NcML file. By the way, @julemai, this is something that you might be interested in for your DAP server. THREDDS has support for NcML (NetCDF Markup Language), which allows you to virtually aggregate files and tweak metadata using an XML file that refers to the files on disk.
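For illustration, a minimal NcML sketch of the kind of tweak meant here (the renames and file path are hypothetical, and note that NcML edits metadata only; it does not rescale data values):

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
        location="pr_era5_reanalysis_hourly_2005.nc">
  <!-- Rename the dimensions and coordinate variables to lat/lon -->
  <dimension name="lat" orgName="latitude"/>
  <dimension name="lon" orgName="longitude"/>
  <variable name="lat" orgName="latitude"/>
  <variable name="lon" orgName="longitude"/>
  <!-- Rename tp -> pr and add a standard name consistent with units of m -->
  <variable name="pr" orgName="tp">
    <attribute name="standard_name" value="lwe_thickness_of_precipitation_amount"/>
  </variable>
</netcdf>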
Data is transferring now. It might take all weekend. I'll have updates on Monday.
Data is transferred and accessible under the folder ecmwf.
Could you add the ERA5 license there too?
Done.
We need to add some reference observation data for hydrological modelling (calibration, simulation, and eventually forecasting). As of now there is no data source. ERA-5 is a reanalysis product that covers 1979-2018 globally at ~31 km resolution. We can distribute it freely on the condition that we mention the source:
The format is NetCDF and it is an hourly product. The hourly values will require converting from GMT to local time before averaging over a given catchment.
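A sketch of that GMT-to-local shift plus the hourly-to-daily aggregation, assuming xarray; the fixed UTC-5 offset and file name are assumptions for the example, and a real catchment would need its own offset:

import numpy as np
import xarray as xr

ds = xr.open_dataset("pr_era5_reanalysis_hourly_2005.nc")

# Shift the time axis from GMT/UTC to local time (UTC-5 here, as an example).
ds["time"] = ds["time"] - np.timedelta64(5, "h")

# Daily precipitation totals; daily Tmin/Tmax would use .min()/.max() instead.
pr_daily = ds["tp"].resample(time="1D").sum()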