Ouranosinc / raven

WPS services related to hydrological modeling
https://pavics-raven.readthedocs.io
MIT License

Add ERA-5 database to server as proxy-observed weather data #183

Closed · richardarsenault closed this issue 4 years ago

richardarsenault commented 4 years ago

We need to add some reference observation data for hydrological modelling (calibration, simulation, eventually forecasting). As of now there is no data source. ERA5 is a reanalysis product that covers 1979-2018 globally at ~31 km resolution. We can distribute it freely on the condition that we mention the source:


Please acknowledge the use of ERA5 as stated in the Copernicus C3S/CAMS License agreement:

"5.1.2 Where the Licensee communicates or distributes Copernicus Products to the public, the Licensee shall inform the recipients of the source by using the following or any similar notice:

'Generated using Copernicus Climate Change Service Information [Year]'.

5.1.3 Where the Licensee makes or contributes to a publication or distribution containing adapted or modified Copernicus Products, the Licensee shall provide the following or any similar notice:

'Contains modified Copernicus Climate Change Service Information [Year]';

5.1.4 Any such publication or distribution covered by clauses 5.1.1 and 5.1.2 shall state that neither the European Commission nor ECMWF is responsible for any use that may be made of the Copernicus Information or Data it contains."

The format is NetCDF and the product is hourly. Using it at a daily step will require converting the time axis from GMT to local time before averaging over a given catchment.
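
For illustration, a minimal sketch of that GMT-to-local-time shift followed by daily aggregation, using xarray; the file name and the -5 h UTC offset are assumptions, not project code:

import pandas as pd
import xarray as xr

ds = xr.open_dataset("era5_hourly.nc")  # hypothetical extract

# Shift the UTC time axis to local standard time so that daily
# aggregates align with local calendar days over the catchment.
ds = ds.assign_coords(time=ds.time + pd.Timedelta(hours=-5))

# Average (or sum, for precipitation) the hourly values to daily.
daily = ds.resample(time="D").mean()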

huard commented 4 years ago

See https://github.com/Ouranosinc/pavics-sdi/issues/144

Zeitsperre commented 4 years ago

I've collected a fair amount of data from ERA5 at Ouranos. The extracts made with the CDSAPI tool don't follow CF-Conventions, but I have some treatment scripts that can help in getting them closer to compliance. This is worth planning out in greater detail.

huard commented 4 years ago

@richardarsenault Could you include here what you need in terms of variables, frequency, spatial coverage, etc.?

@Zeitsperre Are the original files themselves CF-compliant? Or is the extraction process to blame for removing metadata?

richardarsenault commented 4 years ago

Precip, Tmax and Tmin, ideally at a daily time step, covering all of North America, and the longest possible time series (i.e. 1979 to most recent).

Zeitsperre commented 4 years ago

@huard The extraction process is to blame for removing most of the metadata. The way that the data extraction is set up, I don't think one can download source files in any case.

@richardarsenault I have all of these extracted and they are currently stored in a somewhat CF-like format. If we can navigate data access rights, I can place a copy on our prod server with relative ease.

edit: Actually, I have them at the hourly time-step for that time period. I imagine that's also fine.

huard commented 4 years ago

@Zeitsperre Is this something worth a bug report to the CDS toolbox folks? I'm concerned that existing ERA5 users will access our repo, find that the files are different from those they're using, and end up confused. Any suggestions to make the whole thing as transparent as possible?

Have you agreed on something with Nathalie/Agnès regarding data access rights and credits?

Zeitsperre commented 4 years ago

It's easier to show what I mean when I say that the metadata does not follow conventions:

netcdf pr_era5_reanalysis_hourly_2005 {
dimensions:
    longitude = 1440 ;
    latitude = 721 ;
    time = 8760 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
    float latitude(latitude) ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
    int time(time) ;
        time:units = "hours since 1900-01-01 00:00:00.0" ;
        time:long_name = "time" ;
        time:calendar = "gregorian" ;
    short tp(time, latitude, longitude) ;
        tp:scale_factor = 1.15288460309059e-06 ;
        tp:add_offset = 0.0377754169048664 ;
        tp:_FillValue = -32767s ;
        tp:missing_value = -32767s ;
        tp:units = "m" ;
        tp:long_name = "Total precipitation" ;

// global attributes:
        :Conventions = "CF-1.6" ;
        :history = "2019-08-27 07:22:14 GMT by grib_to_netcdf-2.10.0: /opt/ecmwf/eccodes/bin/grib_to_netcdf -o /cache/data5/adaptor.mars.internal-1566889333.6351957-31393-16-fe468c59-e6fb-4107-921d-a0af244b62d0.nc /cache/tmp/fe468c59-e6fb-4107-921d-a0af244b62d0-adaptor.mars.internal-1566889333.6359959-31393-6-tmp.grib" ;
}

I'd like to try subsetting this and sending a sample to an online CF-Checker, but even the lat and lon dimensions don't follow what we normally have in most NetCDF files. Opening a PR for this on xclim right now.

julemai commented 4 years ago

Hi @Zeitsperre! Do you mean that you can't run RAVEN with it, or that you have trouble with something else? The header looks pretty good to me.

Zeitsperre commented 4 years ago

It runs perfectly fine; the issue here is that the files don't follow "typical" CF-compliant nomenclature. For example, the "latitude" and "longitude" dimensions are almost always written "lat" and "lon". Also, the units for total precipitation don't follow convention (the variable should be named 'pr', with standard name 'precipitation_amount', units of 'kg m-2', and appropriate cell methods).

The issues I'm directly having are more related to xclim (the lat/lon dim names), which doesn't let me use our toolset to even spatially subset the data (this is due to an oversight on my part).
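
For reference, a minimal sketch of the kind of fix-up implied above, renaming the dimensions and converting ERA5's total precipitation from metres of water to kg m-2 (1 m of water over 1 m2 weighs roughly 1000 kg); the names here are assumptions, not the actual treatment scripts:

import xarray as xr

ds = xr.open_dataset("pr_era5_reanalysis_hourly_2005.nc")

# Rename dimensions to the short names xclim and most tools expect.
ds = ds.rename({"longitude": "lon", "latitude": "lat"})

# Convert from metres of water to kg m-2 (water density ~1000 kg m-3).
pr = ds["tp"] * 1000.0
pr.attrs.update(
    standard_name="precipitation_amount",
    long_name="Total precipitation",
    units="kg m-2",
)
ds = ds.drop_vars("tp").assign(pr=pr)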

Zeitsperre commented 4 years ago

The issue I foresee is that these inconsistencies will become edge cases that need to be handled within RAVEN's individual processes.

julemai commented 4 years ago

I see. When we start including CaSPAr data you will likely run into similar issues. Below is the header of a classic CaSPAr NetCDF file (HRDPA = CaPA 2.5k). Do you think you will be able to handle/subset these?

netcdf \2020010806 {
dimensions:
    time = UNLIMITED ; // (1 currently)
    rlon = 2540 ;
    rlat = 1290 ;
variables:
    int time(time) ;
        time:long_name = "time" ;
        time:units = "hours since 2020-01-08 06:00:00" ;
        time:calendar = "gregorian" ;
        time:standard_name = "time" ;
        time:axis = "T" ;
    float rlon(rlon) ;
        rlon:long_name = "longitude in rotated pole grid" ;
        rlon:units = "degrees" ;
        rlon:eccc_grid_definition = "grtyp: E, ig1: 1430, ig2: 500, ig3: 56000, ig4: 44000" ;
        rlon:standard_name = "grid_longitude" ;
        rlon:axis = "X" ;
    float rotated_pole ;
        rotated_pole:long_name = "coordinates of the rotated North Pole" ;
        rotated_pole:grid_mapping_name = "rotated_latitude_longitude" ;
        rotated_pole:earth_radius = 6371220.f ;
        rotated_pole:grid_north_pole_latitude = 36.08852f ;
        rotated_pole:grid_north_pole_longitude = 65.30515f ;
        rotated_pole:north_pole_grid_longitude = 0.f ;
        rotated_pole:longitude_of_prime_meridian = 0.f ;
    float rlat(rlat) ;
        rlat:long_name = "latitude in rotated pole grid" ;
        rlat:units = "degrees" ;
        rlat:eccc_grid_definition = "grtyp: E, ig1: 1430, ig2: 500, ig3: 56000, ig4: 44000" ;
        rlat:standard_name = "grid_latitude" ;
        rlat:axis = "Y" ;
    float lon(rlat, rlon) ;
        lon:long_name = "longitude" ;
        lon:units = "degrees_east" ;
        lon:eccc_grid_definition = "grtyp: Z, ig1: 39561, ig2: 41085, ig3: 1, ig4: 0" ;
        lon:standard_name = "longitude" ;
    float lat(rlat, rlon) ;
        lat:long_name = "latitude" ;
        lat:units = "degrees_north" ;
        lat:eccc_grid_definition = "grtyp: Z, ig1: 39561, ig2: 41085, ig3: 1, ig4: 0" ;
        lat:standard_name = "latitude" ;
    float CaPA_fine_A_PR_SFC(time, rlat, rlon) ;
        CaPA_fine_A_PR_SFC:long_name = "Analysis: Quantity of precipitation" ;
        CaPA_fine_A_PR_SFC:units = "m" ;
        CaPA_fine_A_PR_SFC:grid_mapping = "rotated_pole" ;
        CaPA_fine_A_PR_SFC:coordinates = "lon lat" ;
    float CaPA_fine_A_CFIA_SFC(time, rlat, rlon) ;
        CaPA_fine_A_CFIA_SFC:long_name = "Analysis: Confidence Index of Analysis CAPA" ;
        CaPA_fine_A_CFIA_SFC:units = "1" ;
        CaPA_fine_A_CFIA_SFC:grid_mapping = "rotated_pole" ;
        CaPA_fine_A_CFIA_SFC:coordinates = "lon lat" ;

// global attributes:
        :product = "CaPA_fine" ;
        :Conventions = "CF-1.6" ;
        :Remarks = "Variable names are following the convention <Product>_<Type:A=Analysis,P=Prediction>_<ECCC name>_<Level/Tile/Category>. Variables with level \'10000\' are at surface level. The height [m] of variables with level \'0XXXX\' needs to be inferrred using the corresponding fields of geopotential height (GZ_0XXXX-GZ_10000). The variables UUC, VVC, UVC, and WDC are not modelled but inferred from UU and VV for convenience of the users. Precipitation (PR) is reported as 6-hr accumulations for CaPA_fine and CaPA_coarse. Precipitation (PR) are accumulations since beginning of the forecast for GEPS, GDPS, REPS, RDPS, HRDPS, and CaLDAS." ;
        :License = "These data are provided by the Canadian Surface Prediction Archive CaSPar. You should have received a copy of the License agreement with the data. Otherwise you can find them under http://caspar-data.ca/doc/caspar_license.txt or email caspar.data@uwaterloo.ca." ;
}

Zeitsperre commented 4 years ago

Yes. I'm certain that so long as the dimensions make reference to the spatial transformation (i.e. rotated pole with lat and lon), they should be understood. The only issues I can see are the variable names. I would need a small sample to see exactly what constitutes a CF-Conventions error.
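
To make that concrete, a minimal sketch of subsetting such a rotated-pole file through its 2-D lat/lon coordinates rather than the rlat/rlon dimensions; the bounding box is hypothetical:

import xarray as xr

ds = xr.open_dataset("2020010806.nc")  # sample file from the header above

# lat and lon are 2-D (rlat, rlon) auxiliary coordinates, so we select
# with a boolean mask instead of .sel() on the dimensions.
mask = (ds.lat >= 44) & (ds.lat <= 48) & (ds.lon >= 280) & (ds.lon <= 290)
subset = ds.where(mask, drop=True)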

julemai commented 4 years ago

There you go:

Link to download

huard commented 4 years ago

Note that Raven itself will be able to understand the files even if the variable names are not standard. There is a list of "alternative" variable names.

julemai commented 4 years ago

Exactly! Also, the names of the dimensions are not hard-coded in RAVEN. But I understand that you need to standardize them for the spatial cropping.

Zeitsperre commented 4 years ago

The data looks good, and with David's information we needn't worry about the variable names. I'm nearly done with the workarounds necessary to support ERA5 data in xclim. I'll move on to see what exactly needs changing for it in RAVEN afterwards.

Zeitsperre commented 4 years ago

Just ran some ERA5 data processed via the NetCDF extraction (CDSAPI) through the CEDA CF-Checker. This is what I got back:

CHECKING NetCDF FILE: /group_workspaces/jasmin4/ceda_wps/production/cache/uploads/test.nc
=====================
Using CF Checker Version 3.0.5
Checking against CF Version CF-1.6
Using Standard Name Table Version 70 (2019-12-10T14:47:41Z)
Using Area Type Table Version 9 (07 August 2018)
------------------
Checking variable: longitude
------------------
INFO: attribute _FillValue is being used in a non-standard way
------------------
Checking variable: latitude
------------------
INFO: attribute _FillValue is being used in a non-standard way
------------------
Checking variable: time
------------------
------------------
Checking variable: tp
------------------
ERROR: Attribute missing_value of incorrect type
ERRORS detected: 1
WARNINGS given: 0
INFORMATION messages: 2

The variable name is obviously not standard and the _FillValue is wrong on all coordinates, but it's not as bad as I imagined (the global attributes are bare, but that's fine under the conventions). If we want to try running it in RAVEN and seeing where/if it causes processes to fail, that would be a start. I can start by moving what we have on hand onto the server at the very least (North America: U-component of wind, V-component of wind, Snowfall; globally: 2m temperature, Total precipitation; all variables at hourly frequency).
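
Since _FillValue is already present and valid on tp, one possible fix for the single checker error is simply to drop the flagged missing_value attribute; a minimal sketch with netCDF4, file name assumed:

import netCDF4 as nc

with nc.Dataset("test.nc", "a") as ds:
    # _FillValue already marks missing data; deleting the redundant,
    # mistyped missing_value attribute clears the CF-Checker error.
    ds.variables["tp"].delncattr("missing_value")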

huard commented 4 years ago

We can later fix the metadata attributes and variable names using an NcML file. By the way, @julemai, this is something you might be interested in for your DAP server: THREDDS has support for NcML (NetCDF Markup Language), which allows you to virtually aggregate files and tweak metadata using an XML file that refers to the files on disk.
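
As an illustration, a minimal NcML sketch (the file and variable names come from the ERA5 dump above; the rest is an assumption, not a working config) that renames tp to pr and patches its standard name without touching the file on disk. Note that NcML only rewrites metadata; it cannot rescale the data, so the units stay in metres with the matching CF standard name:

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
        location="pr_era5_reanalysis_hourly_2005.nc">
  <variable name="pr" orgName="tp">
    <attribute name="standard_name" value="lwe_thickness_of_precipitation_amount" />
  </variable>
</netcdf>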

Zeitsperre commented 4 years ago

Data is transferring now. It might take all weekend. I'll have updates on Monday.

Zeitsperre commented 4 years ago

Data is transferred and accessible under the folder ecmwf.

huard commented 4 years ago

Could you add the ERA5 license there too?

Zeitsperre commented 4 years ago

Done.