ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0

Handling of station data in ESMValTool #496

Open bascrezee opened 6 years ago

bascrezee commented 6 years ago

As part of C3S_511, data from the 'International Soil Moisture Network' should be read and processed by ESMVal. As a first step, the data needs to be cmorized, see #232. This issue is meant to start a discussion with other ESMVal contributors on the best way to integrate station data, and the collocation between station data and other (gridded) data, within ESMVal. Currently, the collocation is handled by the diagnostic scripts themselves. However, it might make sense to take care of this in the preprocessor.
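To make the collocation step concrete, here is a minimal sketch of a nearest-neighbour lookup between one station and a regular grid. It is not what the diagnostics or the preprocessor actually do; the function names `nearest_lat_index`, `nearest_lon_index` and `collocate` are hypothetical, and the 2.5 degree grid is only an assumption for illustration.

```python
def nearest_lat_index(grid_lats, lat):
    """Index of the grid latitude closest to the station latitude."""
    return min(range(len(grid_lats)), key=lambda i: abs(grid_lats[i] - lat))


def nearest_lon_index(grid_lons, lon):
    """Index of the closest grid longitude, treating longitude as circular."""
    def circular_distance(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    return min(range(len(grid_lons)),
               key=lambda i: circular_distance(grid_lons[i], lon))


def collocate(station_lat, station_lon, grid_lats, grid_lons):
    """Return (lat_index, lon_index) of the grid cell nearest to a station."""
    return (nearest_lat_index(grid_lats, station_lat),
            nearest_lon_index(grid_lons, station_lon))


# Assumed regular 2.5 degree grid, only for illustration.
grid_lats = [-90.0 + 2.5 * i for i in range(73)]
grid_lons = [2.5 * i for i in range(144)]
```

Other choices (bilinear interpolation, area weighting) are equally possible; nearest neighbour is used here only because it is the simplest to read.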

fmassonn commented 6 years ago

Don't forget to systematically add @arunranain in the tasks involving UCL, please. François

mattiarighi commented 5 years ago

Now that the CMORizers are up and running, we are ready to tackle this issue. Some general thoughts for discussion:

Here is an example of the cmorized AERONET data from version 1, just a few of the 871 stations :smile:

OBS_AERONET_ground_Santarem_T0M_od550aer_199201-201512.nc
OBS_AERONET_ground_Santiago_T0M_od550aer_199201-201512.nc
OBS_AERONET_ground_Sao-Martinho-SONDA_T0M_od550aer_199201-201512.nc
OBS_AERONET_ground_Sao-Paulo_T0M_od550aer_199201-201512.nc
OBS_AERONET_ground_Saturn-Island_T0M_od550aer_199201-201512.nc
OBS_AERONET_ground_SEARCH-Centreville2_T0M_od550aer_199201-201512.nc
OBS_AERONET_ground_SEARCH-Centreville_T0M_od550aer_199201-201512.nc
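The file names above appear to follow a fixed underscore-separated pattern, with hyphens used inside station names. A minimal sketch of how such a name could be split into its parts follows; the field labels `project`, `dataset`, `obs_type` and `short_name` and the helper `parse_obs_filename` are assumptions made for illustration, not an ESMValTool API.

```python
from pathlib import Path
from typing import NamedTuple


class ObsFilename(NamedTuple):
    project: str      # e.g. "OBS"
    dataset: str      # e.g. "AERONET"
    obs_type: str     # e.g. "ground"
    station: str      # station name, may contain hyphens
    field: str        # e.g. "T0M"
    short_name: str   # e.g. "od550aer"
    start: str        # first month, YYYYMM
    end: str          # last month, YYYYMM


def parse_obs_filename(filename):
    """Split a v1-style station file name into its components."""
    parts = Path(filename).stem.split("_")
    if len(parts) != 7:
        raise ValueError(f"unexpected file name layout: {filename}")
    start, end = parts[6].split("-")
    return ObsFilename(*parts[:6], start, end)


info = parse_obs_filename(
    "OBS_AERONET_ground_Sao-Martinho-SONDA_T0M_od550aer_199201-201512.nc")
```

This only works as long as station names never contain an underscore, which holds for the examples listed here.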

In v1, the variable was just:

        float od550aer(time) ;
                od550aer:standard_name = "atmosp
                od550aer:units = "1" ;
                od550aer:cell_methods = "time" ;
                od550aer:cell_measures = "area" ;
                od550aer:long_name = "Ambient Aerosol Optical Thickness at 550 nm" ;
                od550aer:_FillValue = 1.e+20f ;

with latitude and longitude given as global attribute of the cmorized nc file:

// global attributes:
                :conventions = "CF/CMOR" ;
                :title = "AERONET station data reformatted for the ESMValTool" ;
                :reference = "Holben, B. N. et al., Rem. Sens. Environ., 16, 1-16, doi:10.1016/S0034-4257(98)00031-5, 1998." ;
                :source = "http://aeronet.gsfc.nasa.gov/cgi-bin/combined_data_access_new" ;
                :tier = 2 ;
                :field = "T0M" ;
                :period = "1992-2015" ;
                :station = "Zhangye" ;
                :latitude = 39.079f ;
                :longitude = 100.276f ;

but as I said above, I think lat and lon should be added as scalar coordinates.
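For illustration, this is roughly what scalar coordinates would look like in the file header: `lat` and `lon` become dimensionless variables that the data variable points to through its `coordinates` attribute, as CF describes for scalar coordinate variables. The sketch below only renders the CDL text; the helper name `scalar_coordinate_cdl` is hypothetical and the exact attribute set is an assumption.

```python
def scalar_coordinate_cdl(var_name, latitude, longitude):
    """Render a CDL fragment with lat/lon as CF scalar coordinate variables."""
    lines = [
        "variables:",
        f"        float {var_name}(time) ;",
        f'                {var_name}:coordinates = "lat lon" ;',
        "        double lat ;",
        '                lat:standard_name = "latitude" ;',
        '                lat:units = "degrees_north" ;',
        "        double lon ;",
        '                lon:standard_name = "longitude" ;',
        '                lon:units = "degrees_east" ;',
        "data:",
        f" lat = {latitude} ;",
        f" lon = {longitude} ;",
    ]
    return "\n".join(lines)


# Coordinates of the Zhangye station from the global attributes above.
cdl = scalar_coordinate_cdl("od550aer", 39.079, 100.276)
print(cdl)
```

Whether Iris picks these up as scalar coordinates on the cube would still need to be tested.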

BenMGeo commented 5 years ago

Thank you, Mattia, for bringing this back. So basically a single lat and lon attribute/dimension should be enough? Sounds promising. We are busy with deliverables these days, so our C3S_511 service may have to wait until the end of April to contribute to this.

mattiarighi commented 5 years ago

We would need to test this with Iris, but defining lat and lon as global attributes is definitely not the solution.

jgriesfeller commented 4 years ago

Is there any progress on this issue? In principle I wanted to bring Aeronet into ESMValTool for our next deliverable for the IS-ENES3 project (WP9/JRA2, Task3; 'providing observations to the user'). Regarding the format (admittedly not knowing CMOR very well), I would not put every station in a separate file, but use a station number dimension instead, although that implies that every station has to have the same time variable. We have something like that in the data format of our aerocom project, which I can share if you find that helpful. One could also think about using a group for each station...

mattiarighi commented 4 years ago

> Although that implies that every station has to have the same time variable.

This would also mean putting all stations on the same (arbitrary?) grid. And it could be an issue if you have two or more stations in the same cell.

jgriesfeller commented 4 years ago

Latitude and longitude would have the station number as their dimension, so no grid is involved. Only the time axis has to be the same for all stations. I will show you the aerocom data format later on (I'm in a meeting right now).
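A minimal sketch of what the shared time axis means in practice: per-station series with different time stamps are merged onto the union of all time stamps, and gaps are filled with a fill value, giving a `data[time][station]` layout. The helper `to_station_array` is hypothetical, and the fill value is simply the one used in the v1 AERONET files quoted earlier.

```python
FILL_VALUE = 1.0e20  # same fill value as in the v1 AERONET files


def to_station_array(series_by_station):
    """Merge per-station time series onto one common time axis.

    series_by_station maps a station name to a dict {time: value}.
    Returns (times, stations, data) with data[i][j] holding the value
    at times[i] for stations[j], or FILL_VALUE where a station has no
    observation at that time.
    """
    stations = sorted(series_by_station)
    times = sorted({t for series in series_by_station.values() for t in series})
    data = [[series_by_station[s].get(t, FILL_VALUE) for s in stations]
            for t in times]
    return times, stations, data


# Made-up values, for illustration only.
times, stations, data = to_station_array({
    "Santiago": {"1992-01": 0.1, "1992-03": 0.3},
    "Santarem": {"1992-02": 0.2},
})
```

Station latitude and longitude would then be stored as `lat[station]` and `lon[station]`, so two stations falling in the same model grid cell simply get two different station indices.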

jgriesfeller commented 4 years ago

This is a file from the HTAP-II project:

EMEP_BASE_vmrno2_ModelLevelAtStations_2010_Hourly.nc
netcdf htap2_EMEP_BASE_vmrno2_ModelLevelAtStations_2010_Hourly {
dimensions:
    time = UNLIMITED ; // (8760 currently)
    lev = 20 ;
    station = 248 ;
    bnds = 2 ;
    ncl4 = 1 ;
    charlen1 = 6 ;
    charlen2 = 1 ;
    charlen3 = 1 ;
    charlen4 = 21 ;
variables:
    double time(time) ;
        time:standard_name = "time" ;
        time:units = "days since 2001-01-01 00:00:00" ;
        time:long_name = "Time" ;
    double time_bnds(time, bnds) ;
        time_bnds:standard_name = "time" ;
        time_bnds:units = "days since 2001-01-01 00:00:00" ;
        time_bnds:long_name = "bounds coordinates for time" ;
        time_bnds:_FillValue = 9.96920996838687e+36 ;
    double lev(lev) ;
        lev:standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
        lev:long_name = "Alternate hybrid sigma pressure coordinate" ;
        lev:units = "1" ;
    double ap(lev) ;
        ap:standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
        ap:long_name = "Alternate hybrid sigma coordinate ap coefficient" ;
        ap:units = "Pa" ;
    double b(lev) ;
        b:standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
        b:long_name = "Alternate hybrid sigma coordinate b coefficient " ;
        b:units = "1" ;
    double ap_bnds(lev, bnds) ;
        ap_bnds:standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
        ap_bnds:long_name = "Alternate hybrid sigma coordinate ap coefficient for layer bounds" ;
        ap_bnds:units = "Pa" ;
    double b_bnds(lev, bnds) ;
        b_bnds:standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
        b_bnds:long_name = "Alternate hybrid sigma coordinate b coefficient  for layer bounds" ;
        b_bnds:units = "1" ;
    double P0(ncl4) ;
        P0:units = "Pa" ;
        P0:standard_name = "model_reference_air_pressure" ;
        P0:long_name = "Reference" ;
    char stationid(station, charlen1) ;
        stationid:standard_name = "platform_id" ;
        stationid:long_name = "HTAP station ID" ;
    double lon(station) ;
        lon:standard_name = "longitude" ;
        lon:_FillValue = 9.96920996838687e+36 ;
        lon:long_name = "station longitude" ;
        lon:units = "degrees_east" ;
    double lat(station) ;
        lat:standard_name = "latitude" ;
        lat:_FillValue = 9.96920996838687e+36 ;
        lat:long_name = "station latitude" ;
        lat:units = "degrees_north" ;
    double station_elevation(station) ;
        station_elevation:standard_name = "surface_altitude" ;
        station_elevation:_FillValue = 9.96920996838687e+36 ;
        station_elevation:long_name = "station elevation" ;
        station_elevation:units = "m asl" ;
    char network_stationid(station, charlen2) ;
        network_stationid:standard_name = "platform_id" ;
        network_stationid:long_name = "Original Station ID" ;
    char networkid(station, charlen3) ;
        networkid:standard_name = "platform_name" ;
        networkid:long_name = "Network ID" ;
    char station_name(station, charlen4) ;
        station_name:long_name = "HTAP station long_name" ;
    double ps(time, station) ;
        ps:_FillValue = 9.96920996838687e+36 ;
        ps:standard_name = "surface_air_pressure" ;
        ps:long_name = "surface air pressure" ;
        ps:units = "1" ;
    float vmrno2(time, lev, station) ;
        vmrno2:units = "mole mole-1" ;
        vmrno2:standard_name = "mole_fraction_of_nitrogen_dioxide_in_air" ;
        vmrno2:long_name = "NO2 Volume Mixing Ratio" ;
        vmrno2:_FillValue = 9.96921e+36f ;

// global attributes:
        :creation_date = "Thu Oct 29 11:10:38 CET 2015" ;
        :cmor_version = "N/A" ;
        :comment = "Model documentation available via http://iek8wikis.iek.fz-juelich.de/HTAPWiki/modeldocumentation" ;
        :table_id = "HTAP2_ShortStationsListProfile.csv (Oct 2015)" ;
        :project_id = "HTAP2" ;
        :references = "Simpson et al. 2012 (ACP 12(16):7825-7865, doi:10.5194/acp-12-7825-2012)" ;
        :contact = "EMEP MSC-W  <emep.mscw@met.no>" ;
        :parent_experiment_id = "N/A" ;
        :forcing = "N/A" ;
        :model_id = "EMEP MSC-W CTM" ;
        :source = "N/A" ;
        :parent_experiment = "N/A" ;
        :experiment_id = "N/A" ;
        :institute_id = "EMEP MSC-W" ;
        :institution = "Norwegian Meteorological Institute" ;
        :Conventions = "CF-1.0" ;
        :title = "HTAP2/AeroCom file" ;
}

bjoernbroetz commented 4 years ago

As discussed during the IS-ENES telco today with @jgriesfeller and @mattiarighi, we should soon settle the question of how to handle station data in the ESMValTool. Two options were identified:

The option for a common standard is preferred in the short term.

jgriesfeller commented 4 years ago

Any progress on how we want to do this? I will only need a couple of days to get a downloader ready, and then I need a decision about how to proceed.

mattiarighi commented 4 years ago

@ESMValGroup/esmvaltool-coreteam we need to take a decision on how to proceed here. Should we discuss this at the monthly telecon tomorrow?

duncanwp commented 4 years ago

These are the relevant CF Conventions: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#appendix-examples-discrete-geometries