ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0

Use cases for station data #1119

Open bascrezee opened 5 years ago

bascrezee commented 5 years ago

@zklaus and I discussed potential use cases of station data in ESMValTool yesterday. The use cases are user-specific, but for design considerations it is good to list them here. We came up with the following:

| functionality | kind | potential location | description |
| --- | --- | --- | --- |
| plot | shared diag function | iris | plot dots at the station locations on top of a map, with the colour determined by the value of the variable |
| collocation | preprocessor | iris | collocation of gridded data with the station data (followed in a next step by the calculation of certain metrics, e.g. RMSE, correlation) |
| mask | preprocessor | esmvaltool | include the footprint of a station for masking gridded data |
| voronoi mask | preprocessor | esmvaltool | calculate Thiessen/Voronoi polygons for masking gridded data (both for the case of inverse comparison, i.e. grid-to-point rather than point-to-grid) |
| transects | ? | ? | using station data for transects over gridded data? (possible use in iris) |
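To make the collocation row concrete, here is a minimal pure-Python sketch of what it could mean: sample the gridded field at the grid point nearest to each station, then compute an RMSE against the station values. The function names, the grid and all values are made up for illustration; this is not existing iris or ESMValTool functionality, and a real preprocessor would presumably operate on iris cubes instead of nested lists.

```python
import math


def nearest_index(values, target):
    """Return the index of the entry in `values` closest to `target`."""
    return min(range(len(values)), key=lambda i: abs(values[i] - target))


def collocate(grid_lats, grid_lons, field, stations):
    """Sample `field[i][j]` at the grid point nearest to each station.

    Picking the nearest latitude and longitude independently is adequate
    for a regular lat/lon grid; longitude wrap-around is ignored here.
    """
    return [
        field[nearest_index(grid_lats, lat)][nearest_index(grid_lons, lon)]
        for lat, lon in stations
    ]


def rmse(model, obs):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


# Hypothetical 3x4 grid and two stations (all numbers are illustrative).
grid_lats = [40.0, 45.0, 50.0]
grid_lons = [0.0, 5.0, 10.0, 15.0]
field = [
    [10.0, 11.0, 12.0, 13.0],
    [14.0, 15.0, 16.0, 17.0],
    [18.0, 19.0, 20.0, 21.0],
]
stations = [(44.0, 6.0), (49.0, 14.0)]  # (lat, lon)
station_obs = [14.0, 23.0]

model_at_stations = collocate(grid_lats, grid_lons, field, stations)
score = rmse(model_at_stations, station_obs)
```

Nearest-neighbour sampling is only one possible choice; interpolation to the station location would be an alternative and would change the resulting metric.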

Also, how do we provide station data? single netcdfs? shapefiles?

Feel free to modify the list/add suggestions.

BenMGeo commented 5 years ago

Possible functionalities (suggestions, open for discussion)

Biggest question

bascrezee commented 5 years ago

Thanks Ben. See also #1120 about how to provide station data.

zklaus commented 5 years ago

Nice ideas! @BenMGeo, could you clarify the difference between the two masking options that you suggested and what exactly you have in mind for the transects?

zklaus commented 5 years ago

@lbdreyer, looking at the list at the top of this page, do you have any comments/thoughts on the applicability of iris?

BenMGeo commented 5 years ago

> Nice ideas! @BenMGeo, could you clarify the difference between the two masking options that you suggested and what exactly you have in mind for the transects?

The two masking options relate to:

a) Station measurements that have a spatial footprint covered by the measurement (flux stations, for example, have a footprint that varies with wind direction). This is usually not of great use for ESMs at the current stage, but it could become relevant for high-resolution data at some point (for example, drone observations compared to station data in catchments, which could be done with the ESMValTool as well). Therefore I would already include something like a bounds option or a polygon input in our backend/core preparation.

b) Station measurements that do not have a spatial footprint, but might be used within their maximal domain (all the area for which that point is the closest station) to assess and compare the variation of the gridded data.
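Option b) can be sketched on a grid without constructing any polygons: label each grid cell with its nearest station, which gives a discrete Thiessen/Voronoi partition, and derive one mask per station from the labels. The following pure-Python sketch assumes great-circle (haversine) distance and a spherical Earth; the function names and coordinates are hypothetical and not part of iris or ESMValTool.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def voronoi_labels(grid_lats, grid_lons, stations):
    """Label every grid cell with the index of its nearest station."""
    return [
        [
            min(range(len(stations)),
                key=lambda s: haversine_km(lat, lon, *stations[s]))
            for lon in grid_lons
        ]
        for lat in grid_lats
    ]


def station_mask(labels, station_index):
    """True where a cell lies outside the station's domain (masked)."""
    return [[label != station_index for label in row] for row in labels]


# Hypothetical 2x3 grid and two stations given as (lat, lon).
grid_lats = [0.0, 10.0]
grid_lons = [0.0, 10.0, 20.0]
stations = [(0.0, 0.0), (10.0, 20.0)]

labels = voronoi_labels(grid_lats, grid_lons, stations)
mask_station_0 = station_mask(labels, 0)
```

Stacking the per-station masks along a station dimension would give the ensemble-like structure with station numbers in place of ensemble members. Option a) would need an explicit footprint polygon or bounds per station and is not covered by this sketch.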

Both options should be considered long-term functionalities and might not be high priority. However, not keeping them in mind now could lead to major recoding at a later stage.

For the transects, there is a functionality in iris to plot data along them (https://scitools.org.uk/iris/docs/latest/examples/General/cross_section.html). I think you can easily modify this to plot along a custom spatial line. This might be relevant, for example, for buoy measurements or when handling ERA5 grids as virtual stations. So, when you read a number of stations, you could also provide this as a coordinates-set-object by default (with the/a preprocessor?) because you are already touching the data.

Basically, these are extensions that a number of users might want/need.

BenMGeo commented 5 years ago

Some thoughts/questions for the table above:

| functionality | kind | potential location | description | comment |
| --- | --- | --- | --- | --- |
| plot | shared diag function | iris | plot dots at the station locations on top of a map, with the colour determined by the value of the variable | this depends strongly on the diagnostic (optimal shape, subsetting in different shapes, etc.); do you just want a quicklook here? |
| collocation | preprocessor | iris | collocation of gridded data with the station data (followed in a next step by the calculation of certain metrics, e.g. RMSE, correlation) | don't forget temporal collocation (including averaging/sum/shifts (linear interpolation?)), or do you want to leave this to the diagnostic, as it might be very specific? |
| mask | preprocessor | esmvaltool | include the footprint of a station for masking gridded data | the mask (in my view) should subset the iris cube to mimic ensemble-like structures, with the station numbers replacing the ensemble numbers and masked values outside the footprint |
| voronoi mask | preprocessor | esmvaltool | calculate Thiessen/Voronoi polygons for masking gridded data (both for the case of inverse comparison, i.e. grid-to-point rather than point-to-grid) | see above |
| transects | preprocessor | iris | using station data for transects over gridded data? (possible use in iris) | should be something like a slice or similar, so you can plot along the above ensemble numbers (in a requested order); a user might use it for differences or trends along the transect, etc. |

lbdreyer commented 5 years ago

> @lbdreyer, looking at the list at the top of this page, do you have any comments/thoughts on the applicability of iris?

With CF discrete geometries it would be possible to store time series for multiple stations in a single netCDF file. I guess it depends on how you plan to process the files later. My concern with shapefiles is that they would be fine for plotting (this is easily done in Cartopy), but any other analysis (collocation, transects) would need them to be converted to a cube (if you do plan to use iris), in which case loading from a netCDF file is much easier.

As for your use cases

zklaus commented 5 years ago

In an offline discussion we agreed to proceed by trying to define a CF version of a selected observational dataset. Most likely this will make use of discrete geometries as defined in CF 1.7, Chapter 9.
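As a rough illustration of what such a file could contain, the sketch below mirrors, as a plain Python dictionary, my reading of the "orthogonal multidimensional array" layout for `featureType = "timeSeries"` in CF 1.7, Chapter 9. The station names, the variable `tas` and all values are invented, and station names are kept as Python strings where a netCDF file would typically use a character array; treat it as a sketch of the structure, not as the agreed format.

```python
# In-memory sketch of a CF timeSeries file with two stations sharing one
# time axis. All names and numbers below are illustrative assumptions.
dataset = {
    "attributes": {"Conventions": "CF-1.7", "featureType": "timeSeries"},
    "dimensions": {"station": 2, "time": 3},
    "variables": {
        "station_name": {
            "dims": ("station",),
            "attrs": {"cf_role": "timeseries_id"},
            "data": ["station_a", "station_b"],
        },
        "lat": {
            "dims": ("station",),
            "attrs": {"standard_name": "latitude", "units": "degrees_north"},
            "data": [52.1, 47.4],
        },
        "lon": {
            "dims": ("station",),
            "attrs": {"standard_name": "longitude", "units": "degrees_east"},
            "data": [5.2, 8.5],
        },
        "time": {
            "dims": ("time",),
            "attrs": {"standard_name": "time",
                      "units": "days since 2000-01-01"},
            "data": [0, 1, 2],
        },
        "tas": {
            "dims": ("station", "time"),
            "attrs": {"standard_name": "air_temperature", "units": "K",
                      "coordinates": "lat lon station_name"},
            "data": [[280.1, 281.0, 279.5], [275.3, 276.2, 277.0]],
        },
    },
}


def shape_of(data):
    """Shape of a (non-ragged) nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        data = data[0]
    return tuple(shape)


def shapes_consistent(ds):
    """Check that every variable's data matches its declared dimensions."""
    dims = ds["dimensions"]
    return all(
        shape_of(var["data"]) == tuple(dims[d] for d in var["dims"])
        for var in ds["variables"].values()
    )


# The variable carrying cf_role identifies the individual time series.
id_variables = [
    name for name, var in dataset["variables"].items()
    if var["attrs"].get("cf_role") == "timeseries_id"
]
layout_ok = shapes_consistent(dataset)
```

This layout assumes all stations report on the same time axis; Chapter 9 also describes ragged representations for stations with differing time coverage.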

zklaus commented 5 years ago

I just stumbled across this new feature in the upcoming CF conventions 1.8: geometries. This will allow us in the future to also support polygons in CF-compliant netCDF files.

BenMGeo commented 5 years ago

@lbdreyer

> * transects - would the transects always be along a fixed point in the coordinate, e.g. along a fixed latitude?

No, they would be along a number of coordinates: the point measurements would be taken in a specific order and the corresponding information extracted from the compared iris cube. (I expect something like an aggregated cube with the coordinate "station number" instead of "latitude" and "longitude".)
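A minimal sketch of this idea in plain Python, assuming nearest-grid-point sampling: the gridded field is sampled at each station in the requested order, and the result is indexed by station number instead of latitude and longitude. The `transect` helper, the station numbers and the grid values are hypothetical; with iris the result would presumably be a cube with a "station number" coordinate.

```python
def nearest_index(values, target):
    """Return the index of the entry in `values` closest to `target`."""
    return min(range(len(values)), key=lambda i: abs(values[i] - target))


def transect(grid_lats, grid_lons, field, stations, order):
    """Sample `field` at each station, in the requested station order.

    `stations` maps a station number to (lat, lon). The result is a list
    of (station number, value) pairs, i.e. it is indexed by station
    number and no longer by latitude/longitude.
    """
    samples = []
    for number in order:
        lat, lon = stations[number]
        i = nearest_index(grid_lats, lat)
        j = nearest_index(grid_lons, lon)
        samples.append((number, field[i][j]))
    return samples


# Hypothetical 3x3 grid and three numbered stations.
grid_lats = [0.0, 1.0, 2.0]
grid_lons = [0.0, 1.0, 2.0]
field = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]
stations = {101: (0.1, 0.2), 102: (1.2, 0.9), 103: (1.9, 2.1)}

along = transect(grid_lats, grid_lons, field, stations, order=[103, 102, 101])

# Differences between consecutive stations along the transect.
steps = [b[1] - a[1] for a, b in zip(along, along[1:])]
```

Because the order is an explicit argument, the same set of stations can be traversed in any sequence, for example to compute differences or trends along the transect.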

BenMGeo commented 5 years ago

@zklaus

I attached 2 examples for station data.

We would need to be able to handle a number of stations of each kind being read in within a recipe.

OBS_Stations.zip

zklaus commented 5 years ago

Sorry that I was a bit slow to respond here. A more detailed comment will follow in the coming days, but I wanted to mention already that I have tried to look around a bit more and came across the Polar Prediction project that has produced a report that could be relevant for us. Perhaps start by looking at the last section in there titled "Attachment".

senesis commented 2 years ago

CMIP6 Data Request includes a number of requests for data at site locations, either point data (one level, see e.g. that summary) or profiles (here).