Use cases for station data

bascrezee commented 5 years ago

@zklaus and I discussed potential use cases for station data in ESMValTool yesterday. The use cases are user-specific, but it is good to list them here for design considerations. We came up with the following:

functionality	kind	potential location	description
plot	shared diag function	iris	plot dots at the station location on top of a map with the colour determined by the value of the variable
collocation	preprocessor	iris	collocation of gridded data with the station data (followed, in a next step, by calculation of certain metrics, e.g. RMSE, correlation)
mask	preprocessor	esmvaltool	include footprint of station for masking gridded data
voronoi mask	preprocessor	esmvaltool	calculate Thiessen/Voronoi polygons for masking gridded data (both masks are for the inverse comparison, grid-to-point rather than point-to-grid)
transects	?	?	using station data for transects over gridded data? (possible use in iris)
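As a rough illustration of the collocation row above, here is a minimal pure-Python sketch of nearest-neighbour collocation followed by an RMSE calculation. It does not use iris or ESMValTool; the function names (`haversine_km`, `collocate_nearest`, `rmse`) and the toy grid and station values are all hypothetical, and a real preprocessor would probably operate on iris cubes instead of nested lists.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    radius = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def collocate_nearest(grid_lats, grid_lons, field, stations):
    """Return, for each (lat, lon) station, the value of the nearest grid cell.

    field[i][j] is the value at (grid_lats[i], grid_lons[j]).
    """
    cells = [(i, j) for i in range(len(grid_lats)) for j in range(len(grid_lons))]
    values = []
    for lat, lon in stations:
        i, j = min(
            cells,
            key=lambda ij: haversine_km(lat, lon, grid_lats[ij[0]], grid_lons[ij[1]]),
        )
        values.append(field[i][j])
    return values


def rmse(model, obs):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


# Hypothetical toy data: a 2 x 3 grid and two stations with one observation each.
grid_lats = [50.0, 51.0]
grid_lons = [8.0, 9.0, 10.0]
field = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0]]
stations = [(50.1, 8.2), (50.9, 9.8)]
station_obs = [1.5, 5.0]

model_at_stations = collocate_nearest(grid_lats, grid_lons, field, stations)
score = rmse(model_at_stations, station_obs)
```

Nearest-neighbour is only one possible choice; bilinear interpolation to the station location would be an equally valid collocation method.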

Also, how do we provide station data? As single netCDF files? As shapefiles?

Feel free to modify the list/add suggestions.

BenMGeo commented 5 years ago

Possible functionalities (suggestions, open for discussion)

include footprint of station for masking gridded data
calculate Thiessen/Voronoi polygons for masking gridded data (both masks are for the inverse comparison, grid-to-point rather than point-to-grid)
using station data for transects over gridded data? (possible use in iris)

Biggest question

How to provide station data? As single netCDF files? As shapefiles?

bascrezee commented 5 years ago

Thanks Ben. See also #1120 about how to provide station data.

zklaus commented 5 years ago

Nice ideas! @BenMGeo, could you clarify the difference between the two masking options that you suggested and what exactly you have in mind for the transects?

zklaus commented 5 years ago

@lbdreyer, looking at the list at the top of this page, do you have any comments/thoughts on the applicability of iris?

BenMGeo commented 5 years ago

> Nice ideas! @BenMGeo, could you clarify the difference between the two masking options that you suggested and what exactly you have in mind for the transects?

The two masking options relate to:

a) Station measurements that have a spatial footprint covered by their measurement (flux stations, for example, have a footprint that varies with wind direction). This is usually not of great use for ESMs at the current stage, but it could become relevant for high-resolution data at some point (for example, drone observations compared to station data in catchments, which could also be done with ESMValTool). Therefore I would already include something like a bounds option or a polygon input in our backend/core preparation.

b) Station measurements that do not have a spatial footprint, but might be used within their maximal domain (the whole area for which this point is the closest station) to assess and compare the variation of the gridded data.

Both options should be considered long-term functionalities and might not be high priority. However, not keeping them in mind could lead to major recoding at a later stage.
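Option b) can be approximated on a grid without constructing explicit polygons: labelling every grid cell with the index of its closest station yields a discrete Thiessen/Voronoi partition. The sketch below is a minimal pure-Python version under that assumption; the names `nearest_station_labels` and `mask_for_station` and the toy coordinates are hypothetical and not part of iris or ESMValTool.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    radius = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def nearest_station_labels(grid_lats, grid_lons, stations):
    """Label each grid cell with the index of its closest station."""
    return [
        [
            min(range(len(stations)),
                key=lambda k: haversine_km(glat, glon, *stations[k]))
            for glon in grid_lons
        ]
        for glat in grid_lats
    ]


def mask_for_station(labels, k):
    """Boolean mask that is True (masked) outside the domain of station k."""
    return [[label != k for label in row] for row in labels]


# Hypothetical toy data: a 2 x 3 grid and two stations at opposite corners.
grid_lats = [50.0, 51.0]
grid_lons = [8.0, 9.0, 10.0]
stations = [(50.0, 8.0), (51.0, 10.0)]

labels = nearest_station_labels(grid_lats, grid_lons, stations)
mask0 = mask_for_station(labels, 0)
```

A footprint mask as in option a) would follow the same pattern, with the closest-station test replaced by a point-in-polygon or bounds test per station.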

For the transects, there is a functionality in iris to plot data along them (https://scitools.org.uk/iris/docs/latest/examples/General/cross_section.html). I think you can easily modify this to plot along a custom spatial line. This might be relevant, for example, for buoy measurements or when handling ERA5 grids as virtual stations. So, when you read a number of stations, you could also provide them as a coordinate-set object by default (with the/a preprocessor?), because you are already touching the data.

Basically, these are extensions that a number of users might want/need.

BenMGeo commented 5 years ago

Some thoughts/questions for the table above:

functionality	kind	potential location	description	comment
plot	shared diag function	iris	plot dots at the station location on top of a map with the colour determined by the value of the variable	this depends strongly on the diagnostic (optimal shape, subsetting into different shapes, etc.); do you just want a quicklook here?
collocation	preprocessor	iris	collocation of gridded data with the station data (followed, in a next step, by calculation of certain metrics, e.g. RMSE, correlation)	don't forget temporal collocation (including averaging/sums/shifts (linear interpolation?)), or do you want to leave this to the diagnostic, as it might be very specific?
mask	preprocessor	esmvaltool	include footprint of station for masking gridded data	the mask (in my view) should subset the iris cube to mimic ensemble-like structures, with the station numbers replacing the ensemble numbers and masked values outside the footprint
voronoi mask	preprocessor	esmvaltool	calculate Thiessen/Voronoi polygons for masking gridded data (both masks are for the inverse comparison, grid-to-point rather than point-to-grid)	see above
transects	preprocessor	iris	using station data for transects over gridded data? (possible use in iris)	should be something like a slice or similar so you can plot along the above ensemble numbers (in a requested order); a user might use it for differences or trends along the transect, etc.
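To make the temporal collocation comment in the table above more concrete, here is a minimal sketch that averages sub-daily station readings to daily means and then pairs them with daily model values on the dates both have in common. It uses only the Python standard library; the names `daily_means` and `match_in_time` and all values are hypothetical, and sums, shifts or interpolation would need their own variants.

```python
from collections import defaultdict
from datetime import date, datetime


def daily_means(timestamps, values):
    """Average sub-daily readings into one mean value per calendar day."""
    buckets = defaultdict(list)
    for stamp, value in zip(timestamps, values):
        buckets[stamp.date()].append(value)
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


def match_in_time(model_daily, station_daily):
    """Return (day, model, station) triples for days present in both series."""
    common = sorted(set(model_daily) & set(station_daily))
    return [(day, model_daily[day], station_daily[day]) for day in common]


# Hypothetical toy data: three station readings and two daily model values.
station_times = [
    datetime(2020, 1, 1, 0, 0),
    datetime(2020, 1, 1, 12, 0),
    datetime(2020, 1, 2, 6, 0),
]
station_values = [1.0, 3.0, 4.0]
model_daily = {date(2020, 1, 2): 3.5, date(2020, 1, 3): 5.0}

station_daily = daily_means(station_times, station_values)
pairs = match_in_time(model_daily, station_daily)
```

Whether this step belongs in a preprocessor or in the diagnostic is exactly the open question raised in the table; the sketch takes no position on that.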

lbdreyer commented 5 years ago

> @lbdreyer, looking at the list at the top of this page, do you have any comments/thoughts on the applicability of iris?

With CF discrete geometries it would be possible to store time series for multiple stations in a single netCDF file. I guess it depends on how you plan to process the files later. My concern with shapefiles is that they would be fine for plotting (this is easily done in Cartopy), but any other analysis (collocation, transects) would need them to be converted to a cube (if you do plan to use iris), in which case loading from a netCDF file is much easier.

As for your use cases:

plotting - plotting each station would be done via iplt.scatter, which is already supported in iris.
collocation - we do have some code for interpolating trajectories which may be suitable. I'd be interested to see whether it is performant enough for you. The original version of that functionality was a bit slow, but there was an update a couple of years ago that sped up the code. There are plans to consider improving the support for these types of use cases, so if iris doesn't currently do exactly what you want, we could add it to our list of issues to address.
transects - would the transects always be along a fixed point in the coordinate, e.g. along a fixed latitude?

zklaus commented 5 years ago

In an offline discussion we agreed to proceed by trying to define a CF version of a selected observational dataset. Most likely this will make use of discrete geometries as defined in CF 1.7, Chapter 9.

zklaus commented 5 years ago

I just stumbled across this new feature in the upcoming CF conventions 1.8: geometries. This will allow us in the future to also support polygons in CF compliant netcdf files.

BenMGeo commented 5 years ago

@lbdreyer

* transects - would the transects always be along a fixed point in the coordinate, e.g. along a fixed latitude?

No, they would be along a number of coordinates, i.e. having the point measurements in a specific order and extracting the information from the compared iris cube. (I expect something like an aggregated cube with the coordinate "station number" instead of "latitude" and "longitude".)
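A minimal sketch of that idea in plain Python, assuming a regular latitude/longitude grid and nearest-cell sampling: the gridded field is sampled at the stations in a requested order, and the result is indexed by station number instead of latitude and longitude. The function names and the station numbers 101 to 103 are hypothetical, and longitude wrap-around is not handled.

```python
def nearest_index(axis, value):
    """Index of the axis point closest to value (no longitude wrap-around)."""
    return min(range(len(axis)), key=lambda i: abs(axis[i] - value))


def extract_transect(grid_lats, grid_lons, field, stations, order):
    """Sample field at the stations, in the requested order.

    stations maps a station number to (lat, lon); order lists station numbers.
    Returns a list of (station_number, value) pairs.
    """
    transect = []
    for number in order:
        lat, lon = stations[number]
        i = nearest_index(grid_lats, lat)
        j = nearest_index(grid_lons, lon)
        transect.append((number, field[i][j]))
    return transect


# Hypothetical toy data: a 2 x 3 grid and three numbered stations.
grid_lats = [50.0, 51.0]
grid_lons = [8.0, 9.0, 10.0]
field = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0]]
stations = {101: (50.1, 8.2), 102: (50.4, 9.1), 103: (50.9, 9.8)}

transect = extract_transect(grid_lats, grid_lons, field, stations, [103, 101, 102])
```

Differences or trends along the transect could then be computed on the returned values in station order.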

BenMGeo commented 5 years ago

@zklaus

I attached 2 examples for station data.

The DWD data is cmorized with V2
The ISMN data is cmorized standalone and uses an open-source Python package distributed by the data provider (post edited by @bascrezee).

We would need to be able to handle a number of stations of each kind being read in within a recipe.

OBS_Stations.zip

zklaus commented 5 years ago

Sorry that I was a bit slow to respond here. A more detailed comment will follow in the coming days, but I wanted to mention already that I have tried to look around a bit more and came across the Polar Prediction project that has produced a report that could be relevant for us. Perhaps start by looking at the last section in there titled "Attachment".

senesis commented 2 years ago

CMIP6 Data Request includes a number of requests for data at site locations, either point data (one level, see e.g. that summary) or profiles (here).

ESMValGroup / ESMValTool

Use cases for station data #1119