Open bascrezee opened 5 years ago
Possible functionalities (suggestions, open for discussion)
Biggest question
Thanks Ben. See also #1120 about how to provide station data.
Nice ideas! @BenMGeo, could you clarify the difference between the two masking options that you suggested and what exactly you have in mind for the transects?
@lbdreyer, looking at the list at the top of this page, do you have any comments/thoughts on the applicability of iris?
> Nice ideas! @BenMGeo, could you clarify the difference between the two masking options that you suggested and what exactly you have in mind for the transects?
The two masking options relate to:

a) Station measurements with a spatial footprint, i.e. an area that the measurement covers (flux stations, for example, have a footprint that varies with wind direction). This is usually not of great use for ESMs at the current stage, but it could become relevant for high-resolution data at some point (for example, when comparing drone observations with station data in catchments, which could be done with the ESMValTool as well). Therefore, I would already include something like a bounds option or a polygon input in our backend/core preparation.

b) Station measurements without a spatial footprint, which might still be used within their maximal domain (i.e. all the area for which this point is the closest station) to assess and compare the variation of the gridded data.

Both options should be considered long-term functionalities and might not be high priority. However, not keeping them in mind now could lead to major recoding at a later stage.
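To make option b) more concrete, here is a minimal sketch of a nearest-station ("Voronoi-like") mask in plain Python. It is not ESMValTool or iris code: the function names, the station list and the tiny grid are all hypothetical, and it assumes a regular lat/lon grid and great-circle distances. The result has one mask per station, similar to an ensemble dimension with station numbers instead of ensemble numbers.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    radius = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def nearest_station_index(grid_lats, grid_lons, stations):
    """For every grid cell, return the index of the closest station."""
    return [
        [
            min(range(len(stations)),
                key=lambda i: haversine_km(lat, lon, *stations[i]))
            for lon in grid_lons
        ]
        for lat in grid_lats
    ]


def station_masks(owner, n_stations):
    """One boolean mask per station; True means 'masked' (outside its domain)."""
    return [[[cell != s for cell in row] for row in owner]
            for s in range(n_stations)]


# Hypothetical stations (lat, lon) and a hypothetical 3x3 grid.
stations = [(50.0, 5.0), (52.0, 9.0)]
grid_lats = [50.0, 51.0, 52.0]
grid_lons = [5.0, 7.0, 9.0]

owner = nearest_station_index(grid_lats, grid_lons, stations)
masks = station_masks(owner, len(stations))
```

A footprint mask (option a) could reuse `station_masks` if `owner` were derived from polygons or bounds instead of distances; how overlapping footprints should be handled is left open here.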
For the transects, iris has functionality to plot data along them (https://scitools.org.uk/iris/docs/latest/examples/General/cross_section.html). I think you can easily modify this to plot along a custom spatial line. This might be relevant, for example, for buoy measurements or when handling ERA5 grids as virtual stations. So, when you read a number of stations, you could also provide them as a coordinate-set object by default (with the/a preprocessor?), because you are already touching the data.
Basically, these are extensions that a number of users might want/need.
Some thoughts/questions for the table above:
functionality | kind | potential location | description | comment |
---|---|---|---|---|
plot | shared diag function | iris | plot dots at the station location on top of a map with the colour determined by the value of the variable | this depends heavily on the diagnostic (optimal shape, subsetting in different shapes, etc.); do you just want a quicklook here? |
collocation | preprocessor | iris | collocation of gridded data with the station data (followed, in a next step, by the calculation of certain metrics, e.g. RMSE, correlation) | don't forget temporal collocation (including averaging/sums/shifts (linear interpolation?)); or do you want to leave this to the diagnostic, as it might be very specific? |
mask | preprocessor | esmvaltool | include footprint of station for masking gridded data | the mask (in my view) should subset the iris cube to mimic ensemble-like structures, with the station numbers replacing the ensemble numbers and masked values outside the footprint |
voronoi mask | preprocessor | esmvaltool | calculate Thiessen/Voronoi polygons for masking gridded data (both masks are for the inverse comparison: grid-to-point, not point-to-grid) | see above |
transects | preprocessor | iris | using station data for transects over gridded data? (possible use in iris) | should be something like a slice or similar so you can plot along the above ensemble numbers (in a requested order); a user might use it for differences or trends along the transect, etc. |
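
As an illustration of the collocation row, here is a minimal sketch in plain Python of nearest-grid-cell collocation followed by RMSE and Pearson correlation. It does not use iris, and all names and numbers are hypothetical. It assumes a regular lat/lon grid without longitude wrap-around and a single time step; temporal collocation is deliberately left out.

```python
import math


def nearest_index(values, target):
    """Index of the entry in `values` that is closest to `target`."""
    return min(range(len(values)), key=lambda i: abs(values[i] - target))


def collocate(field, grid_lats, grid_lons, stations):
    """Pick the nearest grid cell value for every (lat, lon) station."""
    return [
        field[nearest_index(grid_lats, lat)][nearest_index(grid_lons, lon)]
        for lat, lon in stations
    ]


def rmse(model, obs):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical gridded field: field[i][j] belongs to grid_lats[i], grid_lons[j].
grid_lats = [0.0, 10.0, 20.0]
grid_lons = [0.0, 10.0, 20.0]
field = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]

# Hypothetical station positions (lat, lon) and their observed values.
stations = [(1.0, 1.0), (11.0, 19.0), (19.0, 9.0)]
observations = [2.0, 5.0, 9.0]

model_at_stations = collocate(field, grid_lats, grid_lons, stations)
error = rmse(model_at_stations, observations)
correlation = pearson(model_at_stations, observations)
```

Whether the metrics belong in the preprocessor or in the diagnostic is exactly the open question from the comment column; the sketch keeps them as separate functions so that either split would work.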
> @lbdreyer, looking at the list at the top of this page, do you have any comments/thoughts on the applicability of iris?
With CF discrete geometries it would be possible to store time series for multiple stations in a single netCDF file. I guess it depends on how you plan to process the files later. My concern with shapefiles is that they would be fine for plotting (this is easily done in Cartopy), but any other analysis (collocation, transects) would require converting them to a cube (if you do plan to use iris), in which case loading from a netCDF file is much easier.
As for your use cases
In an offline discussion we agreed to proceed by trying to define a CF version of a selected observational dataset. Most likely this will make use of discrete geometries as defined in CF 1.7, Chapter 9.
I just stumbled across this new feature in the upcoming CF conventions 1.8: geometries. This will allow us in the future to also support polygons in CF compliant netcdf files.
@lbdreyer
* transects - would the transects always be along a fixed point in the coordinate, e.g. along a fixed latitude?
No, they would be along a number of coordinates. Like having the point measurements in a specific order and extracting the information from the compared iris cube. (I expect something like an aggregated cube with the coordinate "station number" instead of "latitude" and "longitude".)
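A rough sketch of what I mean, in plain Python rather than iris: the gridded field is sampled at the stations in the requested order, and the result is indexed by station number instead of latitude/longitude. The bilinear interpolation, the function names and the test data are my own assumptions for illustration; it assumes a regular grid with ascending coordinates.

```python
import bisect


def bilinear(field, lats, lons, lat, lon):
    """Bilinear interpolation on a regular grid with ascending lats/lons."""
    # Index of the lower-left corner of the enclosing cell, clamped to the grid.
    i = min(max(bisect.bisect_right(lats, lat) - 1, 0), len(lats) - 2)
    j = min(max(bisect.bisect_right(lons, lon) - 1, 0), len(lons) - 2)
    t = (lat - lats[i]) / (lats[i + 1] - lats[i])
    u = (lon - lons[j]) / (lons[j + 1] - lons[j])
    return ((1 - t) * (1 - u) * field[i][j]
            + (1 - t) * u * field[i][j + 1]
            + t * (1 - u) * field[i + 1][j]
            + t * u * field[i + 1][j + 1])


def transect(field, lats, lons, stations):
    """Sample the field at each station, keeping the given station order.

    Returns a list of (station_number, value) pairs, so the result is
    indexed by station number rather than by latitude and longitude.
    """
    return [(number, bilinear(field, lats, lons, lat, lon))
            for number, (lat, lon) in enumerate(stations)]


# Hypothetical grid with field = lat + 2 * lon, so the result is easy to check.
lats = [0.0, 10.0, 20.0]
lons = [0.0, 10.0, 20.0]
field = [[lat + 2 * lon for lon in lons] for lat in lats]

# Hypothetical stations (lat, lon) in the order the transect should follow.
stations = [(5.0, 5.0), (10.0, 15.0), (20.0, 20.0)]

values_along_transect = transect(field, lats, lons, stations)
```

Differences or trends along the transect could then be computed directly on the station-number axis.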
@zklaus
I attached 2 examples for station data.
We would need to be able to handle a number of stations of each kind read in within a single recipe.
Sorry that I was a bit slow to respond here. A more detailed comment will follow in the coming days, but I wanted to mention already that I have tried to look around a bit more and came across the Polar Prediction project that has produced a report that could be relevant for us. Perhaps start by looking at the last section in there titled "Attachment".
CMIP6 Data Request includes a number of requests for data at site locations, either point data (one level, see e.g. that summary) or profiles (here).
@zklaus and I discussed potential use cases of station data in ESMValTool yesterday. The use cases are user-specific, but for design considerations it is good to list them here. We came up with the following:
Also, how do we provide station data? Single netCDF files? Shapefiles?
Feel free to modify the list/add suggestions.