Data Package Specification Improvements

p-a-s-c-a-l commented 5 years ago

Here we collect some improvements to the Data Package Specification and the implementation that came up during the creation of the first Data Package.

Since the node types are already implemented in Drupal and (example) data packages exist, no heavy structural changes to the specification should be made. Adding some properties should be OK, though.

[ ] Categories for Data Package Resources:
We need to be able to assign (several?) categories to resources in order to be able to group interrelated resources, e.g. when population exposure resources are split into different age groups. See also https://github.com/clarity-h2020/map-component/issues/2#issuecomment-441670340
[x] Different Resource Representations
The same resource may be available in different representations (data formats, service interfaces) and thus be accessible via different paths, e.g. the same Hazard Layers are accessed as an image via a WMS URL (path property) by the Map Component and as a coverage via a WCS URL by the Table Component and the Impact Model Service. We could simply add the same resource twice, with different values for path, mediatype and format, but we should also indicate somehow that those resources are in fact just different representations of the same data.
[ ] Thresholds
Thresholds (expressed as percentages relative to the baseline) for normalising absolute values (e.g. number of hot days) to low/medium/high for tabular visualisation should become part of the Data Package Resource. We could add one text property that contains a simple JSON format like the one used to define Criteria Functions for Scenario Analysis.
[ ] tbc

fgeyer16 commented 5 years ago

For the categories we can use a simple text field with multiple values, if needed. In a second step, should there be some kind of autocompletion, so that the creator of the data package can easily reuse categorisations he or she has already made?

For the different resource representations, I think the easiest way would be to use the name property of a dataset to group the resources, so every representation has to have the same name. To distinguish between the representations, the format or media type property always has to be displayed next to the name (i.e., always add the format or media type field to the views display or display mode). I have already implemented this for the display of the dataset in the Data Package.
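A minimal sketch of the grouping-by-name idea, assuming each resource is a plain dict with the `name`, `format`, `mediatype` and `path` properties discussed in this thread. The helper names and the example URLs are hypothetical placeholders, not part of the specification.

```python
from collections import defaultdict

def group_representations(resources):
    """Group resources that share the same "name"; each group lists the
    (format, mediatype) pairs so the representations can be told apart."""
    groups = defaultdict(list)
    for res in resources:
        groups[res["name"]].append((res.get("format"), res.get("mediatype")))
    return dict(groups)

def display_label(res):
    # Always show the format (or, failing that, the media type) next to the name.
    fmt = res.get("format") or res.get("mediatype") or "unknown"
    return f"{res['name']} ({fmt})"

# Hypothetical example: two representations of "hot-days" share one name.
resources = [
    {"name": "hot-days", "format": "tif", "mediatype": "image/tiff", "path": "https://example.org/wcs"},
    {"name": "hot-days", "format": "png", "mediatype": "image/png", "path": "https://example.org/wms"},
    {"name": "agricultural-areas", "format": "geojson", "mediatype": "application/json", "path": "https://example.org/wfs"},
]
```

Here `group_representations(resources)["hot-days"]` yields both representations, and `display_label` produces labels such as `hot-days (png)` for a views display.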

maesbri commented 5 years ago

categories: yes, I initially planned to add a "category" property that may have one or more values. However, the way I was modelling this is by having "exposure elements" (e.g., population, buildings, infrastructure, etc) and each of these would have vulnerability classes (I think this is closer to the "categories" you mentioned), e.g., population:age-group-0-14, population:age-group-15-64, population:age-group>65, etc.
thresholds: I had defined the following threshold object (name values could be low, medium or high ... or we can give more freedom if we wish)

"threshold": { "name": "low", "range": { "lower": 0, "upper": 3.8 } }
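To illustrate how a client might evaluate such threshold objects, here is a sketch under stated assumptions: several objects of the shape above are kept in a JSON list, each range is treated as half-open `[lower, upper)`, and a `null` upper bound means "unbounded". The `medium` and `high` ranges are made-up values for illustration only.

```python
import json

# Hypothetical list of threshold objects; only the "low" range comes from the
# thread, the other two are illustrative values.
THRESHOLDS_JSON = """
[
  {"threshold": {"name": "low",    "range": {"lower": 0,   "upper": 3.8}}},
  {"threshold": {"name": "medium", "range": {"lower": 3.8, "upper": 7.5}}},
  {"threshold": {"name": "high",   "range": {"lower": 7.5, "upper": null}}}
]
"""

def classify(value, thresholds):
    """Return the name of the first threshold whose [lower, upper) range
    contains value, or None if no range matches."""
    for item in thresholds:
        t = item["threshold"]
        lower, upper = t["range"]["lower"], t["range"]["upper"]
        if value >= lower and (upper is None or value < upper):
            return t["name"]
    return None

thresholds = json.loads(THRESHOLDS_JSON)
```

With these assumed ranges, `classify(2.0, thresholds)` gives `"low"` and `classify(3.8, thresholds)` gives `"medium"`, because the upper bound is exclusive.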

Different resource representations can be grouped under the same resource name, as indicated by @fgeyer16. In addition, I am introducing a new optional property in the resource object called "mapping_service_type" with possible values wms, wfs, wcs, osm or tms, in case the resource is available via any of those service types.

p-a-s-c-a-l commented 5 years ago

The categories I had in mind are more general and not limited to exposure, e.g. we need to group background layers into categories like "Open Street Map" or hazard layers into "Heat Hazards", etc. Since @therter needs this now, we should keep it simple for now (text field) and extend it later.
For thresholds, absolute values for upper/lower are not enough, see https://github.com/clarity-h2020/data-package/issues/8#issuecomment-444515812 and https://github.com/clarity-h2020/data-package/issues/8#issuecomment-444863730. Maybe we could just add another property relativeTo (e.g. "increase in baseline") to indicate that upper and lower represent percentages of other values.
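A sketch of how the proposed `relativeTo` property could change the evaluation: when it is set to `"increase in baseline"`, the value is first converted to a percentage increase over the baseline and only then compared with `lower`/`upper`. The flat threshold layout and the percentage ranges below are assumptions for illustration, not the agreed format.

```python
def percent_increase(value, baseline):
    """Relative change of value with respect to baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero for a relative threshold")
    return (value - baseline) * 100.0 / baseline

def classify_relative(value, baseline, thresholds):
    """Return the name of the first matching threshold. If relativeTo is
    "increase in baseline", lower/upper are read as percentages."""
    for t in thresholds:
        if t.get("relativeTo") == "increase in baseline":
            x = percent_increase(value, baseline)
        else:
            x = value  # plain absolute threshold
        upper = t["upper"]
        if x >= t["lower"] and (upper is None or x < upper):
            return t["name"]
    return None

# Hypothetical percentage ranges, for illustration only.
THRESHOLDS = [
    {"name": "low", "lower": 0, "upper": 10, "relativeTo": "increase in baseline"},
    {"name": "medium", "lower": 10, "upper": 50, "relativeTo": "increase in baseline"},
    {"name": "high", "lower": 50, "upper": None, "relativeTo": "increase in baseline"},
]
```

For example, 26 hot days against a baseline of 20 is a 30 % increase, so `classify_relative(26, 20, THRESHOLDS)` returns `"medium"`. A decrease below the baseline matches none of these assumed ranges and returns `None`.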

p-a-s-c-a-l commented 5 years ago

Updated Specification: https://github.com/clarity-h2020/data-package/tree/master/docs Example: https://github.com/clarity-h2020/data-package/tree/master/examples/dc1-naples

humerh commented 5 years ago

1) I would suggest introducing a "Ressource_id" in addition to the "name". The Ressource_id should be an auto-generated id that identifies a specific resource.

2) I would propose changing the content of the resource description. In the first example, "agricultural-areas", we should not describe ONE specific output format (here: "encoding": "UTF-8", "format": "geojson", "mediatype": "application/json"), all followed by a URL/path for this particular output format. I think the parameters should be: Service_type: OGC/WFS; Type_name: clarity:agricultural_areas; Path: "http://services.clarity-h2020.eu:8080/geoserver/clarity/ows?service=WFS&version=1.0.0"

The user may want to access the data in another way, and then he/she needs another format.
For emikat we will access the WFS service with the output format "SHAPE-ZIP".

3) The same applies to the grid results like "hot-days". I do not want to be tied to the fixed TIF output format; I want to query with the format "application/x-zip". GeoServer supports this format, but your definition forbids this query, which is not OK for us. Here, too, we propose to specify: Service_type: OGC/WCS; Type_name: clarity:Tx75p_consecutive_max_EUR-11_ICHEC-EC-EARTH_historical_r12i1p1_SMHI-RCA4_v1_day_19710101-20001231_netcdf3; Path: "https://clarity.meteogrid.com/geoserver/wcs?SERVICE=WCS&VERSION=2.0.1"
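A sketch of how a client could turn the proposed Service_type / Type_name / Path triple into a concrete request with a format of its own choosing. It uses the standard OGC key-value parameters (`typeName`/`outputFormat` for a WFS GetFeature, `coverageId`/`format` for a WCS 2.0 GetCoverage); the function name is hypothetical, and a server may expect the coverage identifier in its own notation as advertised in its GetCapabilities response.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def build_request(service_type, type_name, path, output_format):
    """Build a WFS GetFeature or WCS GetCoverage URL from the proposed
    Service_type / Type_name / Path properties plus a client-chosen format."""
    parts = urlsplit(path)
    params = dict(parse_qsl(parts.query))  # keeps SERVICE and VERSION from Path
    if service_type == "OGC/WFS":
        params.update({"request": "GetFeature", "typeName": type_name,
                       "outputFormat": output_format})
    elif service_type == "OGC/WCS":
        params.update({"request": "GetCoverage", "coverageId": type_name,
                       "format": output_format})
    else:
        raise ValueError(f"unsupported service type: {service_type}")
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(params), parts.fragment))

wfs_url = build_request(
    "OGC/WFS", "clarity:agricultural_areas",
    "http://services.clarity-h2020.eu:8080/geoserver/clarity/ows?service=WFS&version=1.0.0",
    "SHAPE-ZIP")
```

The same path could then be reused with `"application/json"` or any other format the service offers, which is the flexibility asked for in points 2 and 3.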

maesbri commented 5 years ago

OK for the "id" property for the resource (instead of Ressource_id). My proposal for this new id is to build it like this: datapackage.id + "#" + [auto-generated-sequential-number]
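A minimal sketch of the proposed id scheme, assuming the data package and its resources are plain dicts and that numbering starts at 1. Resources that already carry an id are left alone, and new numbers continue after the highest one in use. The function name and the `"dc1-naples"` package id are hypothetical.

```python
from itertools import count

def assign_resource_ids(datapackage):
    """Give every resource without an id one of the form <datapackage.id>#<n>,
    continuing after the highest sequential number already in use."""
    prefix = datapackage["id"] + "#"
    used = [int(r["id"][len(prefix):]) for r in datapackage["resources"]
            if r.get("id", "").startswith(prefix) and r["id"][len(prefix):].isdigit()]
    counter = count(max(used, default=0) + 1)
    for res in datapackage["resources"]:
        if "id" not in res:
            res["id"] = f"{prefix}{next(counter)}"
    return datapackage

dp = {"id": "dc1-naples", "resources": [{"name": "agricultural-areas"}, {"name": "hot-days"}]}
assign_resource_ids(dp)
```

After the call, the two resources get `dc1-naples#1` and `dc1-naples#2`; continuing after the maximum avoids reusing a number when resources are added later.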

For points 2 and 3 I have to think a little bit more about how to describe that. However, nothing prevents us from adding additional formats to our specification (i.e., shape-zip and application/x-zip).

maesbri commented 5 years ago

I have dedicated some time to thinking about various possibilities concerning points 2 and 3, and I came to the following conclusion: it should be up to the owner/producer of the data package to decide in which formats he/she wants to provide the data in the package (from the ones allowed in our specification, and we can add more to our specification if we need to). Of course, it is then up to the consumer of the data package to decide whether the package contents suit his/her needs.

That being said, the possibility of using WFS and WCS services gives us more flexibility to provide the same dataset in various formats in a single resource object, and we should take advantage of that.

My intention is to keep it as simple as possible, since the CLARITY data package has already introduced many properties, which means more complexity when processing the package. In that sense, my proposal is to keep using the "path" property for encoding the whole WMS/WFS/WCS request for obtaining the online resource.

In addition, we have the optional property "service_type" (a replacement for the "mapping_service_type" I added in the last update of the specification) to indicate what kind of service it is, i.e., "ogc:wms", "ogc:wfs", "ogc:wcs", "ogc:wms-t", "osm", etc.

With just those two parameters, any program could "intelligently" parse the URL and its parameters, and request the resource in any other format that is offered by the WMS/WFS/WCS service and that is more suitable for its purposes. Note that the type_name property you were proposing is already in the path URL request, so it is not necessary to specify it again. The only parameter that I find useful to repeat (in the URL and as a property) is "service_type", as it might not always be obvious how to deduce it from the URL (and it might not necessarily be an OGC service).
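A sketch of the "path plus service_type" approach from the client side: the explicit `service_type` property wins, and only when it is missing does the client try to deduce an OGC service type from the `SERVICE` key-value parameter in the path (OGC KVP parameter names are case-insensitive). Non-OGC URLs such as tile templates yield `None`, which is exactly why repeating `service_type` is useful. The function name is hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl

def resolve_service_type(resource):
    """Prefer the explicit service_type property; otherwise try to deduce an
    OGC service type from the SERVICE parameter of the path URL."""
    if resource.get("service_type"):
        return resource["service_type"]
    for key, value in parse_qsl(urlsplit(resource["path"]).query):
        if key.upper() == "SERVICE":  # OGC KVP names are case-insensitive
            return "ogc:" + value.lower()
    return None  # e.g. an OSM/TMS tile URL: nothing can be deduced

wcs_resource = {"path": "https://clarity.meteogrid.com/geoserver/wcs?SERVICE=WCS&VERSION=2.0.1"}
```

For `wcs_resource` this returns `"ogc:wcs"`, while a tile URL without query parameters can only be recognised through an explicit `service_type` such as `"osm"`.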

p-a-s-c-a-l commented 5 years ago

@maesbri: "With just those two parameters, any program could 'intelligently' parse the URL and its parameters, and request the resource in any other format that is offered by the WMS/WFS/WCS service and that is more suitable for its purposes."

This works only if you can provide one URL template that is suitable for all formats. Consider, for example, the following really simple URL: https://clarity.meteogrid.com/geoserver/wcs?SERVICE=WCS&VERSION=2.0.1

'wcs' is not only a request parameter but also part of the URI itself (/geoserver/wcs). The parameter VERSION=2.0.1 is probably not valid for WMS / WFS, a WMS request might need an additional LAYERS parameter, etc. So what could such a common template for the most common OGC services we are using (WMS, WCS, WFS) look like?

maesbri commented 5 years ago

I don't think this is a problem since the approach I proposed still works with that url.

In Java (and I guess in many other programming languages) you can "dissect" a URL (in Java with the URL class) and extract:

the base url: https://clarity.meteogrid.com/geoserver/wcs
the query string parameters: SERVICE=WCS&VERSION=2.0.1
and probably other things...

With that information, the client could build a new request. In this case, it just has to find the format key-value parameter and replace "(geo)tif" with any other format supported by the service (for the whole list, just use a GetCapabilities request).
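The dissect-and-rebuild step can be sketched with Python's `urllib.parse`, the counterpart of Java's URL class mentioned above: split the path into base URL and query parameters, replace the format parameter and reassemble the URL. The sketch assumes WMS/WCS call the parameter `FORMAT` and WFS calls it `outputFormat`, matching keys case-insensitively; the function name and the `example_coverage` id are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# WMS and WCS name the parameter FORMAT, WFS names it outputFormat.
FORMAT_KEYS = {"FORMAT", "OUTPUTFORMAT"}

def with_format(path, new_format):
    """Return path with its format parameter set to new_format; all other
    key-value parameters are kept unchanged and in their original order."""
    parts = urlsplit(path)  # base URL (scheme, host, /geoserver/wcs) + query
    params, replaced = [], False
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key.upper() in FORMAT_KEYS:  # OGC KVP names are case-insensitive
            params.append((key, new_format))
            replaced = True
        else:
            params.append((key, value))
    if not replaced:
        # No format parameter yet: pick the key name the service type expects.
        service = next((v.upper() for k, v in params if k.upper() == "SERVICE"), "")
        params.append(("outputFormat" if service == "WFS" else "FORMAT", new_format))
    return urlunsplit(parts._replace(query=urlencode(params)))

src = ("https://clarity.meteogrid.com/geoserver/wcs?SERVICE=WCS&VERSION=2.0.1"
       "&REQUEST=GetCoverage&COVERAGEID=example_coverage&FORMAT=image/tiff")
zipped = with_format(src, "application/x-zip")
```

The base path `/geoserver/wcs` survives untouched, which addresses the concern that the service name is also part of the URI. Which formats are actually valid still has to be checked against the service's GetCapabilities response.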

The VERSION parameter is not mandatory in an OGC request; if it is not provided, the service assumes the latest one. If it is present in the path URL, then the client should decide whether to use it as it is or ask for a different one that it can understand.

The additional layer parameters should either already be in the path URL or be modified by the client if needed (e.g., width and height in a WMS query: the data package producer can provide default ones, but these should be set in the query by the mapping client depending on its own screen resolution).
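A sketch of the client-side adjustments described in the last two paragraphs: overriding parameters such as WIDTH/HEIGHT to match the client's screen, or dropping VERSION to let the server choose. Keys are matched case-insensitively and keep their original spelling, and parameters missing from the path are appended. The function name and the example WMS URL are hypothetical; whether a given server accepts a particular request without VERSION is something the client should verify.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def adjust_params(path, overrides=None, drop=()):
    """Return path with some KVP parameters overridden, added or removed.
    Keys are matched case-insensitively and keep their original spelling."""
    wanted = {k.upper(): (k, str(v)) for k, v in (overrides or {}).items()}
    dropped = {k.upper() for k in drop}
    parts = urlsplit(path)
    params, seen = [], set()
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        upper = key.upper()
        if upper in dropped:
            continue  # e.g. let the server pick its default VERSION
        if upper in wanted:
            params.append((key, wanted[upper][1]))
            seen.add(upper)
        else:
            params.append((key, value))
    # Parameters that were not in the path yet are appended at the end.
    params.extend(pair for upper, pair in wanted.items() if upper not in seen)
    return urlunsplit(parts._replace(query=urlencode(params)))

# Hypothetical WMS path with producer-supplied default width and height.
wms = ("https://example.org/geoserver/wms?SERVICE=WMS&VERSION=1.3.0&REQUEST=GetMap"
       "&LAYERS=clarity:hot_days&WIDTH=256&HEIGHT=256&FORMAT=image/png")
resized = adjust_params(wms, {"width": 1024, "height": 768})
caps = adjust_params("https://clarity.meteogrid.com/geoserver/wcs?SERVICE=WCS&VERSION=2.0.1",
                     {"request": "GetCapabilities"}, drop=("version",))
```

Here `resized` keeps the producer's LAYERS and FORMAT but uses the client's own size, and `caps` becomes a GetCapabilities request without a fixed VERSION.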

maesbri commented 5 years ago

Different Resource Representations: I have added the "mapview" property, so now we can provide an alternative "view" of the georesource via a WMS or similar mapping service that can be added as a layer in a map client (unlike a WFS or WCS). Check the resource specification for more details.
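A sketch of how a map client might use the new property when deciding what to display. The internal structure of `mapview` is defined in the resource specification and not reproduced here, so the `url` and `layers` keys below are assumptions made for illustration, as is the function name.

```python
def map_layer_for(resource):
    """Pick what a map client should display: the resource's mapview if
    present (assumed to be a WMS-like layer), otherwise the resource itself
    if it is already a WMS; WFS/WCS resources without a mapview give None."""
    view = resource.get("mapview")
    if view:
        # Assumed mapview keys: "url", "layers", optional "service_type".
        return {"url": view["url"], "layers": view.get("layers"),
                "service_type": view.get("service_type", "ogc:wms")}
    if resource.get("service_type") == "ogc:wms":
        return {"url": resource["path"], "layers": None, "service_type": "ogc:wms"}
    return None

hazard = {
    "name": "hot-days",
    "service_type": "ogc:wcs",
    "path": "https://clarity.meteogrid.com/geoserver/wcs?SERVICE=WCS&VERSION=2.0.1",
    "mapview": {"url": "https://example.org/geoserver/wms", "layers": "clarity:hot_days"},
}
```

The WCS `path` stays available for the Table Component and the Impact Model Service, while the map client takes its layer from `mapview`.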

maesbri commented 5 years ago

I have updated the json descriptor for the Naples data package according to the latest changes I made today in the specification.

therter commented 5 years ago

OK, then we can get the WMS information from the mapview property and use the same format within the internal data model of the Map Component.

p-a-s-c-a-l commented 5 years ago

This can be considered done.

clarity-h2020 / data-package

Data Package Specification Improvements #9