clarity-h2020 / data-package

CLARITY Data Package Specification, Documentation and Examples
https://clarity-h2020.github.io/data-package/
GNU General Public License v3.0

Data Package Specification Improvements #9

Closed p-a-s-c-a-l closed 5 years ago

p-a-s-c-a-l commented 5 years ago

Here we collect some improvements to the Data Package Specification and the implementation that came up during the creation of the 1st Data Package.

Since the node types are already implemented in Drupal and (example) data packages exist, no heavy structural changes should be made to the specification. Adding some properties should be OK.

fgeyer16 commented 5 years ago

For the categories we can use a simple text field, with multiple values possible if needed. In a second step, should there be some kind of autocompletion, so that the creator of the data package can easily reuse categorisations he or she has already made?

For the different resource representations I think the easiest way would be to use the name property of a dataset to group the resources, so every representation has to have the same name. To distinguish between the representations, the format or media type property always has to be displayed beside the name (always add the format or media type field to the views display or display mode). I have already implemented this for the display of the dataset in the data package.
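The grouping idea can be illustrated with a minimal Python sketch. It assumes resources are plain dictionaries carrying the "name", "format" and "mediatype" properties; the sample entries and both function names are hypothetical and are not taken from the Drupal implementation.

```python
from collections import defaultdict

# Hypothetical resource entries; only "name", "format" and "mediatype" are used.
resources = [
    {"name": "agricultural-areas", "format": "geojson", "mediatype": "application/json"},
    {"name": "agricultural-areas", "format": "shape-zip", "mediatype": "application/zip"},
    {"name": "hot-days", "format": "tif", "mediatype": "image/tiff"},
]


def group_representations(resources):
    """Group resources that share a name; each group lists its representations."""
    groups = defaultdict(list)
    for res in resources:
        # Fall back to the media type when no format is given.
        label = res.get("format") or res.get("mediatype") or "unknown"
        groups[res["name"]].append(label)
    return dict(groups)


def display_labels(resources):
    """Build 'name (format)' labels so same-named representations stay distinguishable."""
    grouped = group_representations(resources)
    return [f"{name} ({fmt})" for name, fmts in grouped.items() for fmt in fmts]
```

Showing the format next to the name is what keeps two entries with the same name apart in a listing.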

maesbri commented 5 years ago

"threshold": { "name": low, "range": { "lower": 0, "upper": 3.8} }}

p-a-s-c-a-l commented 5 years ago

Updated Specification: https://github.com/clarity-h2020/data-package/tree/master/docs
Example: https://github.com/clarity-h2020/data-package/tree/master/examples/dc1-naples

humerh commented 5 years ago

1) I would suggest introducing a "Ressource_id" in addition to the "name". The Ressource_id should be an auto-generated ID that identifies a specific resource.

2) I would propose changing the content of the resource description. In the first example, "agricultural-areas", we should not describe ONE specific output format (here: "encoding": "UTF-8", "format": "geojson", "mediatype": "application/json"), all followed by a URL/path for this particular output format. I think the parameters should be:
Service_type: OGC/WFS
Type_name: clarity:agricultural_areas
Path: "http://services.clarity-h2020.eu:8080/geoserver/clarity/ows?service=WFS&version=1.0.0"

Maybe the user wants to access the data in another way; then he/she needs another format.
For emikat we will access the WFS service with the protocol type "SHAPE-ZIP".

3) The same applies to grid results like "hot-days". I do not want the TIF output format to be fixed. I want to query with the format "application/x-zip". GeoServer supports this format, but your definition forbids this query. This is not OK for us. Here, too, we propose to specify:
Service_type: OGC/WCS
Type_name: clarity:Tx75p_consecutive_max_EUR-11_ICHEC-EC-EARTH_historical_r12i1p1_SMHI-RCA4_v1_day_19710101-20001231_netcdf3
Path: "https://clarity.meteogrid.com/geoserver/wcs?SERVICE=WCS&VERSION=2.0.1"

maesbri commented 5 years ago

ok for the "id" property for the resource (instead of ressource_id). My proposal for this new id is to build it like this: datapackage.id+"#"+ [auto-generated-sequential-number]
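As a rough sketch of that id scheme in Python: the function name, the sample package and the choice to start the sequence at 1 are assumptions made only for illustration.

```python
def assign_resource_ids(datapackage):
    """Give every resource an id of the form <datapackage id>#<sequential number>."""
    # Assumption: numbering starts at 1 and follows the order of the resources list.
    for seq, resource in enumerate(datapackage["resources"], start=1):
        resource["id"] = f'{datapackage["id"]}#{seq}'
    return datapackage


# Hypothetical package with two resources.
package = assign_resource_ids(
    {"id": "dc1-naples", "resources": [{"name": "agricultural-areas"}, {"name": "hot-days"}]}
)
```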

For points 2 and 3 I have to think a little more about how to describe that. However, nothing prevents us from adding additional formats to our specification (i.e., shape-zip and application/x-zip).

maesbri commented 5 years ago

I have spent some time thinking about the various possibilities concerning points 2 and 3, and I came to the following conclusion: it should be up to the owner/producer of the data package to decide in which formats he/she wants to provide the data in the package (from the ones allowed in our specification, and we can add more to our specification if we need to). Of course, it is then up to the consumer of the data package to decide whether or not the package contents suit his/her needs.

That being said, using WFS and WCS services gives more flexibility to provide the same dataset in various formats in a single resource object, and we should take advantage of that.

My intention is to keep it as simple as possible, since the CLARITY data package has already introduced many properties, which means more complexity for processing the package. In that sense, my proposal is to keep using the "path" property for encoding the whole wms/wfs/wcs request for obtaining the online resource.

In addition, we have the optional property "service_type" (a replacement for the "mapping_service_type" I added in the last update of the specification) to indicate what kind of service it is, i.e., "ogc:wms", "ogc:wfs", "ogc:wcs", "ogc:wms-t", "osm", etc.

With just those two parameters any program could "intelligently" parse the URL and its parameters and request the resource in any other format that is offered by the wms/wfs/wcs service and that is more suitable for its purposes. Note that the type_name property you were proposing is already in the path URL request and therefore does not need to be specified again. The only parameter that I find useful to repeat (in the URL and as a property) is the "service_type", as it might not always be obvious how to deduce it from the URL (and it might not necessarily be an OGC service).
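A minimal Python sketch of how a consumer might read these two properties follows. The resource below is hypothetical: the base URL comes from the discussion above, but the request and typeName query parameters and both helper functions are assumptions added for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical resource; request= and typeName= in the path are assumptions.
resource = {
    "name": "agricultural-areas",
    "service_type": "ogc:wfs",
    "path": "http://services.clarity-h2020.eu:8080/geoserver/clarity/ows"
            "?service=WFS&version=1.0.0&request=GetFeature"
            "&typeName=clarity:agricultural_areas",
}


def query_params(url):
    """Return the query parameters with lower-cased keys (OGC keys are case-insensitive)."""
    return {k.lower(): v[0] for k, v in parse_qs(urlsplit(url).query).items()}


def describe(resource):
    """Return (service type, layer identifier) using only "service_type" and "path"."""
    params = query_params(resource["path"])
    # Prefer the explicit property; fall back to the SERVICE parameter if present.
    service = resource.get("service_type")
    if service is None and "service" in params:
        service = "ogc:" + params["service"].lower()
    # The layer identifier is carried under a different key per service type.
    layer = (params.get("typename") or params.get("typenames")
             or params.get("coverageid") or params.get("layers"))
    return service, layer
```

The fallback to the SERVICE parameter only helps for OGC-style URLs, which is one reason for keeping "service_type" as an explicit property.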

p-a-s-c-a-l commented 5 years ago

With just those two parameters any program could "intelligently" parse the URL and its parameters and request the resource in any other format that is offered by the wms/wfs/wcs service and that is more suitable for its purposes.

This works only if you can provide one URL template that is suitable for all formats. Consider the following really simple example: https://clarity.meteogrid.com/geoserver/wcs?SERVICE=WCS&VERSION=2.0.1

'wcs' is not only a request parameter but also part of the URI itself (/geoserver/wcs). The parameter VERSION=2.0.1 is probably not valid for WFS / WMS, a WMS request might need an additional layers parameter, etc. So what could such a common template for the most common OGC services we are using (WMS, WCS, WFS) look like?

maesbri commented 5 years ago

I don't think this is a problem since the approach I proposed still works with that url.

In Java (and I guess in many other programming languages) you can "dissect" a URL (in Java with the URL object) and extract its parts (protocol, host, port, path and query string).

With that information, the client could build a new request. In this case it just has to find the format key-value parameter and replace "(geo)tif" with any other format supported by the service (for the whole list, just issue a GetCapabilities request).
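The same idea can be sketched in Python instead of Java, using the standard urllib.parse module. The function name with_format and the REQUEST, COVERAGEID and FORMAT parameters in the example URL are assumptions added for illustration; only the host, path, SERVICE and VERSION come from the discussion above.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical full request; REQUEST, COVERAGEID and FORMAT are assumed values.
example = ("https://clarity.meteogrid.com/geoserver/wcs"
           "?SERVICE=WCS&VERSION=2.0.1&REQUEST=GetCoverage"
           "&COVERAGEID=hypothetical__coverage&FORMAT=image/tiff")


def with_format(path, new_format, key="format"):
    """Return the URL with its format parameter replaced (or appended if missing)."""
    parts = urlsplit(path)  # scheme, host and path stay untouched
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    out, replaced = [], False
    for k, v in pairs:
        # Compare keys case-insensitively, since OGC parameter names are.
        if k.lower() == key.lower():
            out.append((k, new_format))
            replaced = True
        else:
            out.append((k, v))
    if not replaced:
        out.append((key, new_format))
    # urlencode percent-encodes values, e.g. "/" becomes "%2F".
    return urlunsplit(parts._replace(query=urlencode(out)))


zipped = with_format(example, "application/x-zip")
```

For a WFS request the key would be passed as "outputFormat" instead, which shows that a client still needs the "service_type" to know which parameter to rewrite.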

The VERSION parameter is not mandatory in an OGC request; if it is not provided, the service assumes the latest one. If it is present in the path URL, then the client should decide whether to use it as is or to ask for a different one that it can understand.

The additional layer parameters should either already be in the path URL or they can be modified by the client if needed (e.g., width and height in a WMS query: the data package producer can provide default values, but these should be set in the query by the mapping client depending on its own screen resolution).

maesbri commented 5 years ago

Different Resource Representations: I have added the "mapview" property, so we can now provide an alternative "view" of the georesource via a wms or similar mapping service that can be added as a layer (not a wfs or wcs) in a map client. Check the resource specification for more details.

maesbri commented 5 years ago

I have updated the json descriptor for the Naples data package according to the latest changes I made today in the specification.

therter commented 5 years ago

ok, then we can get the wms information from the mapview property and we can use the same format within the internal data model of the Map-Component.

p-a-s-c-a-l commented 5 years ago

This can be considered done.