SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

SHACL: Constraints on locn:geometry, dcat:bbox, and dcat:centroid #175

Open andrea-perego opened 3 years ago

andrea-perego commented 3 years ago

The current SHACL constraints allow max 1 instance of these properties.

However, GeoDCAT-AP allows multiple instances of them, provided they use different geometry encodings (i.e., datatypes). Based on this, the GeoDCAT-AP XSLT returns 3 instances of these properties, each using a different datatype: GML, WKT, GeoJSON. As a result, a validation error is raised.

Unless there is a strong reason to keep this cardinality constraint, it would be desirable to revise it accordingly.

Please also note that the current solution is not optimal, in terms of interoperability. Different geometry encodings are used, depending on platform support, purpose, and use (e.g., not all geometry encodings are equally expressive - in the sense that some of them can represent only a subset of the existing geometry types). To deal with this, GeoDCAT-AP allows, since version 1, multiple geometry encodings, requiring nonetheless that at least GML or WKT is used, in order to ensure interoperability (see §B.6.10 of the GeoDCAT-AP 2 specification). This approach is also documented in the W3C/OGC Spatial Data on the Web Best Practices (see Best Practice 5).

So, if only 1 instance of these properties should be allowed, a constraint should also be placed on the datatype (i.e., geometry encoding) to be used; otherwise interoperability will not be ensured.
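The tension between the two rules can be sketched in a few lines of Python. This is only an illustration, not the actual SHACL shapes: the function names are hypothetical, and the datatype IRIs are assumptions (the GeoSPARQL literal datatypes), since the issue only names the encodings.

```python
# Datatype IRIs are assumptions for illustration; the issue only names the encodings.
GSP = "http://www.opengis.net/ont/geosparql#"
WKT = GSP + "wktLiteral"
GML = GSP + "gmlLiteral"
GEOJSON = GSP + "geoJSONLiteral"


def passes_max_one(geometries):
    """Current DCAT-AP constraint: at most one value for the property."""
    return len(geometries) <= 1


def has_interoperable_encoding(geometries):
    """GeoDCAT-AP requirement: at least one value is encoded as GML or WKT."""
    return any(datatype in (GML, WKT) for _, datatype in geometries)


# Three encodings of the same point, in the style of the GeoDCAT-AP XSLT output.
xslt_output = [
    ("POINT(4.35 50.85)", WKT),
    ("<gml:Point><gml:pos>4.35 50.85</gml:pos></gml:Point>", GML),
    ('{"type": "Point", "coordinates": [4.35, 50.85]}', GEOJSON),
]
# A single value in an encoding that not every platform supports.
geojson_only = [xslt_output[2]]

print(passes_max_one(xslt_output), has_interoperable_encoding(xslt_output))
print(passes_max_one(geojson_only), has_interoperable_encoding(geojson_only))
```

The first record fails the max-1 check although it satisfies the GeoDCAT-AP encoding requirement; the second passes the max-1 check although it offers neither GML nor WKT.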

bertvannuffelen commented 3 years ago

The multiple cardinality is actually guaranteed by the multiplicity of geographical coverage (dct:spatial) (see section 4.4.2).

_:mydataset dct:spatial _:location1, _:location2.

The representation of the different locations can be expressed in multiple ways: as a precise polygon (GML, WKT), or as an approximation (centroid, bbox). It is fair to state that, for DCAT-AP, information about a location is restricted to one description per expression: so not multiple bbox descriptions for a single location. That is what the cardinality constraints on e.g. dcat:bbox express.

So DCAT-AP allows multiple locations to be associated, and the locations to be expressed in different representations.

andrea-perego commented 3 years ago

@bertvannuffelen said:

So DCAT-AP allows multiple locations to be associated, and the locations to be expressed in different representations.

This is, however, semantically different.

Multiple instances of dct:spatial are meant to point to different locations - the rationale being that the spatial coverage may include multiple, non-contiguous, geographical areas (dct:Location's).

On the other hand, multiple instances of locn:geometry (dcat:bbox, dcat:centroid), each using a different datatype (WKT, GML, GeoJSON), denote different encodings of the same geometry (bounding box, centroid) for the same geographical area (dct:Location).

The situation is similar to that of properties like dct:title, which are supposed to occur only once, but for which multiple instances are allowed if each uses a different language.

bertvannuffelen commented 3 years ago

I understand. It is a limitation that only one (not specified) serialization of the geographical description is allowed.

But is it not the case that every serialization (GML, WKT, GeoJSON) can be converted into each other using a deterministic algorithm? In that case the constraint is not that problematic, as a consumer of the data can always convert it to its desired serialization. Am I wrong?

andrea-perego commented 3 years ago

Yes, a conversion could be done (with some caveats). However, to my knowledge, not all platforms are doing this.
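As a rough illustration of such a conversion and of one possible caveat, here is a minimal Python sketch that turns a 2D WKT point into a GeoJSON geometry. It assumes the GeoSPARQL convention that a WKT literal may start with a CRS IRI in angle brackets and defaults to CRS84 (longitude, latitude) when none is given. The function name is hypothetical, and reprojection is deliberately not attempted.

```python
import re

CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"

# Optional "<crs-iri>" prefix followed by POINT(x y); other geometry types are out of scope.
_WKT_POINT = re.compile(
    r"^\s*(?:<(?P<crs>[^>]+)>\s*)?POINT\s*\(\s*"
    r"(?P<x>[-+]?\d+(?:\.\d+)?)\s+(?P<y>[-+]?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def wkt_point_to_geojson(wkt):
    """Convert a 2D WKT point to a GeoJSON-style dict, for CRS84 input only."""
    match = _WKT_POINT.match(wkt)
    if match is None:
        raise ValueError("not a 2D WKT POINT")
    crs = match.group("crs") or CRS84
    if crs != CRS84:
        # Other CRSs would need reprojection and possibly an axis swap.
        raise ValueError("reprojection needed for CRS " + crs)
    return {
        "type": "Point",
        "coordinates": [float(match.group("x")), float(match.group("y"))],
    }


print(wkt_point_to_geojson("POINT(4.35 50.85)"))
```

A literal that declares another CRS, such as EPSG:4326 with latitude/longitude axis order, is rejected here rather than silently copied, which is the kind of caveat a consumer-side conversion has to handle.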

There was a long discussion on this topic in the GeoDCAT-AP WG, at the time of the first version of the specification, about allowing just 1 geometry encoding. The final conclusion was that this was not going to work, as different platforms were supporting different geometry encodings. Hence, the decision of adopting the current approach.

This might be revisited in future releases of GeoDCAT-AP (although no issues have been raised so far). But at the moment the problem is that the existing GeoDCAT-AP records (such as those on the European Data Portal) have multiple instances of these properties, and therefore do not pass the DCAT-AP validation check.

bertvannuffelen commented 2 years ago

@andrea-perego unfortunately this issue has not been addressed during any of the WG meetings. My apologies, this slipped through the net. I will propose to label it as future work.

To summarize the request, if I am reading it correctly: you ask to allow more than one serialization for geometries. Am I correct that the situation is comparable to language-aware strings, where the restriction is no more than 1 value per language? Is the same interpretation requested in this case, namely (a) no more than 1 value per geo-serialization is allowed? Or (b) should we lift the restriction even further and allow any number of values (e.g. to allow a WKT string with CRS A and a WKT string with CRS B)?
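The difference between the two options can be made concrete with a small Python sketch. The function names and datatype IRIs are assumptions for illustration only and do not come from the DCAT-AP shapes.

```python
from collections import Counter

# Assumed datatype IRIs (GeoSPARQL literal datatypes), used only for illustration.
WKT = "http://www.opengis.net/ont/geosparql#wktLiteral"
GML = "http://www.opengis.net/ont/geosparql#gmlLiteral"


def valid_option_a(geometries):
    """Option (a): at most one value per datatype, like one string per language."""
    counts = Counter(datatype for _, datatype in geometries)
    return all(n <= 1 for n in counts.values())


def valid_option_b(geometries):
    """Option (b): no upper bound on the number of values."""
    return True


# The same point as two WKT literals that differ only in CRS (and axis order).
same_point_two_crs = [
    ("<http://www.opengis.net/def/crs/OGC/1.3/CRS84> POINT(4.35 50.85)", WKT),
    ("<http://www.opengis.net/def/crs/EPSG/0/4326> POINT(50.85 4.35)", WKT),
]

print(valid_option_a(same_point_two_crs), valid_option_b(same_point_two_crs))
```

Under option (a) the two WKT literals collide because they share a datatype, so the CRS A / CRS B case is only admissible under option (b).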

To summarize the current ways

init-dcat-ap-de commented 1 year ago

I was not able to find out what the solution in 3.0 will be.