Misalignment between GeoDCAT-AP dct:Standard properties and XSLT transformation rules

NielsHoffmann commented 1 month ago

I'm looking into the XSLT-to-GeoDCAT-AP transformation and validation for the 3.0.0 pilot, participating on behalf of Geonovum (NL).

I'm testing with metadata records from the Dutch national georegister that are INSPIRE/HVD items.

In the SHACL profile, the values of properties such as referenceSystem and serviceProtocol (and probably some others) are constrained either to both BlankNodeOrIRI and skos:Concept, or to skos:Concept alone. The XSLT, however, does not always type those resources as skos:Concept.
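A minimal sketch of the mismatch, assuming a property shape of the kind described above. The `ex:` namespace, the shape name and the `ex:referenceSystem` path are placeholders, not the identifiers used in the GeoDCAT-AP SHACL files; the EPSG IRI is just an example value. A SHACL processor checks `sh:class` only against `rdf:type` triples present in the data graph and does not dereference the IRI, so the record below is reported as a violation.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/> .

# Shapes graph (hypothetical): the value must be an IRI or blank node
# AND must be typed skos:Concept within the data graph itself.
ex:DatasetShape
    a sh:NodeShape ;
    sh:targetClass dcat:Dataset ;
    sh:property [
        sh:path ex:referenceSystem ;      # placeholder path
        sh:nodeKind sh:BlankNodeOrIRI ;
        sh:class skos:Concept
    ] .

# Data graph as a transformation might emit it: the IRI is present,
# but no "a skos:Concept" triple is emitted for it, so sh:class fails.
ex:dataset1
    a dcat:Dataset ;
    ex:referenceSystem <http://www.opengis.net/def/crs/EPSG/0/28992> .
```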

This results in SHACL validation errors. I'm happy to elaborate further or provide pull requests if needed.

Kind regards, Niels Hoffmann (Geonovum)

jakubklimek commented 1 month ago

Related to #111

bertvannuffelen commented 1 month ago

@NielsHoffmann, in addition to the issue @jakubklimek refers to, this is in general a hard issue to resolve.

It is related to what is expected from an actual data exchange. It stems from the fact that our data specifications do not define the syntax of the data exchange between two systems, but leave it open to interpretation. For example, the following:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
_:ds a dcat:Dataset;
   dcat:theme <http://publications.europa.eu/resource/authority/data-theme/AGRI>.
```

seems very well defined. The URI (nal:AGRI) comes from a trusted source, it dereferences, and the machine-readable information it returns is very elaborate (including the fact that it is a skos:Concept).

However, the data specification leaves this notion of 'trust' open. Why would it be wrong to share the following?

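```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
_:ds a dcat:Dataset;
   dcat:theme <http://publications.europa.eu/resource/authority/data-theme/AGRI>.

<http://publications.europa.eu/resource/authority/data-theme/AGRI> a skos:Concept.
```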

Sharing the last triple is superfluous: it adds nothing and only enlarges the payload. It is also unclear where to draw the line when adding information that can be retrieved by reference. Should we add the label? In all EU languages? The reference to the concept scheme it belongs to? The full definitions, or mappings to other code lists? These boundaries are very difficult to draw generically from a data specification perspective.
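To make the question concrete, a sketch of how such a description could keep growing. The `skos:inScheme` target `<http://publications.europa.eu/resource/authority/data-theme>` is an assumption here, and in a real exchange the labels and other values would be taken from the code list itself rather than from this example.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<http://publications.europa.eu/resource/authority/data-theme/AGRI>
    # The only triple shared in the example above.
    a skos:Concept ;
    # Possible additions, all retrievable by dereferencing the IRI.
    skos:prefLabel "Agriculture, fisheries, forestry and food"@en ;
    skos:inScheme <http://publications.europa.eu/resource/authority/data-theme> .
    # Further languages, skos:definition, mappings to other code lists?
```

Each added triple duplicates information the receiver could fetch itself, which is why a generic specification struggles to fix a cut-off point.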

In a concrete data exchange, the exchange context may stipulate that all contextual information about the codes used must be shared, because the receiving system makes no assumptions about prior knowledge. Equally, the agreement might be that sharing just the code identifier (here the URI) is more than sufficient, because the receiving system has already loaded the full code list as background knowledge.

Note also that a data specification often mixes codes with a high level of agreement, as shown above, with codes where the agreement is very low. For instance, when the value is not a proper dereferenceable URI, the data specification intends the following:

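```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
_:ds a dcat:Dataset;
   dct:subject _:blankNode1.

_:blankNode1 a skos:Concept;
   skos:prefLabel "Agriculture, fisheries, forestry and food"@en.
```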

I intentionally also changed the property in this example, because data quality expectations may vary from property to property even when semantically (structurally) the same range information is expected. Making different choices per property in a generic data specification goes too far; that is left to implementations.

I agree it feels uncomfortable that we cannot settle on a single data exchange expectation. But it also has benefits. If validation reports an error for the first example, one has to ask whether information is actually missing. In this case it is not, because dereferencing resolves it. With that insight, one can decide how to adapt the data exchange validation process: either by augmenting the validator with background knowledge of the code list, or by removing the test and trusting that the supplied values are skos:Concepts. Either approach is fine. In this way the validation process is molded to the expectations of the actual data exchange.
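A sketch of the two options, assuming a validator that can be configured with extra triples loaded alongside the data graph. The file name in the comment and the `ex:` identifiers are hypothetical; in practice a given exchange would pick one option, and option 1 would load the whole code list rather than a single triple.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/> .

# Option 1: background knowledge (e.g. a hypothetical codelists.ttl)
# merged into the data graph before validation, so an
# sh:class skos:Concept check finds the type triple even though
# the sender did not supply it.
<http://publications.europa.eu/resource/authority/data-theme/AGRI> a skos:Concept .

# Option 2: a relaxed shape that drops the sh:class test and only
# checks the node kind, trusting that the supplied values are concepts.
ex:RelaxedDatasetShape
    a sh:NodeShape ;
    sh:targetClass dcat:Dataset ;
    sh:property [
        sh:path dcat:theme ;
        sh:nodeKind sh:BlankNodeOrIRI
    ] .
```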

NielsHoffmann commented 1 month ago

@bertvannuffelen Thank you for your elaborate response. One way forward could be to split the SHACL file into multiple files, as has already been done for DCAT-AP 3.0 (and DCAT-AP-NL30).
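One possible form of such a split, sketched under assumptions: the file names and `ex:` identifiers are hypothetical and do not reproduce the actual DCAT-AP 3.0 file layout. Since a SHACL processor applies every shape it is given, moving the range check into its own file lets each data exchange decide whether to load it.

```turtle
# --- shapes-core.ttl (hypothetical): always loaded ---
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/> .

ex:DatasetShape
    a sh:NodeShape ;
    sh:targetClass dcat:Dataset ;
    sh:property [
        sh:path dcat:theme ;
        sh:nodeKind sh:BlankNodeOrIRI
    ] .

# --- shapes-range.ttl (hypothetical): loaded only when the exchange
# expects concept typing to be present in the data itself ---
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/> .

ex:DatasetRangeShape
    a sh:NodeShape ;
    sh:targetClass dcat:Dataset ;
    sh:property [
        sh:path dcat:theme ;
        sh:class skos:Concept
    ] .
```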

SEMICeu / GeoDCAT-AP, issue #141