SEMICeu / iso-19139-to-dcat-ap

Reference XSLT-based implementation of GeoDCAT-AP
European Union Public License 1.2
Check compliance with SHACL definitions #22

Open andrea-perego opened 3 years ago

andrea-perego commented 3 years ago

An initial test of the XSLT output against the SHACL definitions returns validation errors for instances whose class is not explicitly specified (such as dct:LinguisticSystem, dct:Frequency, and other code list values).

andrea-perego commented 3 years ago

An initial test of the XSLT output against the SHACL definitions returns validation errors for instances whose class is not explicitly specified (such as dct:LinguisticSystem, dct:Frequency, and other code list values).

The XSLT has been revised accordingly via PR https://github.com/SEMICeu/iso-19139-to-dcat-ap/pull/23

There are still some validation issues. The following list is limited to those raising a sh:Violation:

  1. Missing foaf:primaryTopic: The GeoDCAT-AP XSLT uses foaf:primaryTopic only when the resource has a URI; otherwise it uses its inverse, foaf:isPrimaryTopicOf. Whether this is acceptable remains to be verified. I raised this issue in DCAT-AP: https://github.com/SEMICeu/DCAT-AP/issues/174
  2. Too many instances of dcat:bbox and locn:geometry: The DCAT-AP SHACL constraints allow only 1 instance of each of these properties, whereas GeoDCAT-AP allows multiple instances provided they have different datatypes. Accordingly, the GeoDCAT-AP XSLT returns 3 instances: GML, WKT, and GeoJSON. I raised this issue in DCAT-AP: https://github.com/SEMICeu/DCAT-AP/issues/175
  3. The range of dct:format is not dct:MediaTypeOrExtent: This is a bug. Following the revision done in DCAT-AP 2.0.1, GeoDCAT-AP 2.0.0 makes use of dct:MediaType. An issue was already raised in DCAT-AP - see https://github.com/SEMICeu/DCAT-AP/issues/173
  4. Value of dct:format must be a URI reference: The GeoDCAT-AP XSLT uses URI references when they are specified in the source records; otherwise, it uses a blank node, with the format specified via a textual label (rdfs:label). Whether this SHACL constraint is correct remains to be verified: in DCAT-AP, using a URI reference for values from controlled vocabularies is recommended (sh:Warning), not mandatory (sh:Violation). I raised an issue in DCAT-AP: https://github.com/SEMICeu/DCAT-AP/issues/176 . On the other hand, it is worth considering whether the GeoDCAT-AP XSLT should be revised to map textual labels in the original records to URIs whenever possible (see the proposal in https://github.com/SEMICeu/iso-19139-to-dcat-ap/issues/24).
  5. Code list values typed as skos:Concept must have a skos:prefLabel: This happens when these values are URI references. It can be fixed by importing the additional controlled vocabularies used by GeoDCAT-AP into the GeoDCAT-AP SHACL shapes graph (or maybe into the DCAT-AP one). With that fix, there is no longer any need to type them as skos:Concept, and the corresponding revision done via PR https://github.com/SEMICeu/iso-19139-to-dcat-ap/pull/23 can be rolled back.
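
To make the cardinality mismatch in item 2 concrete, here is a minimal Python sketch contrasting the two rules. It is not taken from the XSLT or from either SHACL shapes graph: the function names are hypothetical, values are modelled as plain (lexical form, datatype) pairs, and the datatype labels "gml", "wkt", and "geojson" are illustrative placeholders rather than the actual datatype IRIs.

```python
from collections import Counter


def max_one_overall(values):
    """DCAT-AP-style rule (assumed): at most one value for the property."""
    return len(values) <= 1


def max_one_per_datatype(values):
    """GeoDCAT-AP-style rule (assumed): several values are fine
    as long as no two of them share the same datatype."""
    counts = Counter(datatype for _lexical, datatype in values)
    return all(n <= 1 for n in counts.values())


# Hypothetical dcat:bbox values for one resource, one per encoding.
# The lexical forms are placeholders, not real geometries.
bbox_values = [
    ("gml-encoded geometry", "gml"),
    ("wkt-encoded geometry", "wkt"),
    ("geojson-encoded geometry", "geojson"),
]

print(max_one_overall(bbox_values))       # the stricter rule rejects three values
print(max_one_per_datatype(bbox_values))  # the per-datatype rule accepts them
```

Under these assumptions, the same three values fail the first check and pass the second, which is the situation reported as a sh:Violation.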

andrea-perego commented 3 years ago

  5. [...] there is no longer any need to type them as skos:Concept, and the corresponding revision done via PR #23 can be rolled back.

Fixed via PR https://github.com/SEMICeu/iso-19139-to-dcat-ap/pull/26

andrea-perego commented 3 years ago

  4. Value of dct:format must be a URI reference: The GeoDCAT-AP XSLT uses URI references when they are specified in the source records; otherwise, it uses a blank node, with the format specified via a textual label (rdfs:label). Whether this SHACL constraint is correct remains to be verified: in DCAT-AP, using a URI reference for values from controlled vocabularies is recommended (sh:Warning), not mandatory (sh:Violation). I raised an issue in DCAT-AP: SEMICeu/DCAT-AP#176 . On the other hand, it is worth considering whether the GeoDCAT-AP XSLT should be revised to map textual labels in the original records to URIs whenever possible (see the proposal in #24).

This has been addressed (at least partially) in PR https://github.com/SEMICeu/iso-19139-to-dcat-ap/pull/26 by adding mappings from textual descriptions of distribution formats to URIs. For more details, see #24.
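
As a rough illustration of the idea (not the actual XSLT logic in PR #26), the mapping can be sketched in Python as a lookup with a fallback. The table entries, the function name map_format, and the choice of the EU Publications Office file-type vocabulary as the target are assumptions made for this example; the real mappings are those discussed in #24.

```python
# Assumed target vocabulary: EU Publications Office file-type authority table.
FILE_TYPE_NS = "http://publications.europa.eu/resource/authority/file-type/"

# Hypothetical label-to-code table, for illustration only.
LABEL_TO_CODE = {
    "csv": "CSV",
    "gml": "GML",
    "shapefile": "SHP",
}


def map_format(label):
    """Return a URI for a known format label; otherwise fall back to a
    blank-node-style description that only carries the textual label."""
    # Normalise case and whitespace before the lookup.
    key = " ".join(label.lower().split())
    code = LABEL_TO_CODE.get(key)
    if code is not None:
        return {"uri": FILE_TYPE_NS + code}
    # Unknown label: keep it as text, as the XSLT does via rdfs:label.
    return {"bnode_label": label}


print(map_format("  Shapefile "))
print(map_format("Custom grid format"))
```

Labels that are not in the table still end up as textual labels, which is consistent with the issue being described as only partially addressed.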