Class validation (explicit)

costezki commented 7 years ago

You declare a violation in property shape dataset-conformsTo if the object (sh:class) is not a http://purl.org/dc/terms/Standard.


dcat:Dataset
  rdf:type sh:Shape ;
  sh:property [
      sh:predicate dcterms:conformsTo ;
      sh:class dcterms:Standard ;
      sh:name "conforms to" ;
      sh:severity sh:Violation ;
    ] ;```

My[Makx] question is how you can validate this in a case where the object does not explicitly declare itself to be a dct:Standard? 

In addition, in OWL terms, doesn’t something become a dct:Standard as a result of the fact that it is made the object of dct:conformsTo? I guess the question applies to all class validations.

costezki commented 7 years ago

Short answer: the validator needs to have access to object node declaration/definition.

Long answer: only explicit statements shall be validated and no OWL inference is/shall be taken into consideration because the former operates in a closed world whereas the latter is an open world assumption. In the above example, maybe the object of dct:conformsTo is a dct:Standard (which is the correct case but not explicitly stated) but it may be something else (which is the wrong case and also not stated).

The same holds for validation of whether the individual belongs to a controlled list or nor. The controlled list has to be loaded as part of the data in order to perform successful validation, otherwise we have no means to check that it is the case and thus report a violation.

makxdekkers commented 7 years ago

I see the point. However, it seems to me that the URI that identifies the object may not provide the information about what it is. In this case, I am pretty sure that descriptions of guidelines (even if they exist in RDF) do not declare themselves to be a dct:Standard. Could a minimum check be that the value is a URI rather than a string? I am also thinking of cases where the object is identified by a URI, but the URI is temporarily unavailable -- this would then lead to a violation for all metadata that refers to that URI which seems unfair.

costezki commented 7 years ago

Yes, We could set the constraint that the Objects are URIs, this is a straight forward check.

So and is the specific case of dct:Standard, shall replace the class constraint to arbitrary URIs? What other property/class pairs fall into this scenario (when we expect not to have access to their definitions)?

makxdekkers commented 7 years ago

I think there are several properties where the object may not explicitly declare to be an instance of the class that is expected in the DCAT-AP specification.

I can imagine that not every thing that is the object of a Dataset/dct:accessRights statement declares itself to be a dct:RightsStatement, or even less likely that the Web page that is linked by a Dataset/dcat:landingPage statement will say somewhere that it is a foaf:Document.

Other cases may be more complex; for example, an organisation may be declared as an instance of rov:RegisteredOrganization which is a subclass of org:FormalOrganization, a subclass of org:Organization which is a subclass of foaf:Agent. So sometimes you might have to follow your nose along a chain of relationships until you find whether the object satisfies the class requirement.

There is also the case of http://publications.europa.eu/resource/authority/continent/AFRICA, the URI of a continent to be used as the object of Dataset/dct:spatial, which is expected to be an instance of dct:Location. http://publications.europa.eu/resource/authority/continent/AFRICA is declared a skos:Concept but also a http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing and a http://www.opengis.net/ont/geosparql#SpatialObject, neither of which (as far as I can find) declares a relationship with dct:Location. This might be a situation that occurs with other NALs too.

costezki commented 7 years ago

Makx, your comment leads me to think about two things:

drawing between what may be considered internal and what is inherently external to the DCAT-AP
for proper harmonization which aspects are easy to change and which ones are "more correct" to change.

Firstly, by distinguishing internal vs external I mean the classes whose instances we target in our validation (in our case dcat:Catalog, dcat:CatalogRecord, dcat:Dataset etc.) and classes that whose instances are adjacent thus do not constitute the primary target of the validation (for example skos:Concept, ConceptScheme, dct:Location, dct:Frequency, dct:RightsStamenet etc.)

Drawing this distinction may be useful in defining the shape definitions only for the classes considered internal for an AP (only those that we want and can ensure a valid instantiation). In our case this includes only dcat:Dataset, dcat:Catalog, dcat:CatalogRecord, dcat:Distribution and leaves aside (es external) the rest of them: dct:Frequency, dct:LicenseDocument, dct:LinguisticSystem, dct:Location, dct:MediaTypeOrExtent, dct:PeriodOfTime, dct:ProvenanceStatement, dct:RightsStatement, dct:Standard, spdx:Checksum, skos:Concept, skos:ConceptScheme, vcard:Kind, adms:Identifier, foaf:Agent, foaf:Document (exhaustive list). Assuming that we do not always have the power to change the way external resources are instantiated such as controlled lists (even if we can actually influence it).

Doing that will also eliminate the need to have the complete definition of external resources during the validation process and thus a mere URI reference will suffice; this way putting the responsibility of valid instantiation on the publishers of those resources (e.g. MDR for EuroVoc and NALs).

Secondly, about the harmonization of external and internal resources: We can ask the responsible bodies to adapt the controlled vocabulary definitions for example mdr:Country/mdr:Continent to also instantiate dct:Location; OR we can extend the DCAT-AP shape definition to be more permissive and accept dct:Location or geo:SpatialThing or geosparql:SpatialObject.

To conclude: If we can restrict the scope of shape definition to core/internal classes only which will lead to a massive cuts shrinking it to only dcat:Dataset, dcat:Catalog, dcat:CatalogRecord, dcat:Distribution and we can eventually move the other ones into an "optional" module.

Additionally we can do both relax the shape constraints where possible to allow instances of more than one class (e.g. dct:Location, goe:SpatialThing and goe:SpatialObject) as we already do for xsd:Date and xsd:DateTime and also ask publishers of controlled vocabularies to provide a "preferred" instantiation of the controlled vocabularies for example mdr:Country as dct:Location.

makxdekkers commented 7 years ago

@costezki I agree that it would make sense to limit the strong validation to the 'internal' classes. If we really wanted to say something about the 'external' ones, could an alternative approach be to check them with three possible outcomes? E.g. a. PASS if the resource identified by the URI declares itself to be an instance of the expected class b. WARN if the resource identified by the URI is either unavailable, does not declare its class or declares a class different from the one expected c. FAIL if there is no URI. It seems to me that all SHACL validators are going to run into this problem; as long as not everything is flawlessly described according to Semantic Web rules, you are bound to encounter links to things that do not play the game correctly.

andrea-perego commented 7 years ago

@makxdekkers , @costezki ,

I agree with the idea of not requiring linked resources (foaf:Agent's, dct:Standard's, skos:Concept's) to have an explicit class declaration when they are denoted by URIs.

The approach I was thinking is to use an OR here - i.e., the shape is valid if (a) the object is denoted by a URI OR (b) it matches the shape defined for it.

This general rule should be customised depending on whether DCAT-AP requires or not the target resource to have a URI. Actually, if I'm not mistaken, DCAT-AP does not require the use of URIs, but rather it recommends them for controlled vocabularies, etc. So, this should always result in a WARNING (sh:Warning), not an ERROR (sh:Violation).

About this point:

If we can restrict the scope of shape definition to core/internal classes only which will lead to a massive cuts shrinking it to only dcat:Dataset, dcat:Catalog, dcat:CatalogRecord, dcat:Distribution and we can eventually move the other ones into an "optional" module.

I'm a bit concerned about the idea of moving them into an optional module. Some of these external classes (e.g., vcard:Kind) have to carry information that needs to be "attached" to the dataset record (e.g., contact email), so it is important that they can be validated.

In any case, I don't see a specific problem in keeping in the same shapes graph both the "internal" and the "external" ones: whether they will be used or not depends on how the property shapes are defined.

I would restrict the use of "modules" to very limited and clearly identified use cases.

costezki commented 7 years ago

mandatory class validations moved to an additional module dcat-ap-mandatory-classes.shapes.ttl

SEMICeu / dcat-ap_shacl

Class validation (explicit) #7