Open amivanoff opened 1 month ago
Well, validator accepts the spec and validates something. And DCAT-AP 3.0 has the same repetitions in https://github.com/SEMICeu/DCAT-AP/blob/master/releases/3.0.0/shacl/dcat-ap-SHACL.ttl and no one cares. But there are some disadvantages:
Multiple property shapes for the dcterms:publisher property: one for each constraint, in different parts of a spec file. And each property shape requires a unique IRI and 5 text lines, mainly with repetitions:
shacl:maxCount
shacl:maxCount
shacl:BlankNodeOrIRI
foaf:Agent
shacl:class foaf:Agent subsumes/overrides shacl:nodeKind shacl:BlankNodeOrIRI.
sh:order and other metadata).
The shapes are automatically generated by a publication system used by SEMIC. This is to be checked overall, but I will keep an eye on it to see if it improves. Thanks for reporting.
Splitting a property across multiple property shapes is by design, to keep the shapes modular.
Thanks again, any feedback on the model?
What kind of feedback are you interested in, and what type of issues are you willing to address at this lifecycle stage of the spec? The minor ones (e.g. collisions in the spec like the ones reported previously by @VladimirAlexiev)? Or the bigger ones (missing properties/classes)?
At the first stage of the spec consumption we are dealing with a bunch of minor issues (i.e. the first type of issues).
For example, another issue is "Different prefixes for the same dc/terms namespace in DCAT 3 and in MLDCAT-AP profile, also conflicting with the established common practice".
# in DCAT 3 ontology
@prefix dcterms: <http://purl.org/dc/terms/> .
# in MLDCAT-AP SHACL
@prefix dc: <http://purl.org/dc/terms/> .
The issue looks similar to this issue https://github.com/SEMICeu/MLDCAT-AP/issues/7 by @VladimirAlexiev.
According to the prefix lookup service https://prefix.cc
To summarize:
The "dc" prefix should be changed to "dcterms" in MLDCAT-AP for the following reasons:
The better (but it seems way more disruptive) way could be to switch the overall DCAT stack (DCAT ontology, all AP profiles) from "dc" and "dcterms" to "dct" prefix.
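A minimal Turtle sketch of why the mismatch is harmless for RDF tools yet confusing for people: both prefixes below expand to the same namespace, so the two statements are one and the same triple. The example.org IRIs are hypothetical and only serve the illustration.

```turtle
# Both prefixes are bound to the same DC Terms namespace.
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dc:      <http://purl.org/dc/terms/> .

# Hypothetical resources, used only for illustration.
<http://example.org/catalog> dcterms:publisher <http://example.org/agency> .

# Expands to exactly the same triple as the statement above,
# so a parser keeps a single triple in the graph.
<http://example.org/catalog> dc:publisher <http://example.org/agency> .
```

A reader used to the common convention, where dc usually stands for http://purl.org/dc/elements/1.1/, may still misread the second statement, which is the practical argument for aligning the prefix.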
Should we report this kind of issue here? Or is it better to address only the bigger ones here (missing properties/classes)?
We "un-modularized" the property shapes, made the SHACL shapes more "human-friendly" (at least we hope so :) ) and fixed some of our issues/struggles in our fork repository https://github.com/agentlab/MLDCAT-AP while trying to stay as compatible as possible with the spec. Maybe it could be helpful to someone with similar goals.
@amivanoff your comment and your resolution show one of the main challenges for SHACL artefacts. Your objective is to have a human-manageable, and somewhat human-readable, formulation of the constraints expressed in that file.
The formulation you created has a number of considerations:
But the most challenging aspect is maintenance and completeness. Our generators can generate a variant of your suggestion, but because of 1 we switched. For validation (using the file as-is in a SHACL engine), whether the version is condensed or split has no impact.
DCAT-AP devotes a whole section to validation (https://semiceu.github.io/DCAT-AP/releases/3.0.0/#validation-of-dcat-ap). You will see that a human-managed collection of shapes is added there. Those are manually maintained because they target, as explained there, various validation situations. In principle, a large part of those could be done by referring to the generated ones (a first approach for that use can be found at https://semiceu.github.io/DCAT-AP/releases/3.0.0-hvd/#validation).
This brings us to the main advantage for the DCAT-AP ecosystem: a collection of named individual constraints allows requirements (and in this case the SHACL formulation of each requirement) to be interlinked. The DCAT-AP profile of the Swedish Geocatalogue can refer directly to a requirement in the SHACL. That makes comparisons easier and decisions more transparent. It is a step towards interlinked specifications.
Of course improvements can be made, and some of your and others' comments on the SHACL indicate issues that will be resolved over time. But in this ecosystem of overlapping use of the same data with respect to different requirements, we can benefit from the power of linked data to offer additional services.
As a last note: the SHACL of the SEMIC specifications is a consequence of what is written in the HTML. Not vice versa. It only reflects the constraints that easily can be written in SHACL.
I hope this answer gives you some insight into why this approach was taken.
Well, today different shape representations with different goals in mind are technically allowed and even advised by some people. Sometime in the future the "brave new world" of RDF-Star (or RDF 1.2) will unite us all (with possibilities to add anything to a triple)... If we live long enough ))) In the meantime, because we rely heavily on property shape completeness in our dynamic Web UI generation and dynamic shapes-based SPARQL query generation, we will start with non-modular shapes and then test the "normalized" version later on.
@bertvannuffelen your reason 1 is false: the examples given in the description do NOT describe the individual error. They all describe the field, and are all the same.
Two more defects:
- seeAlso should be a URL
- the shapes better have rdf:type (though the spec says it's optional)
@bertvannuffelen your reason 1 is false: the examples given in the description do NOT describe the individual error. They all describe the field, and are all the same.
@VladimirAlexiev, if you divide a "one-propertyshape-for-one-property" into several "smaller-propertyshapes" with only one constraint in each, then the shacl:message in each of them will be the "individual error message", I think.
I am seeing such an approach to SHACL validation for the first time in years. But it's working 😊
Example:
1. One "one-propertyshape-for-one-property" for the dc:publisher property of the dcat:Catalog class. The error message here is for the whole property (field).
<#CatalogShape/publisher> rdfs:seeAlso "https://semiceu.github.io/MLDCAT-AP/releases/2.0.0#Catalogue.publisher";
shacl:name "publisher"@en;
shacl:description "An entity (organisation) responsible for making the Catalogue available."@en;
shacl:path dc:publisher;
shacl:nodeKind shacl:BlankNodeOrIRI; #not working here
shacl:class foaf:Agent;
shacl:minCount 1;
shacl:maxCount 1;
shacl:message "All publisher's constraints are wrong"@en .
2. Several "smaller-propertyshapes" for the dc:publisher property of the dcat:Catalog class. The error messages here are for the individual constraints of a property (field).
<#CatalogShape/93f73e69bb03d2928fcf758a253ef316becdf9b9> rdfs:seeAlso "https://semiceu.github.io/MLDCAT-AP/releases/2.0.0#Catalogue.publisher";
shacl:name "publisher"@en;
shacl:description "An entity (organisation) responsible for making the Catalogue available."@en;
shacl:path dc:publisher;
shacl:nodeKind shacl:BlankNodeOrIRI; #will work only if you disable the shacl:class rule b3ec0655204c62a2531244aaeab12f1a2c5e5b5d
shacl:message "Only publisher's nodeKind constraint is wrong"@en .
<#CatalogShape/b3ec0655204c62a2531244aaeab12f1a2c5e5b5d> rdfs:seeAlso "https://semiceu.github.io/MLDCAT-AP/releases/2.0.0#Catalogue.publisher";
shacl:name "publisher";
shacl:description "An entity (organisation) responsible for making the Catalogue available."@en;
shacl:path dc:publisher;
shacl:class foaf:Agent;
shacl:message "Only publisher's class constraint is wrong"@en .
<https://semiceu.github.io/MLDCAT-AP/releases/2.0.0#CatalogShape/a0ccdf3bd7f5d161d07f375a26e68c18ca91dc19> rdfs:seeAlso "https://semiceu.github.io/MLDCAT-AP/releases/2.0.0#Catalogue.publisher";
shacl:name "publisher"@en;
shacl:description "An entity (organisation) responsible for making the Catalogue available."@en;
shacl:path dc:publisher;
shacl:minCount 1;
shacl:message "Only publisher's minCount constraint is wrong"@en .
<#CatalogShape/67dcdb36167ca7969c0532898e11a98e9c2a80f5> rdfs:seeAlso "https://semiceu.github.io/MLDCAT-AP/releases/2.0.0#Catalogue.publisher";
shacl:name "publisher"@en;
shacl:description "An entity (organisation) responsible for making the Catalogue available."@en;
shacl:path dc:publisher;
shacl:maxCount 1;
shacl:message "Only publisher's maxCount constraint is wrong"@en .
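To make the difference concrete, here is a small data sketch for the split shapes above. It assumes (not shown in the snippet) that all four property shapes are attached to a node shape targeting dcat:Catalog; the example.org resources are hypothetical, and the comments describe what the SHACL specification implies rather than output observed from a particular engine.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dc:   <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# No publisher at all: only the minCount shape should report,
# with "Only publisher's minCount constraint is wrong".
<http://example.org/catalogA> a dcat:Catalog .

# Two publishers, both typed as foaf:Agent:
# only the maxCount shape should report.
<http://example.org/catalogB> a dcat:Catalog ;
    dc:publisher <http://example.org/org1>, <http://example.org/org2> .
<http://example.org/org1> a foaf:Agent .
<http://example.org/org2> a foaf:Agent .

# A literal publisher: the nodeKind shape and the class shape
# should both report, each with its own message.
<http://example.org/catalogC> a dcat:Catalog ;
    dc:publisher "Some Agency" .
```

With the single combined shape from case 1, all three catalogs would instead receive the same message, "All publisher's constraints are wrong".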
Two more defects:
- seeAlso should be a URL
- the shapes better have rdf:type (though the spec says it's optional)
Yes, seeAlso as a string is definitely a bug. rdf:type shacl:PropertyShape is desirable as "good style".
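A sketch of how one of the split shapes could look with both defects addressed: rdfs:seeAlso points to an IRI instead of a string literal, and the shape is typed explicitly. The prefix declarations are assumed to match those of the MLDCAT-AP SHACL file.

```turtle
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix shacl: <http://www.w3.org/ns/shacl#> .
@prefix dc:    <http://purl.org/dc/terms/> .

<#CatalogShape/67dcdb36167ca7969c0532898e11a98e9c2a80f5>
    # Explicit type: optional per the spec, but good style.
    a shacl:PropertyShape ;
    # An IRI reference instead of a string literal.
    rdfs:seeAlso <https://semiceu.github.io/MLDCAT-AP/releases/2.0.0#Catalogue.publisher> ;
    shacl:name "publisher"@en ;
    shacl:description "An entity (organisation) responsible for making the Catalogue available."@en ;
    shacl:path dc:publisher ;
    shacl:maxCount 1 ;
    shacl:message "Only publisher's maxCount constraint is wrong"@en .
```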
I am seeing such an approach to SHACL validation for the first time in years. But it's working 😊
To me it looks like "technology abusing" but yeah, "it's working"...
And I could not see any other way to provide granular, internationalizable error messages for each constraint of a property shape in every EU language... Besides deep internationalization of the internal mechanics of the Jena SHACL validator (or RDF4J, or any other open source SHACL validator).
But I presume Jena's maintainers would not be happy to make Jena speak another 23 languages besides English.
Do not know what @HolgerKnublauch thinks about this "language attack" on shapes...
It seems the SEMIC guys take localization/internationalization of validator error reports VERY seriously. They want to provide as much of the validator report as possible to the user (i.e. an integration specialist?) in a local language. But they also don't want any compromises on error details. So they want both error messages:
- the detailed validator error message on a particular constraint in English (because the validator will never speak Portuguese), authored by the validator engine developers;
- the localized Portuguese (and other languages) error message on a particular constraint from this "one constraint property shape".
Maybe it is better to call it "a bunch of property constraints", not a "property shape". Because in this case the "property shape" is not specified explicitly in the spec in its complete form. It is not reifiable/addressable (no IRI). The property shape is constructed by the validator at runtime as a conjunction of a class shape and a shacl:path.
In the current SHACL version it is indeed required to define a separate shape whenever you want to specify a different message. With the upcoming 1.2 I hope we can generalize this so that reification can be used to attach message (and severity and possibly more) to each constraint triple. That should help here.
@amivanoff
I am seeing such an approach to SHACL validation for the first time in years. To me it looks like "technology abusing". But it's working 😊
This is not abuse. This is a way to make individual checks more atomic, thus easier to generate.
the detailed validator error message on particular constraint in English (because validator will never speak Portuguese), authored by validator engine developers; and the localized Portuguese (and other langs) error message on particular constraint from this "one constraint property shape"
The spec says that:
- A shape can have multiple messages, but they must be in different languages. This is exactly like skos:prefLabel (sh:uniqueLang). Presumably, these messages say the same thing, i.e. are mere translations.
- If a shape has no message, the engine should generate one at its discretion. And yes, it's reasonable to assume that Jena-generated messages will be in English only for the time being.
@amivanoff Do you see any spec change required for multilingual translations?
required to define a separate shape whenever you want to specify a different message
Yes, but only if they say different things. Multiple shapes are not needed to accommodate multiple translations.
@VladimirAlexiev, yes, one shape per constraint, and this shape contains multiple translations for the message property.
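A minimal sketch of that arrangement: one single-constraint shape carrying several language-tagged values of shacl:message, at most one per language. The Portuguese wording is an illustrative translation written for this example, not text taken from any SEMIC artefact.

```turtle
@prefix shacl: <http://www.w3.org/ns/shacl#> .
@prefix dc:    <http://purl.org/dc/terms/> .

<#CatalogShape/67dcdb36167ca7969c0532898e11a98e9c2a80f5>
    a shacl:PropertyShape ;
    shacl:path dc:publisher ;
    shacl:maxCount 1 ;
    # One message per language tag; the values are meant to be
    # translations of the same statement.
    shacl:message "Only publisher's maxCount constraint is wrong"@en ,
                  "Apenas a restrição maxCount de publisher está errada"@pt .
```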
This issue enables a capability to preserve property shape reification (if authors are willing to).
But one aspect still stands unhandled in this issue
It is related to the "message substitution" semantics. In "message substitution/redefinition" example from the issue above it could be only one of two cases:
In case 2, we lose detailed information from a validator "but found 2" (i.e., how many constraint violations for this object-propertyshape have been found). So with "error message substitution" we could translate general messages only which do not take into account specific data situation.
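The substitution effect can be sketched as a validation result. This is a hand-written illustration of what the SHACL specification describes for shapes that declare a message, not output captured from any engine; the focus node is hypothetical.

```turtle
@prefix shacl: <http://www.w3.org/ns/shacl#> .
@prefix dc:    <http://purl.org/dc/terms/> .

[] a shacl:ValidationReport ;
    shacl:conforms false ;
    shacl:result [
        a shacl:ValidationResult ;
        shacl:resultSeverity shacl:Violation ;
        # Hypothetical catalog with two publishers.
        shacl:focusNode <http://example.org/catalogB> ;
        shacl:resultPath dc:publisher ;
        shacl:sourceConstraintComponent shacl:MaxCountConstraintComponent ;
        shacl:sourceShape <#CatalogShape/67dcdb36167ca7969c0532898e11a98e9c2a80f5> ;
        # Copied from the shape's shacl:message. An engine-generated text
        # with data-specific detail (such as "but found 2") does not appear.
        shacl:resultMessage "Only publisher's maxCount constraint is wrong"@en
    ] .
```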
In the DCAT-AP SHACL profile, colleagues tried to dump sh:message altogether and use a custom, unresolvable https://purl.eu/ns/shacl#message to save both: the message from a validator and the translated message from the shape.
In the released DCAT 3.0 version they did not use any message at all.
All of this raises the question of the usefulness of sh:message in general. With sh:message we could translate "general advice" only, at the cost of a more detailed error message from the validator. We could not have both messages (a detailed message from the validator in English and a "general advice" message translated to another language). We could not have a detailed validator message translated.
I could not grasp whether this issue could help with all of the above.
@amivanoff
I am seeing such an approach to SHACL validation for the first time in years. To me it looks like "technology abusing". But it's working 😊
This is not abuse. This is a way to make individual checks more atomic, thus easier to generate.
exactly, but also
- multiple profile management becomes simpler: one can point to one individual constraint rather than to a collection of constraints.
- cross-referencing can be made more precise: e.g. the seeAlso can be for each constraint pointing to the appropriate location.
All these have to do with use cases of designing a business UI for a validation service where the result guides the user to the most important issues to resolve. Today validators like https://www.itb.ec.europa.eu/shacl/dcat-ap/upload produce a technical table: Error - message - relatedValue. And then the hunt is on. One has to be an RDF expert to find the source (which is in most cases trivial for an RDF expert), but resolving the issue is harder.
To illustrate the above, the following 3 values are licences found in an open data portal (values of dct:license). The first is acceptable, but the second and probably also the third are not.
http://dcat-ap.de/def/licenses/cc-by/4.0
"N06abcab9a78347dca72ba692979c3cdc"
http://dcat-ap.de/def/licenses/CC%20BY%204.0
Being able to cross-reference to https://semiceu.github.io/DCAT-AP/releases/3.0.0-hvd/#c3 in case of HVD compliance is a valuable motivation for publishers to get rid of at least the second, and likely also the third. That is different from "the validator does not like it". With such a cross-reference the RDF expert can more easily motivate the dataset owner (some publisher in some agency) to adapt their source metadata.
In our context, SHACL is also a means to provide a service to non-technical (non-RDF) staff.
Property shapes with the same shacl:path and different generated IRIs repeat twice, or sometimes even 3-4 times, in the Turtle spec. JSON-LD is affected also.
For example, several property shapes repeat just for the CatalogShape class shape;
Just one concrete example for the foaf:homepage property shape (lang tag stripped):
Sometimes some property shape variants miss the cardinality restriction. Sometimes they differ in shacl:nodeKind shacl:BlankNodeOrIRI or shacl:class.
If variability in value restrictions is needed (BlankNodeOrIRI or a concrete class), the correct way is to use sh:or, I think.
If the property shape IRIs weren't random, this would be a minor problem 😊 But as it is, it seems it's an error. And an "adoption blocker" one.
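A hedged sketch of the sh:or suggestion, using the publisher constraints discussed in this thread rather than the generated shapes themselves. The shape IRI is hypothetical, and the shacl: and dc: prefixes follow the MLDCAT-AP SHACL file.

```turtle
@prefix shacl: <http://www.w3.org/ns/shacl#> .
@prefix dc:    <http://purl.org/dc/terms/> .
@prefix foaf:  <http://xmlns.com/foaf/0.1/> .

# Hypothetical single property shape that states the alternatives
# explicitly instead of spreading them over several generated shapes.
<#CatalogShape/publisher>
    a shacl:PropertyShape ;
    shacl:path dc:publisher ;
    shacl:minCount 1 ;
    shacl:maxCount 1 ;
    # Each value must satisfy at least one of the listed shapes.
    shacl:or (
        [ shacl:class foaf:Agent ]
        [ shacl:nodeKind shacl:BlankNodeOrIRI ]
    ) .
```

Since every instance of foaf:Agent is already an IRI or a blank node, this particular disjunction accepts the same values as the nodeKind constraint alone; the sketch only shows the syntax for making such alternatives explicit in one place.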