SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

SHACL rules differ in shacl vs html directory (e.g byteSize) #399

Open barthanssens opened 4 weeks ago

barthanssens commented 4 weeks ago

There seems to be a difference between the separate "HTML" SHACL files and the (aggregated?) version published in the SHACL directory.

E.g. https://github.com/SEMICeu/DCAT-AP/blob/master/releases/3.0.0/html/shacl/shapes.ttl still lists dcat:byteSize as type xsd:decimal, while https://github.com/SEMICeu/DCAT-AP/blob/master/releases/3.0.0/shacl/dcat-ap-SHACL.ttl uses the correct type xsd:nonNegativeInteger
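To illustrate why this difference changes validation results, here is a minimal sketch in plain Python, not the actual validator. It models the SHACL rule that sh:datatype matches only when the literal's datatype IRI is identical to the expected one and the lexical form is well-formed; there is no subtype inference. The `Literal` class, the simplified lexical patterns and the sample values are assumptions made for this example.

```python
import re
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal: lexical form plus datatype IRI."""
    lexical: str
    datatype: str


# Simplified lexical patterns for the two datatypes discussed in this issue.
LEXICAL = {
    XSD + "decimal": re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)"),
    XSD + "nonNegativeInteger": re.compile(r"\+?\d+|-0+"),
}


def conforms_to_datatype(value: Literal, expected: str) -> bool:
    """Model of sh:datatype: identical datatype IRI and a well-formed lexical form."""
    if value.datatype != expected:
        return False
    pattern = LEXICAL.get(expected)
    return pattern is not None and pattern.fullmatch(value.lexical) is not None


# The same byteSize value, checked against the constraint found in each file.
byte_size = Literal("1024", XSD + "nonNegativeInteger")
passes_html_shapes = conforms_to_datatype(byte_size, XSD + "decimal")
passes_aggregated = conforms_to_datatype(byte_size, XSD + "nonNegativeInteger")
print(passes_html_shapes, passes_aggregated)
```

Under this model, a catalogue that types dcat:byteSize as xsd:nonNegativeInteger would conform to one file and be reported as a violation by the other, and the reverse holds for data typed as xsd:decimal.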

barthanssens commented 4 weeks ago

Also the other way round: the shacl/dcat-ap-SHACL explicitly checks if e.g. accessURL is an rdfs:Resource, while the html/shacl/shapes.ttl does not (nor does html/shacl/ranges.ttl)

NielsHoffmann commented 1 week ago

I also noted this discrepancy, in relation to the GeoDCAT-AP 3.0.0 pilot... e.g. DataSet spatialResolutionInMeters is only specified as maxCount 1 and datatype xsd:decimal in the separate shapes.ttl.

Whereas in the dcat-ap-SHACL.ttl it is specified as xsd:decimal as well as nodeKind Literal.

The respec document only specifies the xsd:decimal property. So I tend to believe the separate files are the 'official correct' ones. This poses a problem with the GeoDCAT pilot though, as the GeoDCAT-AP shacl file seems to be based on the dcat-ap-SHACL file.

init-dcat-ap-de commented 1 week ago

My guess would be that the shapes in releases/3.0.0/shacl/dcat-ap-SHACL.ttl are the "official" ones. They use the uuids for every single property instead of blank nodes and that was a planned improvement, afaik. (So you can easily deactivate shapes and add error messages in your language.)

The need for nodeKind "Literal" is implicit if you use a data type. The range is a literal typed as xsd:decimal.

NielsHoffmann commented 1 week ago

The decision to use uuids for properties is a separate issue/decision from the decision to merge all shapes into 1 file.

I think it is indeed very good to have unique names for each individual property. But I think it is actually a 'cleaner architecture' to provide separate files for the different levels of constraints, the way the original shapes in the /html/shacl/ folder are organized.

For those that want all constraints in one file, it is easier to merge the separate files into one than it is to provide 1 file and leave it to the users to split it out again into separate files. Also, while authoring/maintaining the shacl files, I think it is far easier to work with multiple small files than one big one.

The scenarios I would typically implement are checking for required properties (which is what the shapes.ttl specifies) before checking the range or recommended properties. So from that perspective the different shacl files make perfect sense to me.
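The staged scenario described above could look roughly like the following sketch. It is plain Python over dictionaries rather than SHACL, and the property names, the `REQUIRED` and `RANGES` tables and the `validate_in_stages` helper are hypothetical stand-ins for what shapes.ttl and ranges.ttl would express.

```python
# Hypothetical stand-in for shapes.ttl: minimum number of values per property.
REQUIRED = {"title": 1, "description": 1}

# Hypothetical stand-in for ranges.ttl: a value check per property.
RANGES = {"byteSize": lambda v: isinstance(v, int) and v >= 0}


def check_required(record):
    """Report properties that have fewer values than required."""
    return [
        f"missing {prop}"
        for prop, minimum in REQUIRED.items()
        if len(record.get(prop, [])) < minimum
    ]


def check_ranges(record):
    """Report values that fall outside the expected range."""
    return [
        f"bad value for {prop}: {value!r}"
        for prop, is_valid in RANGES.items()
        for value in record.get(prop, [])
        if not is_valid(value)
    ]


def validate_in_stages(record, stages):
    """Run each stage in order and stop at the first one that reports problems."""
    for name, check in stages:
        problems = check(record)
        if problems:
            return name, problems
    return None, []


STAGES = [("required", check_required), ("ranges", check_ranges)]

incomplete = {"title": ["Air quality"], "byteSize": [-5]}
complete = {"title": ["Air quality"], "description": ["Hourly data"], "byteSize": [-5]}

print(validate_in_stages(incomplete, STAGES))
print(validate_in_stages(complete, STAGES))
```

With separate constraint sets, each stage can be run and reported on its own; a single merged file would need to be split or filtered first to get the same behaviour.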

barthanssens commented 1 week ago

Well, I also prefer separate files, but one big file is also fine with me... But at least the results should be the same :-)

init-dcat-ap-de commented 6 days ago

I would assume that the single-file-solution is auto-generated from the SSOT-UML-diagram while the multi-file-solution is a continuation of the handwritten rules for version 2.

If I am right, there would probably be no further work on the multi-file-solution.

NielsHoffmann commented 5 days ago

I would like the respec document to be explicit about this. Currently section 17 of the specification describes the separate files with their respective validation profiles. In the last paragraph of that same section, the version with uuids is introduced as 'extra'.

The validator (https://www.itb.ec.europa.eu/shacl/dcat-ap/upload) provides both options (separate files, and the uuid version) but gives different validation results.

GeoDCAT-AP efforts are built on top of this profile, so to me it seems important to be clear about the basis we are currently building on. See: https://github.com/SEMICeu/GeoDCAT-AP/issues/142