DOI-DO / dcat-us

Data Catalog Vocabulary (DCAT) - United States Profile Chief Data Officers Council & Federal Committee on Statistical Methodology

SHACL shapes do not pass SHACL-SHACL validation #139

Closed (ajnelson-nist closed this issue 9 months ago)

ajnelson-nist commented 12 months ago

Name: Alex Nelson

Affiliation: I am an employee of the National Institute of Standards and Technology. I am also a community member of the Cyber Domain Ontology in some leadership roles.

Type of issue: Schema (specifically, SHACL shapes)

Issue: A review of the file dcat-us_3.0_shacl_shapes.ttl in today's state raises several SHACL-SHACL validation errors -- that is, errors specific to SHACL syntax. Unfortunately, these errors cause the shapes graph to fail to load in a SHACL-executing engine. An example shape that has errors is dcat-us-shp:Document_Shape-creator (link is to today's version of that file).

In total, this pySHACL (version 0.24.0) command[^1], which runs SHACL-SHACL validation of the shapes graph before attempting to validate the data graph, reports 70 errors across the graph:

# (current working directory: top source directory of repository)
pyshacl \
  --metashacl \
  --shacl shacl/dcat-us_3.0_shacl_shapes.ttl \
  docs/examples/activity.ttl

Recommended change(s):

[^1]: Disclaimer: Participation by NIST in the creation of the documentation of mentioned software is not intended to imply a recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that any specific software is necessarily the best available for the purpose.

ajnelson-nist commented 12 months ago

Apologies, I should have included this in the initial post: This contribution is made only by myself, and is not being made by the National Institute of Standards and Technology or any other organization.

ajnelson-nist commented 11 months ago

I've drafted some changes to address SHACL-SHACL validation errors, and will be happy to send some more PRs akin to #162 after this week's holiday.

However, I also tried running the shapes against the examples in this repository, and saw that several validation issues are raised. A shell transcript is at the end of this post.

I'd made a remark in my initial post on adding a Continuous Integration process. That seems like it might be a bigger discussion that will expand into whether the examples should conform to all, or just some, of the SHACL shapes. Would that be better handled in a separate Issue? (I'm not sure how amenable your workflow is to receiving new Issues at the moment.)

Shell transcript, using pyshacl[^1]:

pyshacl \
        --metashacl \
        --shacl shacl/dcat-us_3.0_shacl_shapes.ttl \
        docs/examples/activity.ttl
Validation Report
Conforms: False
Results (10):
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: dcat-us-shp:Concept_Shape-inScheme
    Focus Node: ex:CensusActivity
    Result Path: skos:inScheme
    Message: Less than 1 values on ex:CensusActivity->skos:inScheme
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: dcat-us-shp:Concept_Shape-prefLabel
    Focus Node: ex:CensusActivity
    Result Path: skos:prefLabel
    Message: Less than 1 values on ex:CensusActivity->skos:prefLabel
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: dcat-us-shp:Catalog_Shape-dataset
    Focus Node: ex:NationalCensus
    Result Path: dcat:dataset
    Message: Less than 1 values on ex:NationalCensus->dcat:dataset
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: dcat-us-shp:Catalog_Shape-description
    Focus Node: ex:NationalCensus
    Result Path: dcterms:description
    Message: Less than 1 values on ex:NationalCensus->dcterms:description
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: dcat-us-shp:Catalog_Shape-publisher
    Focus Node: ex:NationalCensus
    Result Path: dcterms:publisher
    Message: Less than 1 values on ex:NationalCensus->dcterms:publisher
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: dcat-us-shp:Catalog_Shape-title
    Focus Node: ex:NationalCensus
    Result Path: dcterms:title
    Message: Less than 1 values on ex:NationalCensus->dcterms:title
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: dcat-us-shp:Dataset_Shape-description
    Focus Node: ex:Census2020Dataset
    Result Path: dcterms:description
    Message: Less than 1 values on ex:Census2020Dataset->dcterms:description
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: dcat-us-shp:Dataset_Shape-identifier
    Focus Node: ex:Census2020Dataset
    Result Path: dcterms:identifier
    Message: Less than 1 values on ex:Census2020Dataset->dcterms:identifier
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: dcat-us-shp:Dataset_Shape-publisher
    Focus Node: ex:Census2020Dataset
    Result Path: dcterms:publisher
    Message: Less than 1 values on ex:Census2020Dataset->dcterms:publisher
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: dcat-us-shp:Dataset_Shape-title
    Focus Node: ex:Census2020Dataset
    Result Path: dcterms:title
    Message: Less than 1 values on ex:Census2020Dataset->dcterms:title
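
A report like the one above can be easier to act on when the results are grouped by focus node. The following is a minimal sketch, assuming the text layout shown in this transcript (a "Constraint Violation in ..." header followed by indented "Key: value" lines). The function names `parse_report` and `paths_by_focus_node` are hypothetical helpers, not part of pyshacl, and `SAMPLE` is a trimmed copy of two results from the transcript.

```python
import re
from collections import defaultdict

# Matches the indented "Key: value" lines of the text report,
# e.g. "    Focus Node: ex:CensusActivity".
FIELD = re.compile(
    r"^\s+(Severity|Source Shape|Focus Node|Result Path|Message):\s*(.*)$"
)


def parse_report(text):
    """Split a text report into one dict per validation result."""
    results = []
    current = None
    for line in text.splitlines():
        if line.startswith("Constraint Violation in "):
            # A new result starts; collect its fields in a fresh dict.
            current = {}
            results.append(current)
            continue
        match = FIELD.match(line)
        if match and current is not None:
            current[match.group(1)] = match.group(2)
    return results


def paths_by_focus_node(results):
    """Group the reported property paths under their focus node."""
    grouped = defaultdict(list)
    for result in results:
        node = result.get("Focus Node", "(unknown)")
        grouped[node].append(result.get("Result Path", "(none)"))
    return dict(grouped)


# Trimmed sample: two of the ten results from the transcript.
SAMPLE = """\
Validation Report
Conforms: False
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: dcat-us-shp:Catalog_Shape-title
    Focus Node: ex:NationalCensus
    Result Path: dcterms:title
    Message: Less than 1 values on ex:NationalCensus->dcterms:title
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: dcat-us-shp:Dataset_Shape-title
    Focus Node: ex:Census2020Dataset
    Result Path: dcterms:title
    Message: Less than 1 values on ex:Census2020Dataset->dcterms:title
"""

if __name__ == "__main__":
    for node, paths in paths_by_focus_node(parse_report(SAMPLE)).items():
        print(node, "is missing:", ", ".join(paths))
```

Grouping this way makes it apparent that every result in the transcript is a missing required property on one of three nodes, which suggests the example file is incomplete rather than malformed.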

fellahst commented 11 months ago

Alex,

Thank you so much for your detailed feedback. Your ticket is currently under review and we will come back to you as soon as possible with remediation for the issue you raised. Thanks for your patience.

hkdctol commented 10 months ago

Would like to discuss further as I don't have background/understanding on this

fellahst commented 9 months ago

An updated SHACL shapes file that fixes the issues has been committed to the repository. I also updated activity.ttl so that it has the required fields and passes validation. A more complete example that passes validation has been added to the Git repository: https://github.com/DOI-DO/dcat-us/blob/main/docs/examples/example1-dcat-us-3.0.ttl

ajnelson-nist commented 9 months ago

Hi @fellahst ,

Thank you for the updates!

I've checked the DCAT-US 3 SHACL graph as I did before, and it now appears to be conformant with SHACL syntactic requirements.

I noticed that not all of the examples under docs/examples currently conform to the shapes, though. I tried this[^1] Bash one-liner:

ls docs/examples/*ttl | while read x; do echo $x; pyshacl --shacl shacl/dcat-us_3.0_shacl_shapes.ttl ${x} ; done 2>&1 | egrep '^Conforms' | sort | uniq -c

I got these results:

  16 Conforms: False
  16 Conforms: True

I appreciate that making all of the examples conformant might distract from each example's purpose. On the other hand, examples might be copied with the hope of starting an application from a "known passing" state.

Could a README be added to docs/examples/ to describe which examples are provided expecting to be minimally-conformant demonstrations, and/or which are provided to highlight only certain terms' usage?
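
The one-liner above reports how many files conform but not which ones. As a rough sketch of how a per-file status could be gathered for such a README, the following runs the same `pyshacl --shacl ...` command on each file and looks for the "Conforms:" line in its output. The names `run_pyshacl` and `conformance_by_file` are hypothetical, the shapes path is the one used in this thread, and the runner is passed in as a parameter so the logic can be tried without pyshacl installed.

```python
import re
import subprocess
from pathlib import Path

SHAPES = "shacl/dcat-us_3.0_shacl_shapes.ttl"  # path used in this thread
CONFORMS = re.compile(r"^Conforms: (True|False)\b", re.MULTILINE)


def run_pyshacl(data_file, shapes=SHAPES):
    """Run the pyshacl CLI on one file and return its combined text output."""
    proc = subprocess.run(
        ["pyshacl", "--shacl", shapes, str(data_file)],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,  # a non-conformant graph is an expected outcome here
    )
    return proc.stdout


def conformance_by_file(files, run=run_pyshacl):
    """Map each file to True, False, or None (no 'Conforms:' line found)."""
    status = {}
    for data_file in files:
        match = CONFORMS.search(run(data_file))
        status[str(data_file)] = (
            None if match is None else match.group(1) == "True"
        )
    return status


if __name__ == "__main__":
    files = sorted(Path("docs/examples").glob("*.ttl"))
    for name, ok in conformance_by_file(files).items():
        print(f"{ok}\t{name}")
```

A `None` entry marks a file whose run printed no "Conforms:" line at all (for example, a parse error), which the `uniq -c` tally would silently drop.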

Also, with example1-dcat-us-3.0.{json,ttl} now generated, do you have any further thoughts on adding a Continuous Integration process? I'm happy to discuss a few different practices that have been used in a community I work with.

mrratcliffe commented 9 months ago

@fellahst, what is the response to Alex Nelson's question above about adding a README?

fellahst commented 9 months ago

Thank you for your insightful feedback and the excellent suggestion regarding the addition of a README to clarify the purpose and conformity status of examples in the docs/examples directory. We will make more of the examples SHACL-conformant by adding the missing required fields, and we will add a README.txt if we cannot make them all compliant without significant work.

Incorporating a Continuous Integration (CI) process, particularly for the automated validation of new examples, is indeed a wise move forward. I am eager to discuss the CI practices you have suggested when we enter the implementation phase. Such practices promise to significantly enhance our project by maintaining consistent compliance and streamlining updates.

mrratcliffe commented 9 months ago

+1

fellahst commented 9 months ago

I went the extra mile to make sure that every one of the 123 example files validates against the SHACL file. You can run the following command to check:

find docs/examples/ -name '*.ttl' -print0 | xargs -0 -n 1 sh -c 'echo "$1"; pyshacl --shacl shacl/dcat-us_3.0_shacl_shapes.ttl "$1"' sh | grep -E '^Conforms' | sort | uniq -c