DataONEorg / sem-prov-ontologies

Ontologies focused on scientific observations and scientific workflow provenance.
https://ontologies.dataone.org

Explore SHACL validation for checking ontologies #102

Open amoeba opened 3 years ago

amoeba commented 3 years ago

We discovered on Slack today that a property we expected (mosaic:hasBasis) wasn't present on every mosaic:Campaign and it should have been. Manually checking the ontology after every change is time-consuming and error-prone. It'd be great to write a set of SHACL shape constraints that we could use to check for some of these things.

I'll build out a GHA that does some basic checking and then we could probably brainstorm a more complete set of checks and look at applying the process to the other ontologies.

amoeba commented 3 years ago

This looks pretty promising. With just a simple constraint:

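@prefix sh: <http://www.w3.org/ns/shacl#> .
# Assumption: mosaic: prefix inferred from the IRIs in the validation report
@prefix mosaic: <https://purl.dataone.org/odo/MOSAIC_> .

mosaic:CampaignShape
    a sh:NodeShape ;
    sh:targetClass mosaic:00000001 ;
    # Every Campaign has at least one mosaic:hasBasis triple
    sh:property [
        sh:path mosaic:00000034 ;
        sh:minCount 1 ;
    ] .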

PySHACL catches the exact problem we saw today:

; pyshacl -s shapes.shacl -df xml -sf turtle ../MOSAiC.owl
Validation Report
Conforms: False
Results (4):
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: [ sh:minCount Literal("1", datatype=xsd:integer) ; sh:path <https://purl.dataone.org/odo/MOSAIC_00000034> ]
    Focus Node: odo:MOSAIC_00000005
    Result Path: <https://purl.dataone.org/odo/MOSAIC_00000034>
    Message: Less than 1 values on odo:MOSAIC_00000005-><https://purl.dataone.org/odo/MOSAIC_00000034>
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: [ sh:minCount Literal("1", datatype=xsd:integer) ; sh:path <https://purl.dataone.org/odo/MOSAIC_00000034> ]
    Focus Node: odo:MOSAIC_00000008
    Result Path: <https://purl.dataone.org/odo/MOSAIC_00000034>
    Message: Less than 1 values on odo:MOSAIC_00000008-><https://purl.dataone.org/odo/MOSAIC_00000034>
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: [ sh:minCount Literal("1", datatype=xsd:integer) ; sh:path <https://purl.dataone.org/odo/MOSAIC_00000034> ]
    Focus Node: odo:MOSAIC_00000019
    Result Path: <https://purl.dataone.org/odo/MOSAIC_00000034>
    Message: Less than 1 values on odo:MOSAIC_00000019-><https://purl.dataone.org/odo/MOSAIC_00000034>
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: [ sh:minCount Literal("1", datatype=xsd:integer) ; sh:path <https://purl.dataone.org/odo/MOSAIC_00000034> ]
    Focus Node: odo:MOSAIC_00000018
    Result Path: <https://purl.dataone.org/odo/MOSAIC_00000034>
    Message: Less than 1 values on odo:MOSAIC_00000018-><https://purl.dataone.org/odo/MOSAIC_00000034>
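
A report like the one above gets long quickly, so for a CI job it may help to condense it. The following is only a minimal sketch, assuming the text report keeps the layout shown above; `summarize_report` and `SAMPLE_REPORT` are hypothetical names introduced for illustration and are not part of PySHACL.

```python
import re
from collections import defaultdict

# Patterns for the three report lines we care about (layout assumed from the report above).
RESULT_RE = re.compile(r"^Constraint Violation in (\w+)")
FOCUS_RE = re.compile(r"^\s*Focus Node:\s*(\S+)")
PATH_RE = re.compile(r"^\s*Result Path:\s*<?([^>\s]+)>?")

# Hypothetical sample: two results copied from the report above.
SAMPLE_REPORT = """\
Validation Report
Conforms: False
Results (2):
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
    Severity: sh:Violation
    Focus Node: odo:MOSAIC_00000005
    Result Path: <https://purl.dataone.org/odo/MOSAIC_00000034>
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
    Severity: sh:Violation
    Focus Node: odo:MOSAIC_00000008
    Result Path: <https://purl.dataone.org/odo/MOSAIC_00000034>
"""


def summarize_report(text):
    """Group focus nodes by (constraint component, result path)."""
    summary = defaultdict(list)
    component = focus = None
    for line in text.splitlines():
        match = RESULT_RE.match(line)
        if match:
            # A new result block starts; forget the previous focus node.
            component, focus = match.group(1), None
            continue
        match = FOCUS_RE.match(line)
        if match:
            focus = match.group(1)
            continue
        match = PATH_RE.match(line)
        if match and component and focus:
            summary[(component, match.group(1))].append(focus)
    return dict(summary)


if __name__ == "__main__":
    for (component, path), nodes in summarize_report(SAMPLE_REPORT).items():
        print(f"{component} on {path}: {', '.join(nodes)}")
```

Parsing the text report is the quick route; PySHACL can also be called as a library, which would avoid depending on the report layout.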

amoeba commented 3 years ago

I just did a quick look-over to see what checks might make sense to implement as a first pass:

Some of the other parts of the ontology are a bit confusing so I'll stop there and chat with @mpsaloha.

mpsaloha commented 3 years ago

@amoeba those look like good suggestions for constraints! Note that the MOSAIC Ontology is in OWL, and so is OWA. Thus, while all Campaigns do have a Basis, that doesn't mean that all Campaigns in our Ontology must have an associated Basis, unless we decide to "require it" (hence SHACL which is CWA). Thus, the lack of some Campaign having a Basis or having a Chief Scientist was not an "it should have been there" (as you phrased it in your first comment on this Issue), but rather "it might be useful if it were there". There are LOTS OF additional "It might be useful" predicates I could have filled out in the MOSAIC Ontology, but I didn't do these for lack of time, or suspicion they would not be leveraged in our Web UI. Happy to discuss this further if this doesn't make perfect sense.

amoeba commented 3 years ago

You might have to define OWA and CWA for me. Other than that, your comment makes sense.
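
For reference, OWA is the Open World Assumption and CWA is the Closed World Assumption. The difference can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `ex:` identifiers and the function names are hypothetical, and the toy triple set stands in for neither the MOSAiC ontology nor a real OWL or SHACL engine.

```python
# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = {
    ("ex:campaignA", "rdf:type", "ex:Campaign"),
    ("ex:campaignA", "ex:hasBasis", "ex:ship"),
    ("ex:campaignB", "rdf:type", "ex:Campaign"),  # no hasBasis asserted
}


def instances_of(triples, cls):
    """Return the subjects explicitly typed as cls, sorted for stable output."""
    return sorted(s for s, p, o in triples if p == "rdf:type" and o == cls)


def has_value(triples, subject, prop):
    """True if at least one (subject, prop, *) triple is asserted."""
    return any(s == subject and p == prop for s, p, _ in triples)


def closed_world_violations(triples, cls, prop):
    """CWA (SHACL-style): a missing statement counts as a violation."""
    return [s for s in instances_of(triples, cls) if not has_value(triples, s, prop)]


def open_world_status(triples, subject, prop):
    """OWA (OWL-style): a missing statement is unknown (None), not false."""
    return True if has_value(triples, subject, prop) else None
```

Under the closed-world reading, `ex:campaignB` is reported because no `ex:hasBasis` statement exists for it; under the open-world reading the same gap only means the Basis is not known yet.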

What I want to do is help you and @laijasmine get the work you both need to do on MOSAiC done quickly and efficiently so if we can add SHACL validation rules to help catch things like the hasBasis thing then that'd make me happy.

Are any of the rules above ones you want?

laijasmine commented 3 years ago

All of the above rules look good to me except for the last one, which I'm not sure about and Mark will need to confirm: Every Deployment has a single deployedSystem

amoeba commented 3 years ago

Thanks @laijasmine. I'll touch base with @mpsaloha at some point here.

amoeba commented 3 years ago

We discussed part of this on our salmantics call this week and we talked about the point above: Should this ontology be comprehensive over all of the MOSAiC expedition or just what PANGAEA or we have? We decided that we should aim to be comprehensive. We're presenting a PDF soon and are hoping to have some conversations about the ontology and the project as a whole with relevant folks.

I've merged an initial skeleton for this kind of checking onto the develop branch but haven't added all of the constraints I listed above. I'm going to leave this issue open with the intent to revisit this at some point.