ESIPFed / science-on-schema.org

science-on-schema.org - providing guidance for publishing schema.org as JSON-LD for the sciences
Apache License 2.0

add SHACL shape to validate SO namespace #59

Closed mbjones closed 3 years ago

mbjones commented 4 years ago

On today's call, we agreed to update the guidance docs to recommend using the https variant of the schema.org namespace with a trailing slash (https://schema.org/). We also discussed whether the trailing slash should be required, meaning that it would be enforced by validation with a SHACL shape. This request is to add a SHACL shape that tests whether the namespace has a trailing slash and raises an error if it does not. With this, we would have:

mbjones commented 4 years ago

After further discussion on the call, we identified that SHACL may not work for this validation because it will treat the two namespace strings with and without a slash as different namespaces. We really need more of a pre-parser step that checks the namespace in @vocab before it is handed off for SHACL validation. @fils do you have thoughts on what would be best to handle this in Fence or other tools?
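A pre-parser step of the kind described above could look roughly like the following. This is only a sketch under stated assumptions, not how Fence or any other tool does it: the names `EXPECTED_VOCAB`, `iter_vocabs`, and `check_vocab` are hypothetical, only the top-level `@context` is inspected, remote context URLs are not fetched, and any `@vocab` other than https://schema.org/ is treated as an error.

```python
import json

# Assumption: the guidance requires exactly this @vocab value.
EXPECTED_VOCAB = "https://schema.org/"


def iter_vocabs(context):
    """Yield every @vocab value found in a JSON-LD @context value."""
    if isinstance(context, dict):
        if "@vocab" in context:
            yield context["@vocab"]
    elif isinstance(context, list):
        for item in context:
            yield from iter_vocabs(item)
    # A bare string is a remote context URL; this sketch does not fetch it.


def check_vocab(jsonld_text):
    """Return a list of error messages; an empty list means the check passed."""
    doc = json.loads(jsonld_text)
    docs = doc if isinstance(doc, list) else [doc]
    errors = []
    for d in docs:
        if not isinstance(d, dict):
            continue
        for vocab in iter_vocabs(d.get("@context")):
            if vocab != EXPECTED_VOCAB:
                errors.append(
                    f"Expecting @vocab of <{EXPECTED_VOCAB}>, found <{vocab}>"
                )
    return errors


if __name__ == "__main__":
    sample = '{"@context": {"@vocab": "https://schema.org"}, "@type": "Dataset"}'
    for message in check_vocab(sample):
        print(message)
```

Only documents for which `check_vocab` returns an empty list would then be handed off for SHACL validation.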

datadavev commented 4 years ago

Here's a brute-force solution using SHACL; it only tests for SO:Dataset. Basically, if it finds any node in the graph typed as Dataset under a bad namespace, then validation fails.

I expect there's a more elegant way to do it:

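@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix d1: <http://ns.dataone.org/schema/SO#> .

# JSON-LD expands a term by appending it to @vocab, so a namespace without
# the trailing slash produces the class IRI <https://schema.orgDataset>.
# Each shape targets one bad class IRI. Every instance of that class has an
# rdf:type, so the sh:not constraint always fails and reports the message.

# https namespace, missing trailing slash
d1:DatasetBad1Shape
    a sh:NodeShape ;
    sh:targetClass <https://schema.orgDataset> ;
    sh:message "Expecting SO namespace of <https://schema.org/> not <https://schema.org>" ;
    sh:not [
        sh:path rdf:type ;
        sh:minCount 1 ;
    ] .

# http namespace, with trailing slash
d1:DatasetBad2Shape
    a sh:NodeShape ;
    sh:targetClass <http://schema.org/Dataset> ;
    sh:message "Expecting SO namespace of <https://schema.org/> not <http://schema.org/>" ;
    sh:not [
        sh:path rdf:type ;
        sh:minCount 1 ;
    ] .

# http namespace, missing trailing slash
d1:DatasetBad3Shape
    a sh:NodeShape ;
    sh:targetClass <http://schema.orgDataset> ;
    sh:message "Expecting SO namespace of <https://schema.org/> not <http://schema.org>" ;
    sh:not [
        sh:path rdf:type ;
        sh:minCount 1 ;
    ] .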

Edit: Here's a worked example using pyshacl: https://so-tools.readthedocs.io/en/latest/test_namespace.html

mbjones commented 4 years ago

That looks great @datadavev. @fils can you add this to Fence too? Where is the collection of definitive shapes we're using for validation?

rduerr commented 4 years ago

+1 on @mbjones comments

danbri commented 4 years ago

Would this complain if people used external extensions to make richer schema.org Dataset descriptions?

datadavev commented 4 years ago

The SHACL looks for the class IRIs http://schema.orgDataset, https://schema.orgDataset, or http://schema.org/Dataset and complains if any of them is found. It is agnostic to all other constructs, so it will not complain about external extensions.
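To illustrate why a wrong namespace shows up as a different class IRI: JSON-LD expands a plain term such as Dataset by appending it to the `@vocab` string. The sketch below mimics only that one concatenation step; `expand_term` is a hypothetical helper, and real JSON-LD processors handle many more cases (prefixes, keywords, absolute IRIs).

```python
def expand_term(vocab, term):
    """Expand a plain term against @vocab by simple string concatenation."""
    return vocab + term


if __name__ == "__main__":
    vocabs = (
        "https://schema.org/",  # recommended form
        "https://schema.org",   # missing trailing slash
        "http://schema.org/",   # http instead of https
        "http://schema.org",    # http and missing trailing slash
    )
    for vocab in vocabs:
        print(f"{vocab!r:24} -> <{expand_term(vocab, 'Dataset')}>")
```

Only the first variant yields https://schema.org/Dataset. Types from other vocabularies expand to unrelated IRIs, which is why shapes that target only the three bad Dataset IRIs leave extensions alone.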

danbri commented 4 years ago

Great, thanks!


mbjones commented 4 years ago

Seems like this SHACL shape is ready to be added to the guidelines. Should this be in v1.1 or v1.2? For now I will put it in v1.2 to avoid delaying the 1.1 release, but feel free to move it up if you know how it should be incorporated @datadavev @fils

fils commented 4 years ago

@mbjones I'm fine with 1.2 personally. There are some improvements to the way "recommendations" can be done in a SHACL shape.

Also, I'm doing some updates and will be talking with @datadavev about them today. So based on that we might make changes which would further support 1.2 as a target.

I'm also adding some points about frames, which I think will be important alongside shapes. So again, we need time to review and include that.

I've updated Fence at https://fence.gleaner.io/ as part of getting ready to chat with Dave.
I added framing as a test option, and it can now pull the geospatial elements properly based on the current Science on Schema guidance. Other code then routes these into base geometries, from which KML, GeoJSON, WKT, etc. can be pulled.

mbjones commented 3 years ago

Looks like initial SHACL support is ready to go. Can you please merge the PR @datadavev if you agree?

datadavev commented 3 years ago

@mbjones It's probably ok. My hesitation is that it should really be accompanied by a bunch of test cases since SHACL is sufficiently complicated that non-obvious errors and omissions may be present.

mbjones commented 3 years ago

OK, I merged PR #103 with the initial SHACL support. While this is neither fully documented nor a complete service, it will likely be useful to many groups. The shape files do not fully specify a conformance suite for the guidelines, but they are a good start.

I think we should pick up the SHACL work for v1.3 to get agreement on the conformance shapes for various use cases and to provide documentation on usage. So, while PR #103 is closed in the v1.2 release, we should open new issues in v1.3 for the documentation and shape changes we want to see. Thanks @datadavev!