dcmi / dctap

DC Tabular Application Profile
https://dcmi.github.io/dctap/

Cookbook: section on SHACL #77

Open kcoyle opened 2 years ago

kcoyle commented 2 years ago

The Cookbook needs a section filled in on converting a DCTAP to SHACL.

sfolsom commented 1 year ago

Will this section of the Cookbook provide guidance on sh:message for providing more context on sh:severity violations and warnings? Or is there somewhere else where this information is available? I've seen open related issues/comments, so maybe this hasn't been decided.

philbarker commented 1 year ago

@sfolsom we might cover how to define severity and a message in extensions to DCTAP (e.g. as extra columns in the table) so that they can be encoded in SHACL; but they will be suggestions only, nothing about converting to SHACL will be normative.
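As one illustration of the extra-column idea, here is a minimal sketch in Python using only the standard library. The `severity` and `message` column names are assumptions made for this example and are not part of the DCTAP vocabulary; the `ex:` prefix and the sample rows are also invented. The sketch only handles `mandatory` plus the two hypothetical columns, omits prefix declarations, and is not meant as a normative mapping.

```python
import csv
import io

# A tiny DCTAP. shapeID, propertyID and mandatory are standard DCTAP columns;
# "severity" and "message" are hypothetical extension columns.
TAP_CSV = """shapeID,propertyID,mandatory,severity,message
ex:InstanceShape,bf:title,true,Violation,An instance must have a title.
ex:InstanceShape,bf:issuance,true,Warning,Issuance is required if applicable.
"""

def rows_to_shacl(tap_text):
    """Emit one SHACL node shape per shapeID as Turtle text (no @prefix lines)."""
    shapes = {}
    for row in csv.DictReader(io.StringIO(tap_text)):
        lines = [f"        sh:path {row['propertyID']} ;"]
        if row["mandatory"].strip().lower() == "true":
            lines.append("        sh:minCount 1 ;")
        if row.get("severity"):
            lines.append(f"        sh:severity sh:{row['severity']} ;")
        if row.get("message"):
            # Escape backslashes and double quotes for a Turtle string literal.
            text = row["message"].replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'        sh:message "{text}" ;')
        shapes.setdefault(row["shapeID"], []).append("\n".join(lines))
    out = []
    for shape_id, props in shapes.items():
        body = " ;\n".join(f"    sh:property [\n{p}\n    ]" for p in props)
        out.append(f"{shape_id}\n    a sh:NodeShape ;\n{body} .")
    return "\n\n".join(out)

print(rows_to_shacl(TAP_CSV))
```

A SHACL validator would then report the `sh:message` text as the `sh:resultMessage` of each result, at the severity given in the row.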

kcoyle commented 1 year ago

@sfolsom We are always looking for examples that folks can relate to. "Real" examples tend to be too complex, but if you can send a reduced example from your work that still "looks real", that would be appreciated. It doesn't have to be in code - a use case would be great. Thanks.

sfolsom commented 1 year ago

Thanks for the response. The idea of keeping SHACL-specific work non-normative makes sense.

To provide context to the question, I'm part of a PCC group thinking about interoperability of BIBFRAME data, and we're starting to define shapes, and thinking about how to have meaningful validation reports that include more than pass/fail violations. There are a number of properties that we have designated as "required if applicable" where a sh:Violation or sh:Warning with a sh:resultMessage would be useful.

sfolsom commented 2 months ago

I have another question about extending DCTAP to support SHACL validation. We have a use case where we want to define the target of a shape as entities that use a specific property where the object of that property is of a specific type. For example, we need two shapes: 1.) a shape for bf:Electronic resources that are bf:instanceOf works typed as bf:Monograph, and 2.) a shape for bf:Electronic resources that are bf:instanceOf works typed as bf:Serial.

I noticed https://github.com/philbarker/TAP2SHACL/blob/main/examples/SHACLPerson/shapes.csv as an implementation of an extension for SHACL, and I was wondering if the cookbook for SHACL might eventually include a type vocabulary for things like class, objectsOf, subjectsOf, SPARQLTarget. I snuck SPARQLTarget in there :) because I can't think of another way to define the two shapes for bf:Electronic without using sh:SPARQLTarget.

If there is an easier way to define these types of targets using a simpler DCTAP implementation/extension, I'd be grateful for the guidance.

philbarker commented 2 months ago

@sfolsom we always felt that there would be things that can be done in languages like SHACL and ShEx that would go beyond what could be covered in a simple tabular format. I don't think anyone wants DC TAP as a competitor to those standards, so we deliberately have avoided creating it as such.

TAP2SHACL already goes beyond what can be done with a TAP alone, for example with the node shape targets. Currently there are no plans to extend it to cover the SHACL Advanced Features.

sfolsom commented 2 months ago

Sorry, I wasn't clear. By "extending", I wasn't suggesting DCTAP would expand formally to include more complicated use cases. I was wondering if the cookbook for DCTAP to SHACL might make suggestions, or point to suggestions or implementations made elsewhere by those who have extended DCTAP beyond its scope. I'm struggling to find other initiatives attempting to implement DCTAP together with SHACL for validation of RDF.

kcoyle commented 2 months ago

@sfolsom Is there any chance you can share all or part of your DCTAP? At least a part where you run into this problem. I'm hoping that having more context will help me think about this ;-).

sfolsom commented 2 months ago

@kcoyle, here's a copy of the spreadsheet for serial electronic bf:Instances, where we have a target of bf:Electronic: https://docs.google.com/spreadsheets/d/13CU7B-RoLTIVgnZWt68PrEjcqVYfuNdvwBkw6xsb-zk/edit?gid=66557658#gid=66557658.

Here's one for monograph electronic bf:Instances that also uses bf:Electronic for targets: https://docs.google.com/spreadsheets/d/19R-ZbA0as-EPWKnGvP3dkncN7yX5n6_Q18-meLGS_tQ/edit?gid=66557658#gid=66557658.

Using classes as targets works when we want the same validation tests every time we come across a given class, but we expect some different properties when the bf:Electronic is for a monograph vs. a serial. I realize this is beyond what DCTAP is set up to handle, but I thought I'd raise it here since you all have considered the boundaries and potential mappings between DCTAP and SHACL.

I may be wrong, but the only way I could find in SHACL to define conditional targets like this was to use sh:SPARQLTarget and have queries like...

Serial Electronic Target

big:Serial:Instance:Electronic
    a sh:NodeShape ;
    sh:target [
        a sh:SPARQLTarget ;
        sh:prefixes bf: ;
        sh:select """
            # SHACL-AF SPARQL targets return their target nodes as ?this
            SELECT ?this
            WHERE {
                ?this a bf:Electronic .
                ?this bf:instanceOf ?Serial .
                ?Serial a bf:Serial .
            }
            """ ;
    ] ;

Monograph Electronic Target

big:Monograph:Instance:Electronic
    a sh:NodeShape ;
    sh:target [
        a sh:SPARQLTarget ;
        sh:prefixes bf: ;
        sh:select """
            # SHACL-AF SPARQL targets return their target nodes as ?this
            SELECT ?this
            WHERE {
                ?this a bf:Electronic .
                ?this bf:instanceOf ?Monograph .
                ?Monograph a bf:Monograph .
            }
            """ ;
    ] ;

It would be odd to store SPARQL Queries in the DCTAP, but maybe that's what we have to resort to.

kcoyle commented 2 months ago

@sfolsom I thought I answered this but it got lost somehow.

We did talk about storing code in DCTAP cells, but there is a very good chance it would get mangled in the CSV format. If you find a way to make that work I see no reason why not. We also discussed that one could create a file of the needed code and store it with an IRI, then place that IRI in the DCTAP, adding a column for it. This gets around the CSV problems, but adds a level of complexity for processing. (Note: reusing columns for new functions probably is a road to confusion. Creating columns for specific types of data is a better idea.)
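On the mangling concern: CSV in the RFC 4180 style allows line breaks, commas, and double quotes inside a quoted field, so a query can survive as long as every tool in the chain quotes and parses fields properly. The following is a minimal sketch with Python's standard `csv` module. The `sparqlTarget` column name and the `ex:` shape name are hypothetical, and the sketch says nothing about what spreadsheet applications or hand editing might do to the cell.

```python
import csv
import io

# A multi-line query that deliberately contains a comma and double quotes.
QUERY = """# "serial", electronic instances
SELECT ?this
WHERE {
    ?this a bf:Electronic ;
          bf:instanceOf ?work .
    ?work a bf:Serial .
}"""

# Write the query into a cell; the csv module quotes the field as needed.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["shapeID", "sparqlTarget"])
writer.writeheader()
writer.writerow({"shapeID": "ex:SerialElectronicShape", "sparqlTarget": QUERY})

# Read it back and compare with the original text.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows[0]["sparqlTarget"] == QUERY)
```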

I assume that you are "limited" to actual Bibframe in your instance data, but if the bf:Electronic validation rule is different for Monographs and Serials, doesn't that imply that subclasses of bf:Electronic for MonographElectronic and SerialElectronic are needed? Would they resolve this problem?

sfolsom commented 1 month ago

@kcoyle, thanks for this history and insight. I had a similar train of thought about the classes... new classes or if electronic-ness and print-ness should/could be traits of classes like bf:SerialInstance and bf:MonographInstance. As you anticipated though, we're limited to existing BF classes.

We're taking inspiration from Phil's "target" column extension, and might have to have a column that corresponds to sh:SPARQLTarget, and as you said, figure out how to get from the DCTAP to where we're storing the actual SPARQL.
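A minimal sketch of how that lookup might work, in Python with the standard library only. Everything named here is an assumption for illustration: the `targetType` and `target` column names (loosely modelled on the target-column idea, not checked against TAP2SHACL), the `ex:` shape names, the example.org IRI, and the in-memory `QUERY_STORE` that stands in for wherever the queries are actually kept.

```python
import csv
import io

# Hypothetical shapes table: "targetType" and "target" are assumed column names.
SHAPES_CSV = """shapeID,targetType,target
ex:InstanceShape,class,bf:Instance
ex:SerialElectronicShape,SPARQLTarget,https://example.org/targets/serial-electronic.rq
"""

# Stand-in for the real query storage (files, a repository, a web server).
QUERY_STORE = {
    "https://example.org/targets/serial-electronic.rq": """SELECT ?this
WHERE {
    ?this a bf:Electronic ;
          bf:instanceOf ?work .
    ?work a bf:Serial .
}""",
}

def target_turtle(row, fetch=QUERY_STORE.__getitem__):
    """Return the Turtle target statement(s) for one row of the shapes table."""
    if row["targetType"] == "class":
        return f"    sh:targetClass {row['target']} ;"
    if row["targetType"] == "SPARQLTarget":
        # The cell holds an IRI; the query text itself lives outside the CSV.
        query = fetch(row["target"])
        return (
            "    sh:target [\n"
            "        a sh:SPARQLTarget ;\n"
            "        sh:prefixes bf: ;\n"
            f'        sh:select """\n{query}\n""" ;\n'
            "    ] ;"
        )
    raise ValueError(f"unsupported targetType: {row['targetType']}")

for row in csv.DictReader(io.StringIO(SHAPES_CSV)):
    print(f"{row['shapeID']}\n    a sh:NodeShape ;\n{target_turtle(row)}\n    .\n")
```

Passing a different `fetch` function (for example one that reads a local file or dereferences the IRI) would change where the SPARQL comes from without touching the table. The sketch assumes the stored query contains no triple double quotes, which would break the Turtle literal.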