dcmi / dctap

DC Tabular Application Profile
https://dcmi.github.io/dctap/

Consider allowing multiple valueShapes #13

Closed justinlittman closed 3 years ago

justinlittman commented 3 years ago

For mapping Sinopia (BIBFRAME) Profiles to DCTAP (see https://github.com/LD4P/dctap), we need to treat valueShapes as "zero or more" rather than the specified "zero or one".

Sinopia Profiles allow a property to have multiple shapes. For example, the http://id.loc.gov/ontologies/bibframe/identifiedBy property can be a pcc:bf2:Identifiers:LCCN, pcc:bf2:Identifiers:ISBN, pcc:bf2:Identifiers:Local, or pcc:bf2:Identifiers:Other.

kcoyle commented 3 years ago

@justinlittman We could allow more than one (and I will now review the vocab document because we did that before addressing multiple values). In such a case, though, there will be nothing in the profile that I'm aware of to direct software to decide which one is viable in that particular instance. This will be a case of leaving it up to the actual humans who are creating the metadata. I also think that this will work ok with validation software, as long as the set in valueShape is treated as an OR (as we have designated for multiple values in a cell.)

The group meets tomorrow and I will put this on our agenda, and get back to you.
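
The OR reading described above can be sketched in a few lines of Python. This is only an illustration, not part of DCTAP or of any existing validator: the node model (a dict of property IRI to values), the `SHAPES` registry, and the helper names are hypothetical, and each shape check looks at nothing but `rdf:type`. The pairing of pcc:bf2:Identifiers:LCCN with the bf:Lccn class is an assumption made by analogy with the ISBN shape shown later in this thread.

```python
from typing import Callable, Dict, List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BF = "http://id.loc.gov/ontologies/bibframe/"

# Assumption for illustration: a node is a dict of property IRI -> values.
Node = Dict[str, List[str]]


def has_type(class_iri: str) -> Callable[[Node], bool]:
    """Build a hypothetical shape check that only tests rdf:type."""
    def check(node: Node) -> bool:
        return class_iri in node.get(RDF_TYPE, [])
    return check


# Hypothetical registry: shape ID -> conformance check.
SHAPES: Dict[str, Callable[[Node], bool]] = {
    "pcc:bf2:Identifiers:LCCN": has_type(BF + "Lccn"),  # assumed class
    "pcc:bf2:Identifiers:ISBN": has_type(BF + "Isbn"),
}


def matching_shapes(node: Node, value_shapes: List[str]) -> List[str]:
    """Return every listed shape the node conforms to."""
    return [shape for shape in value_shapes if SHAPES[shape](node)]


def conforms_to_any(node: Node, value_shapes: List[str]) -> bool:
    """OR semantics: one matching shape is enough."""
    return bool(matching_shapes(node, value_shapes))


allowed = ["pcc:bf2:Identifiers:LCCN", "pcc:bf2:Identifiers:ISBN"]
isbn_node = {RDF_TYPE: [BF + "Isbn"]}
note_node = {RDF_TYPE: [BF + "Note"]}

print(conforms_to_any(isbn_node, allowed))  # True
print(conforms_to_any(note_node, allowed))  # False
```

A check like this only helps validation. It says nothing about which shape a person or an editing tool should pick when creating data, which is the gap noted above.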

tombaker commented 3 years ago

@justinlittman @kcoyle I wonder if we mean the same thing by "shape". Without knowing exactly how this is modeled, I'd be inclined to put bf:identifiedBy into the property ID column, and the question would be where to put the pcc: options:

I'm assuming that when you say "multiple" shapes (or identifiers), you mean that any one identifier would be an LCCN, ISBN, Local, or Other - not that a property could be expected to take an identifier that was both an LCCN and an ISBN.

tombaker commented 3 years ago

@justinlittman To elaborate: in DCTAP parlance (and in RDF validation languages like ShEx and SHACL), a "shape" is a construct that groups the statement constraints for statements about one specific subject. In other words, a shape is a construct within the application profile, not something one would normally find in instance data.

justinlittman commented 3 years ago

pcc:bf2:Identifiers:* are shapes, not datatypes or other value constraints.

Here is the DCTAP for a pcc:bf2:Identifiers:ISBN:

shapeID,shapeLabel,propertyID,propertyLabel,mandatory,repeatable,ordered,valueNodeType,valueConstraint,valueShape,note
pcc:bf2:Identifiers:ISBN,Identifiers--ISBN,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,Class,true,false,,IRI,http://id.loc.gov/ontologies/bibframe/Isbn,,
pcc:bf2:Identifiers:ISBN,Identifiers--ISBN,http://sinopia.io/vocabulary/hasResourceTemplate,Profile ID,false,false,,LITERAL,pcc:bf2:Identifiers:ISBN,,
pcc:bf2:Identifiers:ISBN,Identifiers--ISBN,http://www.w3.org/1999/02/22-rdf-syntax-ns#value,ISBN,false,false,,LITERAL,,,
pcc:bf2:Identifiers:ISBN,Identifiers--ISBN,http://id.loc.gov/ontologies/bibframe/qualifier,Qualifier,false,false,,LITERAL,,,
pcc:bf2:Identifiers:ISBN,Identifiers--ISBN,http://id.loc.gov/ontologies/bibframe/note,Note,false,true,,BNODE,,pcc:bf2:Note:GeneralNote,
pcc:bf2:Identifiers:ISBN,Identifiers--ISBN,http://id.loc.gov/ontologies/bibframe/status,"Incorrect, Invalid or Canceled?",false,true,,LITERAL|IRI,,sinopia:LabeledResource,
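
For readers who want to inspect a table like this programmatically, here is a rough sketch using only Python's standard `csv` module. It copies the header and the last two rows from the table above and groups rows by `shapeID`. The function name `load_shapes` is hypothetical, and splitting `valueNodeType` on `|` is an assumption based on the `LITERAL|IRI` cell, not a statement about how DCTAP tooling behaves.

```python
import csv
import io
from collections import defaultdict

# Header and two rows copied from the table above.
TAP_CSV = '''shapeID,shapeLabel,propertyID,propertyLabel,mandatory,repeatable,ordered,valueNodeType,valueConstraint,valueShape,note
pcc:bf2:Identifiers:ISBN,Identifiers--ISBN,http://id.loc.gov/ontologies/bibframe/note,Note,false,true,,BNODE,,pcc:bf2:Note:GeneralNote,
pcc:bf2:Identifiers:ISBN,Identifiers--ISBN,http://id.loc.gov/ontologies/bibframe/status,"Incorrect, Invalid or Canceled?",false,true,,LITERAL|IRI,,sinopia:LabeledResource,
'''


def load_shapes(text):
    """Group the rows (statement constraints) of a TAP CSV by shapeID."""
    shapes = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        shapes[row["shapeID"]].append(row)
    return dict(shapes)


shapes = load_shapes(TAP_CSV)
for row in shapes["pcc:bf2:Identifiers:ISBN"]:
    # Assumption: "|" separates alternative node types within one cell.
    node_types = row["valueNodeType"].split("|")
    print(row["propertyLabel"], "->", row["valueShape"], node_types)
```

Note that the quoted label containing a comma is handled by the CSV parser, so it stays in one cell.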
justinlittman commented 3 years ago

Here is RDF that exemplifies the http://id.loc.gov/ontologies/bibframe/identifiedBy property:

<> a <http://id.loc.gov/ontologies/bibframe/Instance>;
    <http://id.loc.gov/ontologies/bibframe/identifiedBy> _:b1.
_:b1 a <http://id.loc.gov/ontologies/bibframe/Lccn>;
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> "2010919352"@eng.
<> <http://id.loc.gov/ontologies/bibframe/identifiedBy> _:b2.
_:b2 a <http://id.loc.gov/ontologies/bibframe/Isbn>;
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> "0-044510-27-2"@eng;
    <http://id.loc.gov/ontologies/bibframe/status> <http://id.loc.gov/vocabulary/mstatus/cancinv>.
<http://id.loc.gov/vocabulary/mstatus/cancinv> <http://www.w3.org/2000/01/rdf-schema#label> "canceled or invalid".
<> <http://id.loc.gov/ontologies/bibframe/identifiedBy> _:b3.
_:b3 a <http://id.loc.gov/ontologies/bibframe/Isbn>;
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> "978-0-944510-27-8"@eng;
    <http://id.loc.gov/ontologies/bibframe/note> _:b4.
_:b4 a <http://id.loc.gov/ontologies/bibframe/Note>;
    <http://www.w3.org/2000/01/rdf-schema#label> "Labeled \"New ISBN\""@eng.
tombaker commented 3 years ago

@justinlittman Great - thank you for the explanation!

I agree with @kcoyle that this is an option that should be allowed (but also that it might not be possible for us to nail down exactly what it means in the generic sense).

kcoyle commented 3 years ago

I'm going to close this if there are no objections. There is no prohibition to having multiple valueShapes in TAP. In general, multiple values in a cell are treated as having the logic of "OR" - X,Y = X OR Y. The difficulty with that is that there must be some logic external to the TAP that would determine which is appropriate for the situation.

propertyID       propertyLabel  mandatory  repeatable  valueNodeType  valueShape
bf:identifiedBy  Identifier     FALSE      TRUE        BNODE          :b1,:b2

There is no reason why the row could not be repeated in this case. Given the value of mandatory = false on each, this becomes not a choice between the two (or more) but allows them to be included independent of each other.

propertyID       propertyLabel  mandatory  repeatable  valueNodeType  valueShape
bf:identifiedBy  Identifier     FALSE      TRUE        BNODE          :b1
bf:identifiedBy  Identifier     FALSE      TRUE        BNODE          :b2

These solutions are not as concrete as an "if-then-else" or an "and-or-not" formulation, but the simplicity of the tabular format makes such decision trees very difficult. It remains to be seen if we will extend the TAP vocabulary beyond these limitations, but this is a good use case for exploring that idea.
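
To make the two layouts above concrete, here is a small Python sketch that collects the valueShape options per property, whether they arrive in one delimited cell or across repeated rows. The function name `merge_value_shapes` and the comma separator are assumptions for illustration. The sketch deliberately ignores the difference in meaning described above (a choice between shapes versus shapes allowed independently of each other) and only answers the narrower question of which shapes a value may take.

```python
def merge_value_shapes(rows, sep=","):
    """Collect valueShape options per propertyID.

    Handles both layouts: several shapes in one cell (split on `sep`)
    and the same property repeated on several rows.
    """
    merged = {}
    for row in rows:
        options = merged.setdefault(row["propertyID"], [])
        for shape in row["valueShape"].split(sep):
            shape = shape.strip()
            if shape and shape not in options:
                options.append(shape)
    return merged


# Layout 1: one row, multiple values in the valueShape cell.
one_row = [{"propertyID": "bf:identifiedBy", "valueShape": ":b1,:b2"}]

# Layout 2: the row repeated once per shape.
two_rows = [
    {"propertyID": "bf:identifiedBy", "valueShape": ":b1"},
    {"propertyID": "bf:identifiedBy", "valueShape": ":b2"},
]

print(merge_value_shapes(one_row))   # {'bf:identifiedBy': [':b1', ':b2']}
print(merge_value_shapes(two_rows))  # {'bf:identifiedBy': [':b1', ':b2']}
```

Anything beyond this, such as deciding which shape applies in a given instance, would still need logic external to the TAP.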

tombaker commented 3 years ago

@kcoyle

"I'm going to close this if there are no objections. There is no prohibition to having multiple valueShapes in TAP."

I do object to closing this one, for several reasons:

        Statement Constraint
            propertyID:          http://id.loc.gov/ontologies/bibframe/note
            propertyLabel:       Note
            mandatory:           False
            repeatable:          True
            valueNodeType:       bnode
            valueShape:          pcc:bf2:Note:GeneralNote
        Statement Constraint
            propertyID:          http://id.loc.gov/ontologies/bibframe/status
            propertyLabel:       Incorrect, Invalid or Canceled?
            mandatory:           False
            repeatable:          True
            valueNodeType:       literal|iri
            valueShape:          sinopia:LabeledResource

Basically, I'm not questioning the model above on its own terms, but on how it fits with DCTAP. @kcoyle has also rightly worried "about the ability of the TAP to fully define "OR" for either [data creation or data validation] when the choice leads to paths with different criteria".

In general, I think we should keep the base DCTAP model as simple as possible and resist complexification, which can and will happen in extensions to the base model (either as community extensions or in a TAP-XL).

kcoyle commented 3 years ago

This has been resolved: the TAP vocabulary document now states that multiple values are allowed.