dcmi / dcap

DC Tabular Application Profile - supporting materials

REQUIREMENT: May specify Open/Closed #33

Closed kcoyle closed 3 years ago

kcoyle commented 5 years ago

17 #19

"Must be able to include information about what to do with metadata received or encountered that is not included in the profile itself."

kcoyle commented 5 years ago

There may be more than one type of "open" - accept any metadata properties, or accept any properties from a limited set of vocabularies.
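The two kinds of "open" can be told apart with a small check. The following is only a sketch: the mode names, the function `property_allowed`, and the sample namespaces are assumptions made for illustration, not part of any agreed template.

```python
from enum import Enum


class Openness(Enum):
    """Hypothetical labels for the modes discussed in this issue."""
    CLOSED = "closed"            # only properties listed in the profile
    OPEN_VOCABS = "open-vocabs"  # plus any property from approved vocabularies
    OPEN = "open"                # any property at all


def property_allowed(prop, profile_props, mode, allowed_namespaces=()):
    """Return True if `prop` is acceptable under the given openness mode."""
    if prop in profile_props:
        return True
    if mode is Openness.OPEN:
        return True
    if mode is Openness.OPEN_VOCABS:
        # Accept unlisted properties only from the approved namespaces.
        return any(prop.startswith(ns) for ns in allowed_namespaces)
    return False


# Illustrative data only.
profile_props = {
    "http://purl.org/dc/terms/title",
    "http://purl.org/dc/terms/creator",
}
vocabs = ("http://purl.org/dc/terms/", "http://xmlns.com/foaf/0.1/")
```

Under these assumptions, `http://purl.org/dc/terms/date` is rejected by a closed profile but accepted by one that is open within the listed vocabularies.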

kcoyle commented 3 years ago

This relates to the entire profile and therefore does not fit on any of the rows in the csv template. That means that we may need a manifest file of some type that gives information for the profile itself. That file could include the administrative information: who created it, when, etc.

kcoyle commented 3 years ago

From an email to the list from Tom Baker:

"In ShEx, 'openness' is an attribute of a shape [1]. Maybe we could have an (optional) column like 'shapeOpen', with a value of True/False, Yes/No, or whatever."

[1] https://shex.io/shex-primer/#closed-shapes

To contemplate when we get here: should we allow open/closed to be only on shapes or on the entire profile (if we figure out a mechanism for the latter)?

tombaker commented 3 years ago

@kcoyle I suggest we support open/closed only on shapes. Reasons:

tombaker commented 3 years ago

... or my original proposal for shapeOpen, but since we might want to say that the default interpretation of a profile is "open", shapeClosed might make more sense because we would not encourage people to use this element at all unless they want to close their shapes.
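As a rough sketch of how an optional shapeClosed column might be read, with shapes treated as open unless the column says otherwise: the column names `shapeID` and `propertyID`, the accepted truthy strings, and the label `default` for rows without a shape are all assumptions for illustration.

```python
import csv
import io

# Assumed spellings of "true"; the discussion leaves the exact values open.
TRUE_VALUES = {"true", "yes", "y", "1"}


def closed_shapes(csv_text):
    """Return {shape label: is_closed}; shapes default to open."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        shape = row.get("shapeID") or "default"
        flag = (row.get("shapeClosed") or "").strip().lower()
        # A shape counts as closed if any of its rows marks it closed.
        result[shape] = result.get(shape, False) or flag in TRUE_VALUES
    return result


SAMPLE = """shapeID,shapeClosed,propertyID
book,true,dct:title
book,,dct:creator
author,,foaf:name
"""

print(closed_shapes(SAMPLE))  # {'book': True, 'author': False}
```

A profile that omits the column entirely comes out as open everywhere, which matches the idea that authors would only use the column when they want to close a shape.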

philbarker commented 3 years ago

@tombaker but what then of what you might call the "top level" or "default" shape? What I mean is that if the simplest profile can be just a list of properties (no declared shapes at all), how would you say whether that was a closed or open list? I'm guessing you would stipulate that anyone who wanted to say that the list was either open or closed would have to assign it a shape.

It might be useful to decide on what the default is.

tombaker commented 3 years ago

@philbarker An interesting question. To be translated into ShEx, a "shapeless" list of properties would need to be turned into a shape for which "closed" (or "open") could be specified. In this sense, a shapeless list of properties could be seen as having an implied (anonymous) shape.

I have been assuming that the default should be "open", as it is in ShEx, but am curious to hear the case for "closed". In the absence of a default, we would in effect be saying that it could be either open or closed unless deliberately specified - in effect, saying that the minimal profile would need to have at least two columns.
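To make the "implied shape" idea concrete, here is a minimal sketch that wraps a bare list of properties in a ShExC-style shape. The label `my:default`, the function name, and the choice of `. *` (any value, any number of times) for each property are assumptions; the output is a fragment without prefix declarations, not a complete schema.

```python
def implied_shape(properties, closed=False, label="my:default"):
    """Wrap a shapeless property list in a single (implied) shape.

    The shape is open unless `closed` is set, mirroring the ShEx default.
    """
    keyword = " CLOSED" if closed else ""
    body = " ;\n".join(f"  {p} . *" for p in properties)
    return f"{label}{keyword} {{\n{body}\n}}"


print(implied_shape(["dct:title", "dct:creator"]))
print(implied_shape(["dct:title", "dct:creator"], closed=True))
```

With a default of "open", the one-column profile stays valid as is, and the second call shows the only thing that changes when an author asks for a closed list.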

kcoyle commented 3 years ago

@tombaker "Closed" for a profile means that properties or shapes that are not included in the profile are invalid for matching on the profile. It would be the same as marking all shapes as "closed". This seems to me to be the likely intention, rather than having some shapes closed and some open. I say likely because people would generally think of their metadata (aka the entire profile) as open or closed. I have no objection to putting open and closed on a shape, but in either case we have to think about what that means:

It seems to me that the nature of an application profile is to create a defined metadata environment, which would be "closed" in the RDF sense. An open metadata set would be somewhat contrary to the intention of a profile. What does seem especially relevant for those ingesting metadata created by others would be the instruction to ignore any shapes/properties/values that are not included in the profile, thus creating a metadata set that conforms to the profile. So I am arguing for "closed" as the default, and am not sure that our first template version will need "open", but we should poll the community for that option.

philbarker commented 3 years ago

I would assume the default default (i.e. if the spec says nothing) would be "open", because that is the de facto situation with no profile. However, I can see the argument that the intention in creating an AP is to close down some options, hence the question. (Aside: I recall discussions arising from OAI-PMH mandating Dublin Core metadata but nothing being mandatory in Dublin Core.)

My 2c regarding @kcoyle's Qs

  • If a shape is "closed" does that mean that only the properties in the shape are valid for that shape?

That is what I assume it means

  • Could "open" also refer to value shapes, allowing the object of a property to be a node not in the profile?

If the valueShape for a property is an open shape that lists no mandatory properties then that value could be any non-literal.

  • Could "open" refer to value constraints, e.g. taking the object value from a domain not listed in the constraints? Or not in another way conformant to a constraint?

I don't think so. If so then there would be no point in having the constraint. BTW, I've tried examples like this which have a similar effect:

property  Mand  Repeat  valueType  constraint  note
rdf:type  y     n       URI        sdo:Book    must be schema.org/Book
rdf:type  n     y       URI                    can be anything in addition to schema.org/Book

  • Does "closed" mean that other shapes/properties/values are ignored, or do they throw an error during validation?

That would be an implementation issue. In a spec I would word this as such data "may be ignored" or "may trigger an error or warning". If I were creating the data I would want a validator warning to tell me I was using terms that may be ignored. If I were receiving data and had decided to keep all data as it was sent I wouldn't want an error/warning. If I were writing a validator I would make these warnings configurable through a "strictness level" setting or similar.
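A "strictness level" of that kind could look roughly like the sketch below. The level names, the function `check_unlisted`, and the message wording are hypothetical; this is one possible design, not a description of any existing validator.

```python
from enum import Enum


class Strictness(Enum):
    """Hypothetical strictness levels for properties outside the profile."""
    IGNORE = 0  # receiver keeps everything and wants no noise
    WARN = 1    # data creator wants to hear about terms that may be ignored
    ERROR = 2   # unlisted terms make the data invalid


def check_unlisted(properties, profile_props, strictness=Strictness.WARN):
    """Return warning messages, or raise, for properties not in the profile."""
    unlisted = [p for p in properties if p not in profile_props]
    if not unlisted or strictness is Strictness.IGNORE:
        return []
    if strictness is Strictness.ERROR:
        raise ValueError("properties not in profile: " + ", ".join(unlisted))
    return [f"warning: {p} is not in the profile and may be ignored" for p in unlisted]
```

The same data then passes silently, produces warnings, or fails, depending only on the setting chosen by whoever runs the check.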

kcoyle commented 3 years ago

I did my usual shallow->middling dive into the documents, and here's what I believe is true:

The "open" in RDF schema or OWL corresponds only vaguely to what we are discussing here for profiles. The "open world assumption" is a somewhat different beast, as per the OWL documentation:

"If some fact is not present in a database, it is usually considered false (the so-called closed-world assumption) whereas in the case of an OWL 2 document it may simply be missing (but possibly true), following the open-world assumption."

Instead, our sense of "open/closed" is directly related to validation. SHACL has this description of "closed":

"If $closed is true then there is a validation result for each triple that has a value node as its subject and a predicate that is not explicitly enumerated as a value of sh:path in any of the property shapes declared via sh:property at the current shape."

Therefore, in SHACL, closed is explicitly about a subject/predicate pair. I'm less clear about the ShEx use of closed because of how that documentation is worded:

"In a ShEx schema, a shape may be defined to match only RDF data nodes that have outgoing triples matching the given set of triple constraints and no other outgoing triples. A shape declaration can be qualified to mean "this set of outgoing triples and no others" by using the keyword CLOSED."

What I am unclear about on this is "outgoing triples", although the document also says:

"A node in the subject position has an outgoing arc and a node in the object position has an incoming arc."

So I believe this also refers to triples with a specific property but I can't tell if the "outgoing arc" can include the object node's value. @ericprud ?

ericprud commented 3 years ago

The concept of closed is intended to be the same for both languages. In ShEx it applies to "outgoing arcs" of the node being validated, meaning all triples with that node as a subject and any predicate or object. So if the predicate of an outgoing arc isn't mentioned in a closed shape, it's flagged as a violation.

The only predicates SHACL recognizes when testing closed-ness are those in the top-level triple constraints, which makes closed-ness more complicated. For instance, if a closed schema required a foaf:name or a schema:name, and the data had a foaf:name, that triple would be flagged as a violation. The work-around is to list any properties buried in expressions in a property called sh:ignoredProperties.
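The effect can be illustrated with a deliberately simplified model of the SHACL closed check. This is not a SHACL engine and does not use any SHACL library; the function name and the way the shape is represented (a set of top-level paths plus a set of ignored properties) are assumptions for the sketch.

```python
def shacl_closed_violations(node_predicates, top_level_paths, ignored=()):
    """Predicates a closed shape would flag under the simplified model.

    Only paths declared directly on the shape count as "mentioned";
    paths nested inside an alternative (e.g. "foaf:name or schema:name")
    do not, unless they are also listed as ignored properties.
    """
    allowed = set(top_level_paths) | set(ignored)
    return sorted(p for p in node_predicates if p not in allowed)


# The name alternative lives in a nested expression, so no top-level path
# mentions foaf:name or schema:name.
data_predicates = {"foaf:name"}

print(shacl_closed_violations(data_predicates, top_level_paths=set()))
# ['foaf:name']

print(shacl_closed_violations(
    data_predicates,
    top_level_paths=set(),
    ignored={"foaf:name", "schema:name"},
))
# []
```

The second call corresponds to the work-around of listing the buried properties in sh:ignoredProperties.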

tombaker commented 3 years ago

@kcoyle @philbarker @ericprud Karen, you wrote: 'What does seem especially relevant for those ingesting metadata created by others would be the instruction to ignore any shapes/properties/values that are not included in the profile, thus creating a metadata set that conforms to the profile. So I am arguing for "closed" as the default...'

If you are suggesting that "closed" be the default because ingesters of metadata should be able to ignore triple patterns not included in a profile (though I am unsure if you are arguing this), then I'd point out:

Re: "A node in the subject position has an outgoing arc and a node in the object position has an incoming arc." (from the Primer), you wrote: "So I believe this also refers to triples with a specific property but I can't tell if the "outgoing arc" can include the object node's value."

The example given in the Primer includes a schema:

my:IssueShape {
  ex:state [ex:unassigned ex:assigned];
 ^ex:reportedIssue @my:UserShape
}

my:UserShape {
  foaf:name LITERAL;
  foaf:mbox IRI+
}

and matching data:

inst:Issue1 a ex:Issue ;
    ex:state        ex:unassigned .
inst:User1 a foaf:Person ;
    foaf:name       "Bob Smith" ;
    ex:reportedIssue inst:Issue1 ;
    foaf:mbox       <mailto:bob@example.org> .

A shape can always describe the object node's value associated with a given property (but can also leave it unspecified). If you meant to ask whether the "INCOMING arc" can include the object node's value, then the line ^ex:reportedIssue @my:UserShape could be read as meaning:

@ericprud Closing my:UserShape would make the data invalid (because the ex:reportedIssue triple is not covered), would it not?
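A small sketch of that reading, using the Primer data above as plain tuples. It models "closed" only as "every outgoing triple of the node must use a predicate named in the shape"; it ignores value constraints, cardinality, and the inverse constraint on my:IssueShape, and the function name is made up for the example.

```python
# The Primer data as (subject, predicate, object) tuples; "a" is rdf:type.
TRIPLES = [
    ("inst:Issue1", "rdf:type", "ex:Issue"),
    ("inst:Issue1", "ex:state", "ex:unassigned"),
    ("inst:User1", "rdf:type", "foaf:Person"),
    ("inst:User1", "foaf:name", "Bob Smith"),
    ("inst:User1", "ex:reportedIssue", "inst:Issue1"),
    ("inst:User1", "foaf:mbox", "mailto:bob@example.org"),
]

# Predicates named in my:UserShape.
USER_SHAPE_PREDICATES = {"foaf:name", "foaf:mbox"}


def uncovered_outgoing(node, triples, shape_predicates):
    """Outgoing triples of `node` whose predicate the shape does not name."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if s == node and p not in shape_predicates
    ]


for triple in uncovered_outgoing("inst:User1", TRIPLES, USER_SHAPE_PREDICATES):
    print(triple)
```

Under this simplified model, the ex:reportedIssue triple is left uncovered for inst:User1, and so is the rdf:type triple, since my:UserShape does not name that predicate either.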

As I see it, inverse triple constraints are a good example of what an expressive constraint language like ShEx can cover, but would be very awkward to express in a simple CSV format.

kcoyle commented 3 years ago

This discussion is continued at https://github.com/dcmi/dctap/issues/8