bartkl / metamorph

Metamorph is a Clojure library that enables the generation of an Avro schema from a given input SHACL model.
https://bartkl.github.io/metamorph/
Apache License 2.0

Extend Avro schema generation to deal with `sh:class` #21

Closed bartkl closed 1 year ago

bartkl commented 2 years ago

Currently, we support only sh:node and sh:datatype. This should be extended to deal with sh:class as well. If both sh:node and sh:class are specified, give precedence to sh:node.
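To make the proposed precedence concrete, here is a minimal sketch in Python (not Metamorph's Clojure code). It assumes a property shape is modelled as a plain dict keyed by constraint name; the function name `type_constraint` and the dict layout are hypothetical, and where `sh:datatype` sits in the order is an arbitrary assumption, since only the `sh:node` over `sh:class` rule is stated above.

```python
# Hypothetical model: a property shape is a dict keyed by SHACL constraint name.
# Only "sh:node before sh:class" comes from the issue; the position of
# sh:datatype in this tuple is an assumption made for the sketch.
PRECEDENCE = ("sh:node", "sh:class", "sh:datatype")


def type_constraint(property_shape):
    """Return the (constraint, value) pair that decides the field type, or None."""
    for constraint in PRECEDENCE:
        if constraint in property_shape:
            return constraint, property_shape[constraint]
    return None
```

With both constraints present, `type_constraint({"sh:node": "ex:AShape", "sh:class": "ex:A"})` returns `("sh:node", "ex:AShape")`.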

bartkl commented 2 years ago

From the SHACL spec:

Note that multiple values for sh:class are interpreted as a conjunction, i.e. the values need to be SHACL instances of all of them.

Let's start by assuming a single sh:class statement, and deal with multiple values later on. Whatever choice we make, it also needs to be documented in the wiki.
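One way to encode the "single sh:class for now" assumption is to reject the conjunction case explicitly rather than silently pick a value. The sketch below is Python for illustration only; `single_class` is a hypothetical helper, not part of Metamorph.

```python
def single_class(sh_class_values):
    """Return the one sh:class value, or None if there is none.

    Multiple values mean a conjunction in SHACL; this sketch refuses to
    handle that case instead of guessing which class to use.
    """
    values = list(sh_class_values)
    if not values:
        return None
    if len(values) > 1:
        raise NotImplementedError(
            "multiple sh:class values form a conjunction, which is not handled yet: "
            + ", ".join(sorted(values))
        )
    return values[0]
```

Failing loudly keeps the limitation visible until a decision about conjunctions has been made and documented.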

bartkl commented 2 years ago

We have an issue, and this might be another case where an RDFS/OWL reasoner and/or SHACL validator is desired.

Mapping sh:class to record field datatype

sh:class constrains the value of a property by requiring it to be an instance of the provided class. How do we go about this?

We map property shapes onto record fields, and determine each field's type from the constraints that are defined:

sh:class provides the class from the vocabulary that the value node must be an instance of, which is all you need from a validation perspective. But how do we map this to the type of the Avro record field?

Bad solution

One option that I feel is undesirable is to query for node shapes that target that class and somehow choose one to map onto a record type. Technically this is a solution, but it violates semantic purity in several ways. Most notably, if a node shape should apply, it would have been specified using sh:node.
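A small Python sketch of why "somehow choosing one" is a problem: nothing prevents several node shapes from targeting the same class. The dict of shapes and the helper `shapes_targeting` are hypothetical and only illustrate the ambiguity.

```python
# Hypothetical input: node shape name -> its sh:targetClass.
NODE_SHAPES = {
    "ex:PersonShape": "ex:Person",
    "ex:StrictPersonShape": "ex:Person",
    "ex:CatShape": "ex:Cat",
}


def shapes_targeting(cls, node_shapes):
    """Return the names of all node shapes whose target class is cls."""
    return sorted(name for name, target in node_shapes.items() if target == cls)


candidates = shapes_targeting("ex:Person", NODE_SHAPES)
# Two candidates for one class: any pick between them would be arbitrary.
```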

Right solution?

Rather, I think the answer might simply be to create a record for the class. We actually already do that with node shapes, since we use the sh:targetClass to create the record. What differs here, of course, is that node shapes provide property shapes and the vocabulary class does not. But in the vocabulary we can query for properties that apply to this class (i.e. properties with a domain compatible with it). This gets complicated though, since you also want to check for domains that are superclasses.

Not only that: you get the same question all over again. These properties aren't property shapes, so we don't have constraint information. How does this affect our mapping choices? One example: cardinality isn't described, so we can't say anything about it (which is okay if we assume the all-fields-optional approach though). And what about value type constraints? Without those we only have the domain/range of the properties themselves, which should not be used this way, since they are definitions for inferring the types of things, not for validation purposes.

So, in the case of sh:class we seem to have these options:

  1. Query properties from the vocabulary that apply to the class, and map these onto record fields without validation information (?)
  2. Don't provide properties, yielding an empty record
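Option 1 could look roughly like the Python sketch below. It assumes the vocabulary is available as a list of `(subject, predicate, object)` tuples with prefixed names; the helpers `superclasses` and `properties_for_class` and the sample vocabulary are made up for illustration. It follows only `rdfs:subClassOf` and `rdfs:domain` and is not a substitute for a real RDFS reasoner.

```python
def superclasses(cls, triples):
    """Return cls plus every class reachable from it via rdfs:subClassOf."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def properties_for_class(cls, triples):
    """Return properties whose rdfs:domain is cls or one of its superclasses."""
    domains = superclasses(cls, triples)
    return sorted(s for s, p, o in triples if p == "rdfs:domain" and o in domains)


# Hypothetical vocabulary, for illustration only.
VOCAB = [
    ("ex:Cat", "rdfs:subClassOf", "ex:Animal"),
    ("ex:name", "rdfs:domain", "ex:Animal"),
    ("ex:purrs", "rdfs:domain", "ex:Cat"),
    ("ex:barks", "rdfs:domain", "ex:Dog"),
]
```

Here `properties_for_class("ex:Cat", VOCAB)` returns `["ex:name", "ex:purrs"]`. The result is only a list of property names: it carries no cardinality or value type information, which is exactly the gap discussed above. Option 2 corresponds to ignoring this list and emitting a record without fields.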

Thinking out loud.

bartkl commented 2 years ago

More general reason for needing reasoners

One reason why a reasoner might be necessary is when mapping only at the structural level isn't sufficient due to significant differences in semantics or expressive power.

For instance, I can map the Python statement c = Cat() to RDFS like :c a :Cat, but what about the for-loop construct? RDFS has no such notion, so I cannot map Python's loop onto RDFS. However, if I expand the loop (i.e. apply its semantics), I get loose statements which can probably be mapped.
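As an illustration of that expansion, the Python sketch below takes a loop that would instantiate one object per name and turns it into the individual statements it stands for, each of which maps onto a triple. The helpers `expand_loop` and `to_turtle` are hypothetical names used only for this example.

```python
def expand_loop(names, cls):
    """Apply the loop's semantics: one instantiation triple per iteration."""
    return [(f":{name}", "a", f":{cls}") for name in names]


def to_turtle(triples):
    """Serialise the triples as simple Turtle-like statements."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# The loop itself has no RDFS counterpart, but its expansion does.
expanded = to_turtle(expand_loop(["c1", "c2"], "Cat"))
```

`expanded` is the two-line string `:c1 a :Cat .` followed by `:c2 a :Cat .`, i.e. the loop has disappeared and only mappable statements remain.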

The issue with this user story is similar.

bartkl commented 2 years ago

For now:

Make sure to document this in the wiki.

bartkl commented 1 year ago

I actually believe we should simply not support this.

sh:class is a value type constraint, and in the context of schema declaration (as we do here) it makes no sense.

The semantics suggested in the final solution above is very complicated and far removed from the SHACL semantics.

Moving this to discussions, but I think this is simply n/a.