Define structure for validation report

geneontology / go-shapes

Schema for Gene Ontology Causal Activity Models defined using RDF Shapes


Define structure for validation report #14

Open goodb opened 4 years ago

goodb commented 4 years ago

When we apply the shapes to a go_cam model, we need to formalize what the code should return. The ShEx libraries provide a mapping from the RDF nodes in the model to the labels of the shapes in the provided schema. On its own, this seems insufficient for users. I'm thinking of a response that requires some additional logic and contains additional elements such as:

1. A boolean for whether the model as a whole should be called 'valid' according to the schema, similar to the OWL consistency check. This might be refined into subtypes of model-level quality.
2. A human-readable explanation of 1.
3. Anything else? I was thinking it would be useful to integrate the shape validation with the OWL validation, so the OWL inference report could go in here as well.
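A minimal sketch of the response structure proposed above, assuming a Python service wraps the ShEx validator. The class `ValidationReport` and its field names are hypothetical, not part of go-shapes or any ShEx library; `shape_map` stands in for the node-to-shape-label mapping the ShEx libraries already return, and the `ex:` node IRIs are made up.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class ValidationReport:
    """Hypothetical model-level report wrapped around the raw ShEx shape map."""

    # Item 1: whether the model as a whole is called valid against the schema.
    valid: bool
    # Item 2: human-readable explanation of the verdict in item 1.
    explanation: str
    # What the ShEx libraries already give us: node IRI -> matched shape labels.
    shape_map: Dict[str, List[str]] = field(default_factory=dict)
    # Item 3 (optional): OWL inference/consistency report, if a reasoner was run.
    owl_report: Optional[str] = None


# Usage with made-up node IRIs (ex:...) purely for illustration.
report = ValidationReport(
    valid=False,
    explanation="ex:mf1 does not match the shape required by its category",
    shape_map={"ex:mf1": [], "ex:gp1": ["MolecularEntity"]},
)
print(json.dumps(asdict(report), indent=2))
```

Keeping the raw shape map next to the verdict means the node-level detail stays available to users even before the explanation text is refined.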

On computing model-level validity, I'm thinking of something like this. For each named individual in the model:

1. It must have an RDF type and a biolink category (these should probably be added to the root gocamentity shape).
2. The BL:category annotation should match a predefined shape, e.g. anything tagged bl:category [GoMolecularFunction:] must match that shape and must not match anything else.
3. Anything else?
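The two rules above could be computed roughly as follows. This is a sketch under stated assumptions: the ShEx validator's output has already been collected per individual into the hypothetical `Individual` record, and `CATEGORY_TO_SHAPE` is a hypothetical table mapping each bl:category value to the one shape it must match (the shape labels are illustrative). The model is called valid only when every named individual passes, and the collected problem strings double as a first, geeky explanation.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical table: each bl:category value selects exactly one shape label.
CATEGORY_TO_SHAPE: Dict[str, str] = {
    "GoMolecularFunction:": "MolecularFunction",
    "GoMolecularEntity:": "MolecularEntity",
    "GoComplex:": "Complex",
}


@dataclass
class Individual:
    """Hypothetical per-node summary of the model plus ShEx results."""

    iri: str
    rdf_types: Set[str]       # values of rdf:type
    categories: Set[str]      # values of bl:category
    matched_shapes: Set[str]  # shape labels the ShEx validator says it conforms to


def check_individual(ind: Individual) -> List[str]:
    """Return problems for one named individual; an empty list means it passes."""
    problems: List[str] = []
    # Rule 1: both an rdf:type and a bl:category are required.
    if not ind.rdf_types:
        problems.append(f"{ind.iri} has no rdf:type")
    if not ind.categories:
        problems.append(f"{ind.iri} has no bl:category")
    # Rule 2: each category must select a known shape, the node must match
    # that shape, and it must not match any other shape.
    for cat in sorted(ind.categories):
        expected = CATEGORY_TO_SHAPE.get(cat)
        if expected is None:
            problems.append(f"{ind.iri}: no shape defined for category {cat}")
        elif expected not in ind.matched_shapes:
            problems.append(f"{ind.iri}: category {cat} requires shape {expected}")
        elif ind.matched_shapes != {expected}:
            extra = sorted(ind.matched_shapes - {expected})
            problems.append(f"{ind.iri}: also matches {extra}")
    return problems


def check_model(individuals: List[Individual]) -> Tuple[bool, List[str]]:
    """Model-level validity: valid only if every named individual passes."""
    problems = [p for ind in individuals for p in check_individual(ind)]
    return (not problems, problems)


# Usage with made-up individuals.
mf = Individual("ex:mf1", {"GO:0003674"}, {"GoMolecularFunction:"}, {"MolecularFunction"})
gp = Individual("ex:gp1", {"ex:SomeGeneProduct"}, set(), {"MolecularEntity"})
valid, problems = check_model([mf, gp])
print(valid, problems)  # False ['ex:gp1 has no bl:category']
```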

balhoff commented 4 years ago

> The BL:category annotation should match a predefined shape. e.g. anything tagged bl:category [GoMolecularFunction:] must match the shape and must not match anything else.

How does this interact with "inheritance"/shape intersection? The following definitions imply to me that a node matching the <Complex> shape will have two values for bl:category: GoComplex: and GoMolecularEntity:. Is that a problem for this principle?

<Complex> @<MolecularEntity> {
   bl:category [GoComplex:]  ;
}// rdfs:comment  "a protein complex"

<MolecularEntity>  EXTRA bl:category {
   bl:category [GoMolecularEntity:]  ;
}// rdfs:comment  "a molecular entity (a gene product, chemical, or complex typically)"
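To make the question concrete, here is a sketch comparing the strict rule with one possible relaxation, assuming (as the comment above does) that a node matching <Complex> carries both GoComplex: and GoMolecularEntity: and is reported as matching both shapes. The `PARENT` table encoding that <Complex> builds on <MolecularEntity>, and all function names, are hypothetical. The relaxed variant is only one option: it picks the most specific category and tolerates matches to that shape's ancestors.

```python
from typing import Dict, Optional, Set

# Hypothetical encoding of the two definitions above:
# <Complex> is declared on top of @<MolecularEntity>.
PARENT: Dict[str, Optional[str]] = {"Complex": "MolecularEntity", "MolecularEntity": None}
CATEGORY_TO_SHAPE: Dict[str, str] = {
    "GoComplex:": "Complex",
    "GoMolecularEntity:": "MolecularEntity",
}


def ancestors(shape: str) -> Set[str]:
    """Every shape that `shape` builds on, excluding itself."""
    result: Set[str] = set()
    parent = PARENT.get(shape)
    while parent is not None:
        result.add(parent)
        parent = PARENT.get(parent)
    return result


def strict_ok(categories: Set[str], matched: Set[str]) -> bool:
    """Rule as proposed: each category's shape matches and nothing else does."""
    return all(matched == {CATEGORY_TO_SHAPE[c]} for c in categories)


def relaxed_ok(categories: Set[str], matched: Set[str]) -> bool:
    """One possible variant: the most specific category wins, ancestors may match."""
    shapes = {CATEGORY_TO_SHAPE[c] for c in categories}
    # Most specific = not an ancestor of any other shape the node is tagged with.
    leaves = {s for s in shapes if not any(s in ancestors(o) for o in shapes)}
    if len(leaves) != 1:
        return False  # no categories, or categories that do not form one chain
    leaf = leaves.pop()
    return leaf in matched and matched <= {leaf} | ancestors(leaf)


complex_categories = {"GoComplex:", "GoMolecularEntity:"}
complex_matches = {"Complex", "MolecularEntity"}
print(strict_ok(complex_categories, complex_matches))   # False
print(relaxed_ok(complex_categories, complex_matches))  # True
```

The alternative raised in the reply, allowing only one category per node, would let the strict rule stand without any hierarchy table.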

goodb commented 4 years ago

I think it is, but it's something we could work around if we needed to. Basically, do we allow multiple BL categories for individual nodes or not? I feel we probably do not want to recreate hierarchies with category tags. So here we should either make a subshape of @ if we need to refer to complexes in shapes, or just eliminate the shape and use only .

cmungall commented 4 years ago

I think explanations will be massively important in the long run, but we can defer them for a while: we can make do with geeky explanations in the short term while the modeling group iterates over some of the basics.

I do think we will need to refer to complexes in the schema, for example to state the expected has-part structure.

goodb commented 4 years ago

@cmungall the key thing is to get the computation of multi-node, model-level validity in place. Once that is done, the explanations, geeky or otherwise, will fall out easily.