goodb closed this issue 3 years ago
Also tagging @dougli1sqrd
I'm going to leave some minerva shex validation output opinions/questions here.
So when we're talking about ShEx validation output in minerva, should we try JSON? For an error, I guess we need the model that failed, the shape(s) that failed, and any annotations attached to those shapes/expressions?
{
  "model": "an ID",
  "shape": "shape URI",
  "annotations": [ annotations... ]
}
Something like this anyway?
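To make the proposed record concrete, here is a minimal Python sketch that builds one such error entry and serializes it. Only the three keys come from the proposal above; the function name make_error_record, the model ID, and the empty annotation list are hypothetical placeholders.

```python
import json


def make_error_record(model_id, shape_uri, annotations):
    """Build one validation error entry using the three proposed keys."""
    return {
        "model": model_id,
        "shape": shape_uri,
        # The form of each annotation is still undecided at this point in the
        # thread, so this sketch just copies whatever list it is given.
        "annotations": list(annotations),
    }


record = make_error_record(
    "gomodel:example",  # hypothetical model ID
    "http://purl.obolibrary.org/obo/go/shapes/MolecularFunction",
    [],
)
print(json.dumps(record, indent=2))
```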
@dougli1sqrd sure, whatever fits into your rule result structure can be done. It might be useful for you to look at the json that the minerva server is currently generating for shex and owl validation:
https://github.com/geneontology/minerva/pull/242 for #212
It would be nice, though not necessary, if the service JSON were pretty similar to the command line JSON output.
Okay cool. I do have a question about how we want to do the annotations. Should we include shape annotations as well as triple pattern annotations? How would we organize that in the output? Alternatively, we could decide by convention to only put annotations in one place or the other?
It's just that if I'm reading the annotations and looking out for gorule text, do I need to look at both the shape annotations and the triple expression annotations?
Additionally, maybe we could mint a specific gorule relation that indicates that this expression/shape is for a gorule? @cmungall do you have thoughts about that?
// go:rule GORULE:00000008
or something?
Ah I had a second look: So we have:
"violations": [
  {
    "explanations": [
      {
        "shape_id": "http://purl.obolibrary.org/obo/go/shapes/MolecularFunction",
        "constraints": [
          {
            "target_node_uri": "http://model.geneontology.org/R-HSA-73930/R-HSA-73930",
            "property_id": "http://purl.obolibrary.org/obo/RO_0002413",
            "intended_range_shapes": [
              "http://purl.obolibrary.org/obo/go/shapes/MolecularFunction"
            ]
          }
        ]
      }
    ],
    "node_id": "http://model.geneontology.org/R-HSA-73930/R-HSA-110360",
    "commentary": "http://model.geneontology.org/R-HSA-73930/R-HSA-110360 did not match http://purl.obolibrary.org/obo/go/shapes/MolecularFunction"
  }
]
How about we add a key "annotations", both parallel to shape_id and within the constraint object?
So:
"annotations": [
  {
    "predicate": "go:rule", // or whatever
    "object": "GORULE:00000008" // or whatever
  }, ...
]
That way when this is parsed, I can tell which annotations are shape level, and which are constraint level.
@kltm thoughts?
Pinging @vanaukenk to follow this for the shex work.
@dougli1sqrd I think it would be helpful to me to see a mocked-up blob of what the JSON output is to look like in this instance. I'm interested in having an overall consistent layout for all of the expected errors coming through: shex, owl, rules, etc.
@kltm:
"violations": [
  {
    "explanations": [
      {
        "shape_id": "http://purl.obolibrary.org/obo/go/shapes/MolecularFunction",
        "annotations": [
          {
            "predicate": "go:rule",
            "object": "GORULE:00000012"
          }
        ],
        "constraints": [
          {
            "target_node_uri": "http://model.geneontology.org/R-HSA-73930/R-HSA-73930",
            "property_id": "http://purl.obolibrary.org/obo/RO_0002413",
            "intended_range_shapes": [
              "http://purl.obolibrary.org/obo/go/shapes/MolecularFunction"
            ],
            "annotations": [
              {
                "predicate": "go:rule",
                "object": "GORULE:00000008"
              }
            ]
          }
        ]
      }
    ],
    "node_id": "http://model.geneontology.org/R-HSA-73930/R-HSA-110360",
    "commentary": "http://model.geneontology.org/R-HSA-73930/R-HSA-110360 did not match http://purl.obolibrary.org/obo/go/shapes/MolecularFunction"
  }
]
@dougli1sqrd Okay, I may be a little slow here, but what is the subject then? The shape_id? Wouldn't the relation then be something like has_constraint or something? I'm not really understanding the duplication of the rule...
Annotations can be placed at the shape level or at the constraint level, so that's why it's in two different places.
Annotations only have a "predicate" and an "object". I suppose that means the constraint or the shape is the implicit subject.
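For anyone parsing this, here is a rough Python sketch of reading annotations at both levels and making the implicit subject explicit. It assumes the mocked-up layout above. The function name collect_annotations is hypothetical, and using property_id to identify a constraint is my own assumption, since the constraint object has no ID of its own.

```python
def collect_annotations(violations):
    """Yield (level, subject, predicate, object) for every annotation found."""
    for violation in violations:
        for explanation in violation.get("explanations", []):
            shape_id = explanation.get("shape_id")
            # Shape-level annotations: the shape is the implicit subject.
            for ann in explanation.get("annotations", []):
                yield ("shape", shape_id, ann["predicate"], ann["object"])
            for constraint in explanation.get("constraints", []):
                # Assumption: property_id stands in as the constraint's identifier.
                subject = constraint.get("property_id")
                for ann in constraint.get("annotations", []):
                    yield ("constraint", subject, ann["predicate"], ann["object"])


# Trimmed copy of the mock-up above.
violations = [
    {
        "explanations": [
            {
                "shape_id": "http://purl.obolibrary.org/obo/go/shapes/MolecularFunction",
                "annotations": [
                    {"predicate": "go:rule", "object": "GORULE:00000012"}
                ],
                "constraints": [
                    {
                        "property_id": "http://purl.obolibrary.org/obo/RO_0002413",
                        "annotations": [
                            {"predicate": "go:rule", "object": "GORULE:00000008"}
                        ],
                    }
                ],
            }
        ],
        "node_id": "http://model.geneontology.org/R-HSA-73930/R-HSA-110360",
    }
]

rows = list(collect_annotations(violations))
for row in rows:
    print(row)
```

A consumer that only cares about gorule text can then filter the rows on predicate == "go:rule" without needing to know where in the structure each annotation came from.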
For shapes, note we now have GORULE:0000056: https://github.com/geneontology/go-site/blob/master/metadata/rules/gorule-0000056.yaml
Based on the title of this ticket it could be closed. The Minerva command line client can now validate the consistency of batches of input go-cams, e.g. java -Xmx8g -jar minerva-cli.jar --validate-go-cams -c ./catalog-v001-for-noctua.xml -i ./blazegraph.jnl
There is useful discussion here about how the results ought to be delivered. Perhaps that should be its own issue/project.
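If the validation is driven from a pipeline script, the command line above could be wrapped roughly like this. This is only a sketch: the flags are copied from the example invocation, the helper names build_validate_command and run_validation are hypothetical, and nothing is assumed about what the tool prints.

```python
import subprocess


def build_validate_command(jar, catalog, journal, max_heap="8g"):
    """Assemble the argument list for the go-cam validation run."""
    return [
        "java", f"-Xmx{max_heap}", "-jar", jar,
        "--validate-go-cams",
        "-c", catalog,
        "-i", journal,
    ]


def run_validation(cmd):
    """Run the command; requires Java and the jar to be present locally."""
    return subprocess.run(cmd, check=True)


cmd = build_validate_command(
    "minerva-cli.jar",
    "./catalog-v001-for-noctua.xml",
    "./blazegraph.jnl",
)
print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if the journal or catalog path contains spaces.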
Given a blazegraph journal containing go-cam models or a folder containing go-cam files, output whether each go-cam is logically consistent and agrees with the shex schema. Use the same tbox ontologies as are used to drive minerva.
Provide results in a format suitable for use in the pipeline (and for manual debugging).
The pipeline wants results reported against these rules:
OWL: https://github.com/geneontology/go-site/blob/master/metadata/rules/gorule-0000019.md
ShEx: https://github.com/geneontology/go-site/blob/master/metadata/rules/gorule-0000056.yaml
Pipeline output is a summary, ordered by taxon groupings. Example: https://gist.github.com/dougli1sqrd/1f7b970c2030417cc5570ea5932aca3b
(The "taxon" value will be filled in once taxon information is added to the models upstream. The starting point is taxon 'other'.)
{
  "taxon": "mouse/taxon:1235",
  "number_of_models": 58962,
  "number_of_models_in_error": 5200,
  "number_of_correct_models": 53762,
  "messages": {
    "gorule-0000019": [
      {
        "level": "ERROR",
        "model-id": "gomodel:123455",
        "type": "Violates GO Rule",
        "message": "GORULE:0000019: Model is logically inconsistent",
        "obj": "",
        "taxon": "",
        "rule": 19
      },
      {
        "level": "WARNING",
        "model-id": "gomodel:123455",
        "type": "Violates GO Rule",
        "message": "GORULE:0000056: Model does not match GO-CAM schema",
        "obj": "",
        "taxon": "",
        "rule": 56
      }
    ]
  }
}
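As a sketch of how such a summary might be assembled from per-model results, something like the following Python could work. The input format (a list of dicts with model_id and failed_rules) and the function name summarize are hypothetical. Two further assumptions are mine: a model counts as "in error" if it fails any rule, and each message is grouped under its own rule key, whereas the example above lists both messages under gorule-0000019.

```python
from collections import defaultdict

# (level, message) per rule number, copied from the example summary.
RULES = {
    19: ("ERROR", "GORULE:0000019: Model is logically inconsistent"),
    56: ("WARNING", "GORULE:0000056: Model does not match GO-CAM schema"),
}


def summarize(taxon, results):
    """Build one per-taxon summary block from per-model validation results."""
    messages = defaultdict(list)
    models_in_error = 0
    for result in results:
        failed_rules = result["failed_rules"]
        if failed_rules:
            # Assumption: failing any rule marks the model as in error.
            models_in_error += 1
        for rule in failed_rules:
            level, text = RULES[rule]
            messages[f"gorule-{rule:07d}"].append({
                "level": level,
                "model-id": result["model_id"],
                "type": "Violates GO Rule",
                "message": text,
                "obj": "",
                "taxon": "",
                "rule": rule,
            })
    return {
        "taxon": taxon,
        "number_of_models": len(results),
        "number_of_models_in_error": models_in_error,
        "number_of_correct_models": len(results) - models_in_error,
        "messages": dict(messages),
    }


# Hypothetical per-model results: one clean model, one failing both rules.
results = [
    {"model_id": "gomodel:1", "failed_rules": []},
    {"model_id": "gomodel:2", "failed_rules": [19, 56]},
]
summary = summarize("other", results)
print(summary["number_of_models"], summary["number_of_models_in_error"])
```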