geneontology / minerva

BSD 3-Clause "New" or "Revised" License

provide OWL and shex consistency check for command line #254

Closed goodb closed 3 years ago

goodb commented 4 years ago

Given a blazegraph journal containing go-cam models or a folder containing go-cam files, output whether each go-cam is logically consistent and agrees with the shex schema. Use the same tbox ontologies as are used to drive minerva.

Provide results in a format suitable for use in the pipeline (and for manual debugging).

The pipeline expects results reported against these rules:

OWL: https://github.com/geneontology/go-site/blob/master/metadata/rules/gorule-0000019.md
SHEX: https://github.com/geneontology/go-site/blob/master/metadata/rules/gorule-0000056.yaml

Pipeline output is a summary, ordered by taxon groupings. Example: https://gist.github.com/dougli1sqrd/1f7b970c2030417cc5570ea5932aca3b

(The "taxon" value will be filled in once taxon information is added to the models upstream; the starting point is taxon 'other'.)

{
  "taxon": "mouse/taxon:1235",
  "number_of_models": 58962,
  "number_of_models_in_error": 5200,
  "number_of_correct_models": 53762,
  "messages": {
    "gorule-0000019": [
      {
        "level": "ERROR",
        "model-id": "gomodel:123455",
        "type": "Violates GO Rule",
        "message": "GORULE:0000019: Model is logically inconsistent",
        "obj": "",
        "taxon": "",
        "rule": 19
      },
      {
        "level": "WARNING",
        "model-id": "gomodel:123455",
        "type": "Violates GO Rule",
        "message": "GORULE:0000056: Model does not match GO-CAM schema",
        "obj": "",
        "taxon": "",
        "rule": 56
      }
    ]
  }
}
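As a rough illustration of how such a summary could be assembled, here is a minimal Python sketch. The input format (a list of `(model_id, owl_consistent, shex_conformant)` tuples), the function name `summarize`, grouping messages under one key per rule, and counting a model as "in error" when it fails either rule are all assumptions made for this example, not something the pipeline or minerva defines.

```python
from collections import defaultdict

# Level and message text per rule number, copied from the example summary above.
RULE_MESSAGES = {
    19: ("ERROR", "GORULE:0000019: Model is logically inconsistent"),
    56: ("WARNING", "GORULE:0000056: Model does not match GO-CAM schema"),
}


def summarize(taxon, results):
    """Build a pipeline-style summary for one taxon grouping.

    `results` is a hypothetical input: an iterable of
    (model_id, owl_consistent, shex_conformant) tuples.
    """
    messages = defaultdict(list)
    total = 0
    in_error = 0
    for model_id, owl_ok, shex_ok in results:
        total += 1
        failed_rules = [rule for rule, ok in ((19, owl_ok), (56, shex_ok)) if not ok]
        if failed_rules:
            # Assumption: failing either rule counts the model as "in error".
            in_error += 1
        for rule in failed_rules:
            level, text = RULE_MESSAGES[rule]
            # Assumption: messages are grouped under a key per rule, e.g. "gorule-0000019".
            messages[f"gorule-{rule:07d}"].append({
                "level": level,
                "model-id": model_id,
                "type": "Violates GO Rule",
                "message": text,
                "obj": "",
                "taxon": "",
                "rule": rule,
            })
    return {
        "taxon": taxon,
        "number_of_models": total,
        "number_of_models_in_error": in_error,
        "number_of_correct_models": total - in_error,
        "messages": dict(messages),
    }
```

Calling `summarize("other", results)` and passing the result to `json.dumps` would give one block per taxon grouping in the shape shown above.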

kltm commented 4 years ago

Also tagging @dougli1sqrd

dougli1sqrd commented 4 years ago

I'm going to leave some minerva shex validation output opinions/questions here.

So when we're talking about Shex validation output in minerva, should we try JSON? For an error, I guess we need the model that failed, the shape(s) that failed, and any annotations attached to those shapes/expressions?

{
  "model": "an ID",
  "shape": "shape URI",
  "annotations": [ annotations... ]
}

Something like this anyway?

goodb commented 4 years ago

@dougli1sqrd sure, whatever fits into your rule result structure can be done. It might be useful for you to look at the json that the minerva server is currently generating for shex and owl validation:

https://github.com/geneontology/minerva/pull/242 for #212

It would be nice, though not necessary, if the service JSON were pretty similar to the command line JSON output.

dougli1sqrd commented 4 years ago

Okay cool. I do have a question about how we want to do the annotations. Should we include shape annotations as well as triple pattern annotations? How would we organize that in the output? Alternatively, we could decide by convention to only put annotations in one place or the other?

It's just that if I'm reading the annotations and looking out for gorule text, do I need to look at both the shape annotations, and the triple expressions annotations?

Additionally, maybe we could mint a specific gorule relation that indicates that this expression/shape is for a gorule? @cmungall do you have thoughts about that? // go:rule GORULE:00000008 or something?

dougli1sqrd commented 4 years ago

Ah I had a second look: So we have:

"violations": [
          {
            "explanations": [
              {
                "shape_id": "http://purl.obolibrary.org/obo/go/shapes/MolecularFunction",
                "constraints": [
                  {
                    "target_node_uri": "http://model.geneontology.org/R-HSA-73930/R-HSA-73930",
                    "property_id": "http://purl.obolibrary.org/obo/RO_0002413",
                    "intended_range_shapes": [
                      "http://purl.obolibrary.org/obo/go/shapes/MolecularFunction"
                    ]
                  }
                ]
              }
            ],
            "node_id": "http://model.geneontology.org/R-HSA-73930/R-HSA-110360",
            "commentary": "http://model.geneontology.org/R-HSA-73930/R-HSA-110360 did not match http://purl.obolibrary.org/obo/go/shapes/MolecularFunction"
          }
        ]

How about we add a key annotations parallel to both shape_id and within the constraint object? So:

"annotations": [
  {
    "predicate": "go:rule", // or whatever
    "object": "GORULE:00000008" // or whatever
  }, ...
]

That way when this is parsed, I can tell which annotations are shape level, and which are constraint level.
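To make the parsing side concrete, here is a minimal Python sketch of a consumer that tells the two levels apart. It assumes the `"annotations"` key proposed above exists (it is not in the current minerva output), and the function name `collect_annotations` and the tuple layout are hypothetical.

```python
def collect_annotations(violations):
    """Return (level, shape_id, property_id, predicate, object) tuples.

    `level` is "shape" or "constraint". `property_id` is None for
    shape-level annotations. Missing "annotations" keys are treated as empty,
    so output without the proposed key still parses.
    """
    found = []
    for violation in violations:
        for explanation in violation.get("explanations", []):
            shape_id = explanation.get("shape_id")
            # Annotations sitting next to "shape_id" apply to the shape itself.
            for ann in explanation.get("annotations", []):
                found.append(("shape", shape_id, None, ann["predicate"], ann["object"]))
            # Annotations inside a constraint object apply to that constraint only.
            for constraint in explanation.get("constraints", []):
                property_id = constraint.get("property_id")
                for ann in constraint.get("annotations", []):
                    found.append(
                        ("constraint", shape_id, property_id, ann["predicate"], ann["object"])
                    )
    return found
```

A rule reporter could then filter on `predicate == "go:rule"` and use the first tuple element to decide whether the rule text came from the shape or from a single constraint.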

goodb commented 4 years ago

@kltm thoughts?
Pinging @vanaukenk to follow this for the shex work.

kltm commented 4 years ago

@dougli1sqrd I think it would be helpful to me to see a mocked-up blob of what the JSON output is to look like in this instance. I'm interested in having an overall consistent layout for all of the expected errors coming through: shex, owl, rules, etc.

dougli1sqrd commented 4 years ago

@kltm:

"violations": [
          {
            "explanations": [
              {
                "shape_id": "http://purl.obolibrary.org/obo/go/shapes/MolecularFunction",
                "annotations": [
                  {
                    "predicate": "go:rule",
                    "object": "GORULE:00000012"
                  }
                ],
                "constraints": [
                  {
                    "target_node_uri": "http://model.geneontology.org/R-HSA-73930/R-HSA-73930",
                    "property_id": "http://purl.obolibrary.org/obo/RO_0002413",
                    "intended_range_shapes": [
                      "http://purl.obolibrary.org/obo/go/shapes/MolecularFunction"
                    ],
                    "annotations": [
                      {
                        "predicate": "go:rule",
                        "object": "GORULE:00000008"
                      }
                    ]
                  }
                ]
              }
            ],
            "node_id": "http://model.geneontology.org/R-HSA-73930/R-HSA-110360",
            "commentary": "http://model.geneontology.org/R-HSA-73930/R-HSA-110360 did not match http://purl.obolibrary.org/obo/go/shapes/MolecularFunction"
          }
        ]

kltm commented 4 years ago

@dougli1sqrd Okay, I may be a little slow here, but what is the subject then? The shape_id? Wouldn't the relation then be something like has_constraint? I'm not really understanding the duplication of the rule...

dougli1sqrd commented 4 years ago

Annotations can be placed at the shape level or at the constraint level, so that's why it's in two different places.

Annotations only have a "predicate" and an "object". I suppose that means the constraint or the shape is the implicit subject.

dougli1sqrd commented 4 years ago

For shapes, note we now have GORULE:0000056: https://github.com/geneontology/go-site/blob/master/metadata/rules/gorule-0000056.yaml

goodb commented 4 years ago

Based on the title of this ticket, it could be closed. The Minerva command line client can now validate the consistency of batches of input go-cams, e.g.:

java -Xmx8g -jar minerva-cli.jar --validate-go-cams -c ./catalog-v001-for-noctua.xml -i ./blazegraph.jnl

There is useful discussion here about how the results ought to be delivered. Perhaps that should be its own issue/project.
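For pipeline use, the command-line invocation above could be wrapped along these lines. This is only a sketch: it uses the flags exactly as they appear in the example command, while the function names, the default jar path and the default heap size are assumptions for illustration. The thread does not describe the CLI's exit codes or output format, so the wrapper does not interpret either.

```python
import subprocess


def build_validate_command(catalog, input_path, jar="minerva-cli.jar", max_heap="8g"):
    """Build the argv list for the validation run shown in the example command.

    `jar` and `max_heap` defaults mirror that example; adjust them for your setup.
    """
    return [
        "java", f"-Xmx{max_heap}",
        "-jar", jar,
        "--validate-go-cams",
        "-c", catalog,
        "-i", input_path,
    ]


def run_validation(catalog, input_path, **kwargs):
    """Run the validation and return the CompletedProcess.

    check=False because the meaning of a non-zero exit code is not
    documented in this thread; the caller decides how to treat it.
    """
    command = build_validate_command(catalog, input_path, **kwargs)
    return subprocess.run(command, capture_output=True, text=True, check=False)
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the catalog or journal path contains spaces.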