geneontology / minerva

BSD 3-Clause "New" or "Revised" License
6 stars 8 forks source link

add cardinality to shex validation report #330

Closed goodb closed 4 years ago

goodb commented 4 years ago

The shex schema may specify e.g.

enabled_by: @<InformationBiomacromolecule> {1};

indicating that something should have one and only one enabled_by attribute.

If a model violates that constraint by having 0 or 3 enabled_by attributes, the current implementation does correctly mark the model invalid and flag the offending node. However, it does not explain what is wrong. This ticket is to add an indication to the Violation report indicating a cardinality violation.

probably in ShexValidator.getUnmetConstraints() probably end up with an extension to the ShexConstraint class like: intended_cardinality actual_cardinality

then amend the human/tab readable text generated by ShexValidationReport

goodb commented 4 years ago

This is working now (PR on the way). Heads up to @kltm there is another possible piece of data in the json object coming back to capture this. If there is a cardinality violation, it will report the expected cardinality associated with the problem node and property, and the observed. Like this for an expected cardinality in the interval 0 to 1 with an observed (nobjects) of 2:

 "shex-validation": {
                "node-matched-shapes": {},
                "is-conformant": false,
                "tracker": "https://github.com/geneontology/go-shapes/issues",
                "rule-file": "https://github.com/geneontology/go-shapes/blob/master/shapes/go-cam-shapes.shex",
                "violations": [
                    {
                        "explanations": [
                            {
                                "shape": "obo:go/shapes/MolecularFunction",
                                "constraints": [
                                    {
                                        "property": "RO:0002333",
                                        "cardinality": "[0; 1]",
                                        "nobjects": 2
                                    }
                                ]
                            }
                        ],
                        "node": "gomodel:5e8c0c4e00000004/5e8c0c4e00000006"
                    }
                ]
            }

Given that this a simple extension of the existing structure and casual testing indicates it doesn't break anything I intend to put it in as it is. If/when we have an opportunity to do more than show the json blob in the UI we can iterate on any desired changes to that output structure.

goodb commented 4 years ago

Heads up to @tmushayahama @vanaukenk @kltm that the update live on noctua-dev as of July 17, 2020 has a bug related to the title. If the reasoner/validator is run prior to adding a title to the model, it (correctly) notices a cardinality violation (zero values for title) but incorrectly hangs for some reason...

kltm commented 4 years ago

@goodb To clarify, "hangs" here is that minerva freezes up?

goodb commented 4 years ago

@kltm minerva actually doesn't freeze, but the client chokes. Its a problem for any missing metadata, not just the title. The noctua client is looking for something that it can't find though I haven't pinpointed what is wrong with the response yet. It dies on i = n.types(); in the chunk below. which evaluates to null.

    G(a.violations(), function(e) {
                        var t = e.node_id(),
                            n = a.get_node(t),
                            r = "",
                            i = n.types();
                        x.each(i, function(e) {
                            r += U(e)
                        }), "" === r && (r = t), e.explanations() && 0 < e.explanations().length ? G(e.explanations(), function(e) {
                            e.shape && (c[e.shape] || (c[e.shape] = []), c[e.shape].push(r))
                        }) : (c[["unknown"]] || (c[["unknown"]] = []), c.unknown.push(r))
                    });
goodb commented 4 years ago

@kltm I have a pretty good hunch about what is wrong. For shex validation checks on model-level metadata, the node id is actually the id of the model. I suspect that there is no node representing the model as a whole in the client graph representation. Hence when it looks for the types of the node in question, it produces the error and croaks. See the value for node in the screenshot here.

Screen Shot 2020-07-17 at 1 40 38 PM

The current approach is working for the other context of the command line validator. If we want to make use of the metadata checks from shex, it looks like we will need to think about how to handle them in the noctua client. We could have it check to see if the node id is the same as the model id to detect them. Or, I could adjust the response to indicate them some other way. Let me know what you think.

kltm commented 4 years ago

@goodb Ah, understood--that sounds like it's likely to be the story. Certainly, we should make the client less crashy, so we can follow up on that. It feels a little weird to me that a "node", in this context, can also be a model. That said, we'll have to deal with model-level violations at some point in some way anyways (missing metadata, etc.). What do you think of the two paths: 1) we jimmy everything into the current payload (what we have now and fix the client) vs 2) have the payload discriminate these two types/levels (minerva fix, and client fix)?

goodb commented 4 years ago

@kltm I guess I'm not too passionate here. My temptation is to your 1) -> put the logic into the client as it is very simple (does model_id = node id?) and the rest of it needs to be handled in the client anyway (e.g. where does the report show up?). If there were important differences in the structure of the errors coming back for the model versus nodes in the model then I would think differently, but as far as I can tell, the basic structure is always going to be the same for both.

If I'm missing something about the client that makes it easier/better to change the server code, let me know and we can go the other way with it.

kltm commented 4 years ago

@goodb Yeah, I think I agree with you at this point. There is something a little wonky to me about node-as-model--maybe I just find it aesthetically displeasing--but I agree that it's probably best to just deal in the client for the time being.

https://github.com/geneontology/noctua/issues/683

goodb commented 4 years ago

@kltm agree its a little weird when thinking in terms of a model being a single closed world as we often do. But, when thinking of collections of models in an RDF context, it makes more sense (at least to me).