geneontology / minerva

BSD 3-Clause "New" or "Revised" License
6 stars 8 forks source link

It should be impossible to inject invalid OWL into models #58

Closed cmungall closed 7 years ago

cmungall commented 8 years ago

From https://github.com/geneontology/noctua-models/issues/18#issuecomment-248775569

It seems if the client tries to pass in something like

ClassAssertion("IDA", iri)

The OWLAPI unquestioningly accepts it, but when it saves it results in invalid OWL

Can we first have a unit test to test this hypothesis. If it turns out to be true we need to address. The best strategy is not clear and may be part of a general improvement in CURIE handling

balhoff commented 8 years ago

Yes, you can create an IRI in OWL API like "IDA". It will then serialize it without complaint. But it will fail to parse it back in. I'm not sure where best to put a test for this in minerva. Going through JsonOrJsonpBatchHandler.m3Batch seems to check whether the class IRI is known (regardless of whether it is a valid form). So I think on that path the nefarious class would need to already exist in the tbox.

Could these possibly come in through the "Function companion" tool?

kltm commented 8 years ago

The IDs through most paths should be pretty solid. This seems to happen with a couple of widgets where the user just hits return (or tab) immediately without selecting anything. While it would be nice if minerva would protect itself better in this case (not being able to re-read of disk is pretty bad), we could also implement checks in the client and force the UI to be more consistent in selection.

cmungall commented 8 years ago

I think the FC tool should be safe. More likely the ability of curators to paste in their own classes.

Requiring a class be in the tbox is too strict (we have that turned off, as some curators need to paste CURIEs of uniprot classes not loaded). But this would be the right place in the code to check.

On 23 Sep 2016, at 7:33, Jim Balhoff wrote:

Yes, you can create an IRI in OWL API like "IDA". It will then serialize it without complaint. But it will fail to parse it back in. I'm not sure where best to put a test for this in minerva. Going through JsonOrJsonpBatchHandler.m3Batch seems to check whether the class IRI is known (regardless of whether it is a valid form). So I think on that path the nefarious class would need to already exist in the tbox.

Could these possibly come in through the "Function companion" tool?

You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub: https://github.com/geneontology/minerva/issues/58#issuecomment-249209089

balhoff commented 8 years ago

@cmungall I guess you mean pasting a class into the "Edit Instance" panel. I see that you can paste in any text there and it will be accepted as an IRI. In file mode the OWL API will save this. In Blazegraph mode the bad ID can be entered, but when you save some Rio errors will be logged, e.g. "Subject URI was invalid". The model will not be saved, but nothing seems to be communicated to the front-end (the bad type will still appear until Minerva is restarted).

balhoff commented 8 years ago

Aargh but the error is logged and swallowed in the OWL API so it is reported as a successful save. And if there are other changes it actually will be saved, partially...

kltm commented 8 years ago

IIRC, if you throw the error far enough, it should be picked up by the response handler and get the error back to the client.

balhoff commented 8 years ago

Yes—in this case the error is only logged inside OWL API. It is swallowed before we can see it.

balhoff commented 8 years ago

I think the right place to check incoming IRIs is at CurieHandler.getIRI. At a minimum we can prevent relative IRIs completely by making sure the text passed to IRI.create at least contains a colon (it should have been expanded to http://... by the curie map). To be even more strict we could verify that the result is a valid java.net.URL (it will throw an error for unknown protocols). However this would disallow valid identifiers that are not URLs like urn:uuid:12345. An in-between solution would be to have a small list of valid protocols like http, urn, mailto, and throw an error if the expanded IRI doesn't start with one of these. Would this be missing something?

cmungall commented 8 years ago

I'm pretty sure http is only only protocol, but famous last words and all that. I think extending to the protocols you mention should be safe.

General comment: is the CURIE handling worth spinning out into a separate project

On 3 Oct 2016, at 11:59, Jim Balhoff wrote:

I think the right place to check incoming IRIs is at CurieHandler.getIRI. At a minimum we can prevent relative IRIs completely by making sure the text passed to IRI.create at least contains a colon (it should have been expanded to http://... by the curie map). To be even more strict we could verify that the result is a valid java.net.URL (it will throw an error for unknown protocols). However this would disallow valid identifiers that are not URLs like urn:uuid:12345. An in-between solution would be to have a small list of valid protocols like http, urn, mailto, and throw an error if the expanded IRI doesn't start with one of these. Would this be missing something?

You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/geneontology/minerva/issues/58#issuecomment-251194286

kltm commented 8 years ago

@balhoff Okay, I'm now testing on minerva at, I believe, a corrected revision: https://github.com/geneontology/minerva/commit/cde48f8595ae06ee5f7fe111752c5d34d8a80f47

I can still add an item, say a gene product for an annoton, using the string "fooble". In the OWL, this comes up as:

Class: <fooble>

It then saves to disk just fine. A restart of minerva works, as well as viewing the model again.

Is this correct behavior?

balhoff commented 8 years ago

@kltm I can't reproduce your test. Can you check on the latest version?

kltm commented 7 years ago

I apparently had versioning issues. I can no longer reproduce this when under adult supervision on the production server.