Gocamgen model with has_input should pass

dustine32 commented 4 years ago

In branch: https://github.com/geneontology/go-shapes/tree/dustine32-test-has_input

I have a gocamgen-generated test TTL file containing a single assertion individual. The model fails both the Java and Python validators despite appearing to follow the ShEx spec:

Protein <-enabled_by- MolecularFunction -has_input-> MolecularEntity

Here's the Python validator output:

File: ../test_ttl/go_cams/should_pass/WB_WBGene00000903_partial.ttl Success: False PASS: 4 FAIL: 1
  FAIL: http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/09f78549-aae3-4e12-be2f-e27aa601a408 SHAPE: http://purl.obolibrary.org/obo/go/shapes/MolecularFunction REASON:   Testing <http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/09f78549-aae3-4e12-be2f-e27aa601a408> against shape N1b900799bad94646b0f0bdc18dea6b82
    Triples:
      <http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/09f78549-aae3-4e12-be2f-e27aa601a408> rdf:type obo:GO_0005160 .
      <http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/09f78549-aae3-4e12-be2f-e27aa601a408> rdf:type owl:NamedIndividual .
   2 triples exceeds max {1,1}
  Testing <http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/09f78549-aae3-4e12-be2f-e27aa601a408> against shape N1b900799bad94646b0f0bdc18dea6b82
    Testing <http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/0b95360f-8394-4a17-9687-004a234b64fa> against shape N6baf29f3024240789b58b9a33f3380f4
      Triples:
      <http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/0b95360f-8394-4a17-9687-004a234b64fa> rdf:type UniProtKB:P04202 .
      <http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/0b95360f-8394-4a17-9687-004a234b64fa> rdf:type owl:NamedIndividual .
   2 triples exceeds max {1,1}
  Testing <http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/09f78549-aae3-4e12-be2f-e27aa601a408> against shape N1b900799bad94646b0f0bdc18dea6b82
    Testing <http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/0b95360f-8394-4a17-9687-004a234b64fa> against shape N6baf29f3024240789b58b9a33f3380f4
      Testing UniProtKB:P04202 against shape http://purl.obolibrary.org/obo/go/shapes/OwlClass
           No matching triples found for predicate rdf:type
  Testing <http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/09f78549-aae3-4e12-be2f-e27aa601a408> against shape N1b900799bad94646b0f0bdc18dea6b82
    Testing <http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/0b95360f-8394-4a17-9687-004a234b64fa> against shape N6baf29f3024240789b58b9a33f3380f4
      Triples:
      <http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/0b95360f-8394-4a17-9687-004a234b64fa> rdf:type UniProtKB:P04202 .
      <http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/0b95360f-8394-4a17-9687-004a234b64fa> rdf:type owl:NamedIndividual .
   2 triples exceeds max {1,1}
  Testing <http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/09f78549-aae3-4e12-be2f-e27aa601a408> against shape N1b900799bad94646b0f0bdc18dea6b82
    Testing <http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/0b95360f-8394-4a17-9687-004a234b64fa> against shape N6baf29f3024240789b58b9a33f3380f4
         No matching triples found for predicate rdf:type
  Testing <http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/09f78549-aae3-4e12-be2f-e27aa601a408> against shape N1b900799bad94646b0f0bdc18dea6b82
    Node kind mismatch have: URIRef expected: bnode
  Testing <http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/09f78549-aae3-4e12-be2f-e27aa601a408> against shape N1b900799bad94646b0f0bdc18dea6b82
    Triples:
      <http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/09f78549-aae3-4e12-be2f-e27aa601a408> rdf:type obo:GO_0005160 .
      <http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/09f78549-aae3-4e12-be2f-e27aa601a408> rdf:type owl:NamedIndividual .
   2 triples exceeds max {1,1}
  Testing <http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/09f78549-aae3-4e12-be2f-e27aa601a408> against shape N1b900799bad94646b0f0bdc18dea6b82
       No matching triples found for predicate rdf:type
Final report >> all files successful: False

What's strange here is that I get a "2 triples exceeds max {1,1}" cardinality violation for the predicate rdf:type, even though I'm using this predicate for something fundamental to our models: "X is an Individual" and "X is of class Y".

Running the validator against the rest of the WB:WBGene00000903 model with this assertion individual removed (so that only simple GP->term assertions remain), I get a PASS result. So I think this would indicate that my general OWL syntax in these models is OK; I'm guessing it's this has_input relation that's causing problems.

@balhoff @goodb Are you able to spot anything here that I can change to get it to pass?

Thanks!

goodb commented 4 years ago

The problem with this one is that UniProtKB:P04202 doesn't appear to exist in neo on rdf.geneontology.org. Hence the constraint: has_input: @ *;

is violated by obo:RO_0002233 http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/0b95360f-8394-4a17-9687-004a234b64fa ;

and

http://model.geneontology.org/0b6b7849-258c-4445-892e-e480610c63fd/0b95360f-8394-4a17-9687-004a234b64fa a UniProtKB:P04202, owl:NamedIndividual .

Changing to a different identifier for the protein, which neo knows, allows the file to validate.
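
For anyone following along, here is a rough sketch of the kind of check that trips here. It assumes the classes known to neo are available as a plain Python set; the `known_classes` set, the shortened individual names, and the `unknown_class_refs` helper are all hypothetical stand-ins, not how the validators actually query the triplestore.

```python
RDF_TYPE = "rdf:type"
OWL_NAMED_INDIVIDUAL = "owl:NamedIndividual"

# Triples from the failing model. The long model.geneontology.org IRIs are
# replaced with short placeholder names purely for readability.
triples = [
    ("mf_individual", RDF_TYPE, "obo:GO_0005160"),
    ("mf_individual", RDF_TYPE, OWL_NAMED_INDIVIDUAL),
    ("mf_individual", "obo:RO_0002233", "input_individual"),  # has_input
    ("input_individual", RDF_TYPE, "UniProtKB:P04202"),
    ("input_individual", RDF_TYPE, OWL_NAMED_INDIVIDUAL),
]

# Hypothetical stand-in for "classes that neo knows about".
known_classes = {"obo:GO_0005160"}


def unknown_class_refs(triples, known_classes):
    """Return (individual, class) pairs whose asserted class is not known."""
    return [
        (subject, obj)
        for subject, predicate, obj in triples
        if predicate == RDF_TYPE
        and obj != OWL_NAMED_INDIVIDUAL
        and obj not in known_classes
    ]


print(unknown_class_refs(triples, known_classes))
# [('input_individual', 'UniProtKB:P04202')]
```

Once the protein's class is added to the known set (or swapped for an identifier that is already there), the list comes back empty, which matches the behavior described above.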

Possibly relevant to https://github.com/geneontology/neo/issues/24

dustine32 commented 4 years ago

@vanaukenk @ukemi This is the example line containing the "invalid" UniProtKB:

WB  WBGene00000903  enables GO:0005160  PMID:8910282|WB_REF:WBPaper00002600 ECO:0000250 UniProtKB:P04202    20141020    WB
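
To make it easier to find other lines like this one, here is a minimal sketch that splits the example line and pulls out the with/from identifiers by prefix. The column names and their order are my assumption based on how the line appears above (whitespace-separated, nine populated fields); a real GPAD file is tab-delimited and may carry additional, possibly empty, columns.

```python
line = (
    "WB  WBGene00000903  enables GO:0005160  "
    "PMID:8910282|WB_REF:WBPaper00002600 ECO:0000250 "
    "UniProtKB:P04202    20141020    WB"
)

# Assumed column order for the whitespace-flattened line above.
COLUMNS = [
    "db", "db_object_id", "qualifier", "go_id", "references",
    "evidence", "with_from", "date", "assigned_by",
]


def parse_annotation(line):
    """Split one annotation line into a dict keyed by the assumed columns."""
    record = dict(zip(COLUMNS, line.split()))
    # Pipe-separated columns become lists.
    record["references"] = record["references"].split("|")
    record["with_from"] = record["with_from"].split("|")
    return record


def with_from_by_prefix(record, prefix):
    """Return the with/from identifiers that use the given CURIE prefix."""
    return [i for i in record["with_from"] if i.startswith(prefix + ":")]


record = parse_annotation(line)
print(with_from_by_prefix(record, "UniProtKB"))
# ['UniProtKB:P04202']
```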

Something that seems odd to me is that UniProtKB:P04202 is a mouse protein, though this may be a legitimate example of a multi-species annotation? The annotation is to a descendant term of binding.

Regardless of the differing species issue, as @goodb suggests, we can fix most of these errors by switching to using MOD identifiers known to NEO in the with/from and extensions columns. Some ways to actually move forward:

1. "Fix" the UniProt usage in the with/from and extensions columns upstream in the MOD GPADs by switching to MOD identifiers or other identifiers known to NEO.
2. Load all of SwissProt into NEO.

@vanaukenk @ukemi What do you think about this?

There's also the issue of variable prefix usage: http://identifiers.org/wormbase/WBGene00000903 (how go_context.jsonld, and thus gocamgen, resolves WB:) vs. http://identifiers.org/WB:WBGene00000903 (resolvable by identifiers.org). That could bring this error back but, since the GPADs use CURIEs, I think this is a gocamgen->NEO problem.
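
A small sketch of the prefix mismatch, in case it helps: the same CURIE expands to two different IRIs depending on which prefix map is used. The two base IRIs are the ones quoted above; the `expand_curie` helper and the dictionary names are hypothetical and are not taken from gocamgen.

```python
def expand_curie(curie, prefix_map):
    """Expand a CURIE such as 'WB:WBGene00000903' using a prefix -> base IRI map."""
    prefix, local_id = curie.split(":", 1)
    return prefix_map[prefix] + local_id


# Expansion matching how go_context.jsonld (and thus gocamgen) resolves WB:.
gocamgen_prefixes = {"WB": "http://identifiers.org/wormbase/"}

# Expansion matching the form that identifiers.org resolves.
identifiers_org_prefixes = {"WB": "http://identifiers.org/WB:"}

curie = "WB:WBGene00000903"
gocamgen_iri = expand_curie(curie, gocamgen_prefixes)
identifiers_org_iri = expand_curie(curie, identifiers_org_prefixes)

print(gocamgen_iri)         # http://identifiers.org/wormbase/WBGene00000903
print(identifiers_org_iri)  # http://identifiers.org/WB:WBGene00000903
print(gocamgen_iri == identifiers_org_iri)  # False
```

A lookup keyed on one form will miss entities stored under the other, so whichever expansion NEO uses has to match the one gocamgen emits.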