geneontology / noctua-annotation-review


ShEX violations #35

Open ukemi opened 3 years ago

ukemi commented 3 years ago

I'm not sure this is the correct place for this ticket, but while testing ART I am seeing a worrying number of ShEX violations in models. If we are going to use ShEX in a meaningful way, we need to address these sooner rather than later. Otherwise, they will become like the boy who cried wolf and people will stop paying attention to them. Many of them seem to arise because entities cannot be found. We need systematic ways of identifying these errors and correcting them.

vanaukenk commented 3 years ago

@ukemi Is it possible that these are actually issues with NEO? If so, let's put a ticket in the NEO tracker.

ukemi commented 3 years ago

There is already one there for the missing mouse anatomical structures. But some seemed to be obsolete evidence codes, and others were just general weirdness that was difficult to interpret. It's a catch-22. I would be curious to see how many of these are resolved once NEO works, but I doubt the obsolete or merged terms will be handled. As I said, I'm not sure which tracker this belongs in, but it is concerning to see so many errors.

balhoff commented 3 years ago

@ukemi, is any of it related to ChEBI? We have some issues there with not including all the terms people have used in models. We should probably just include all of ChEBI; to be honest, I can't remember what the pushback was.

ukemi commented 3 years ago

The ones I've seen appear to be obsolete or merged evidence codes (??) and missing anatomical structures (EMAPA). The missing EMAPA terms came from a previous QC of imports, not from ART. I think there are also cases where annotation rules have changed over time and the older models are now failing. I'd like to know how many production models fail and get a feeling for the kinds of errors we are seeing. I haven't come across any related to ChEBI yet.
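One rough way to get that overview is to tally violations by kind across models. The Python sketch below assumes a hypothetical export of (model_id, category) pairs from whatever validation run is used; the model IDs and category labels are illustrative only and are not the actual ShEX validator report format.

```python
from collections import Counter


def summarize_violations(violations):
    """Summarize a list of (model_id, category) pairs.

    Returns the number of distinct failing models and a Counter of
    violations per category. The input format is a hypothetical export,
    not the real ShEX validator output.
    """
    failing_models = {model_id for model_id, _ in violations}
    by_category = Counter(category for _, category in violations)
    return len(failing_models), by_category


# Illustrative records only; model IDs and categories are made up.
example = [
    ("model_A", "obsolete evidence code"),
    ("model_A", "entity not found"),
    ("model_B", "entity not found"),
    ("model_C", "shape mismatch"),
]

n_failing, counts = summarize_violations(example)
print(n_failing)              # 3
print(counts.most_common(1))  # [('entity not found', 2)]
```

Counting distinct models separately from individual violations keeps one badly broken model from dominating the picture of how many models fail.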

ukemi commented 3 years ago

I just tracked down one of the EMAPA identifiers, and it looks like an alt ID issue.

[Term]
id: EMAPA:18535
name: spleen primordium
namespace: anatomical_structure
alt_id: EMAPA:18659
synonym: "splenic primordium" EXACT []
relationship: ends_at TS23 ! TS23
relationship: part_of EMAPA:17025 ! dorsal mesogastrium
relationship: part_of EMAPA:18765 ! hemolymphoid system
relationship: starts_at TS22 ! TS22

So I'm guessing that is indeed a NEO issue and should be resolved at that level.
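To illustrate what resolving an alt ID involves, here is a minimal Python sketch that reads OBO [Term] stanzas, maps each alt_id to the primary id of its stanza, and uses that map to replace a secondary ID with the primary one. The function names are hypothetical, and this is only a line-based reading of the id and alt_id tags (assuming id precedes alt_id within a stanza, as OBO requires), not a full OBO parser; the actual fix belongs in the ontology build discussed here.

```python
def build_alt_id_map(obo_text):
    """Map each alt_id to the primary id of the [Term] stanza declaring it.

    Hypothetical helper: handles only 'id:' and 'alt_id:' tags in [Term]
    stanzas and ignores everything else. Not a full OBO parser.
    """
    alt_to_primary = {}
    in_term = False
    current_id = None
    for raw in obo_text.splitlines():
        # Drop trailing "! label" comments and surrounding whitespace.
        line = raw.split(" !", 1)[0].strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are tracked.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id:"):
            current_id = line[len("id:"):].strip()
        elif in_term and line.startswith("alt_id:") and current_id:
            alt_to_primary[line[len("alt_id:"):].strip()] = current_id
    return alt_to_primary


def resolve(curie, alt_to_primary):
    """Return the primary id if curie is a known alt_id, else curie unchanged."""
    return alt_to_primary.get(curie, curie)


STANZA = """[Term]
id: EMAPA:18535
name: spleen primordium
alt_id: EMAPA:18659
relationship: part_of EMAPA:17025 ! dorsal mesogastrium
"""

alt_map = build_alt_id_map(STANZA)
print(resolve("EMAPA:18659", alt_map))  # EMAPA:18535
print(resolve("EMAPA:17025", alt_map))  # EMAPA:17025
```

For real ontology files, a dedicated OBO library would be a safer choice than hand parsing; the sketch only shows why a model that still uses EMAPA:18659 cannot find that entity unless the alt_id is resolved to EMAPA:18535.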

balhoff commented 3 years ago

> So I'm guessing that is indeed a NEO issue and should be resolved at that level.

Just to head off a (frequent) confusion: this is go-lego. go-lego includes all of NEO, which is just the gene entities from GPIs/GAFs converted to OWL. go-lego also includes (portions of) all the various domain ontologies.

ukemi commented 3 years ago

Thanks @balhoff. I was indeed confused by that. I thought NEO contained all the entities used by Noctua.