geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.
BSD 3-Clause "New" or "Revised" License

Some relations in the Noctua GPAD are not being converted from labels to identifiers correctly #5092

Open ukemi opened 7 months ago

ukemi commented 7 months ago

As part of our QC, @LiNiMGI and I noticed that there are annotations not passing because they have 'invalid' relations in annotation extensions. Here is the list:

Invalid Relation in GO-Property:
GOREL:0000032 exists_during
GOREL:0001019 results_in_division_of
GOREL:0098790 regulates_translation_of
RO:0002232 has_end_location
RO:0002449 directly negatively regulates activity of

In the Noctua GPAD output, relations are represented by their labels, but in the 'new' GPAD, they are represented by identifiers. I think most of them have been updated (or the policy for their usage has been updated) and the following changes need to be made:

GOREL:0000032 exists_during ---> RO:0002491 (existence starts and ends during)
GOREL:0001019 results_in_division_of ---> RO:0002233 (has_input; new policy, and should be changed in the models? Although I can't find any documentation)
GOREL:0098790 regulates_translation_of ---> deprecated (2 annotations from SynGO; see below)
RO:0002232 has_end_location ---> This one is correct, but not in MGI. These extensions are the result of an inference that isn't very useful, but we could add it.
RO:0002449 directly negatively regulates activity of ---> RO:0002630 (I think this is an incorrect translation of 'directly negatively regulates'; see below)
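The proposed replacements above can be written down as a small lookup table. This is only a sketch of the suggestions made in this comment, not confirmed GO policy; the names `PROPOSED_REPLACEMENTS`, `DEPRECATED`, `VALID_AS_IS`, and `resolve_relation` are hypothetical and do not come from Noctua, minerva, or ontobio.

```python
# Proposed handling of the failing relations, transcribed from the list above.
# These are suggestions from this issue, not confirmed GO policy.
PROPOSED_REPLACEMENTS = {
    "GOREL:0000032": "RO:0002491",  # exists_during -> existence starts and ends during
    "GOREL:0001019": "RO:0002233",  # results_in_division_of -> has_input
    "RO:0002449": "RO:0002630",     # suspected mistranslation of 'directly negatively regulates'
}
DEPRECATED = {"GOREL:0098790"}      # regulates_translation_of (obsolete)
VALID_AS_IS = {"RO:0002232"}        # has_end_location: correct, but was missing in MGI


def resolve_relation(rel_id):
    """Return (action, identifier) for a relation identifier.

    action is one of 'replace', 'deprecated', 'keep', or 'unknown'.
    """
    if rel_id in PROPOSED_REPLACEMENTS:
        return ("replace", PROPOSED_REPLACEMENTS[rel_id])
    if rel_id in DEPRECATED:
        return ("deprecated", None)
    if rel_id in VALID_AS_IS:
        return ("keep", rel_id)
    return ("unknown", None)


if __name__ == "__main__":
    for rel in ("GOREL:0000032", "GOREL:0098790", "RO:0002232", "RO:0002449"):
        print(rel, resolve_relation(rel))
```

Keeping deprecated relations separate from replaceable ones reflects the point that regulates_translation_of has no simple one-to-one identifier swap here and would need curator review.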

[Typedef]
id: regulates_translation_of
name: obsolete regulates translation of
def: "OBSOLETE. A relationship that holds between a process and a protein coding gene whose translation is regulated by that process." []
comment: Regulation of translation includes processes that regulate the levels of mature mRNA available for translation, as well as direct regulation of the rate of translation.
comment: Usage of this relation has been deprecated in favor of 'has direct input'.
subset: display_for_curators
subset: valid_for_annotation_extension
xref: GOREL:0098790
property_value: local_domain BFO:0000015 xsd:string
property_value: local_range "PR:000000001 SO:0000673 SO:0000704" xsd:string
property_value: usage "Relates a regulatory process or function to the gene whose translation it regulates." xsd:string {xref="GOC:dos"}
is_obsolete: true

Annotation in the original Noctua file:
MGI MGI:1345138 enables GO:0140678 PMID:15962011 ECO:0000315 20230504 MGI part_of(GO:0045742),directly_negatively_regulates(GO:0060090) contributor=https://orcid.org/0000-0001-7476-6306|noctua-model-id=gomodel:6446bfcb00001000|model-state=production

Annotation in Lori’s import file:
Invalid Relation in GO-Property (11:Annotation_Extensions,12:Annotation_Properties): cannot find RO:,BFO: id: RO:0002449
MGI:MGI:1345138 RO:0002327 GO:0140678 PMID:15962011 ECO:0000315 2023-05-04 MGI BFO:0000050(GO:0045742),RO:0002449(GO:0060090) contributor=https://orcid.org/0000-0001-7476-6306|noctua-model-id=gomodel:6446bfcb00001000|model-state=production
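To make the failure above concrete, here is a rough sketch of how an annotation extension field can be split into relation/target pairs and checked against a set of accepted relations. It is not the MGI loader or any GO tool; `extension_relations`, `unrecognized_relations`, and the `accepted` set are hypothetical, and the accepted set is only a stand-in for whatever the importing database recognizes.

```python
import re

# Matches one "relation(target)" unit, e.g. "RO:0002449(GO:0060090)"
# or "part_of(GO:0045742)".
EXTENSION_RE = re.compile(r"([^(),|\s]+)\(([^()]+)\)")


def extension_relations(field):
    """Return a list of (relation, target) pairs from an extension field."""
    return [(m.group(1), m.group(2)) for m in EXTENSION_RE.finditer(field)]


def unrecognized_relations(field, accepted):
    """Return the relations in the field that are not in the accepted set."""
    return [rel for rel, _ in extension_relations(field) if rel not in accepted]


if __name__ == "__main__":
    # Extension field taken from the import-file line above.
    field = "BFO:0000050(GO:0045742),RO:0002449(GO:0060090)"
    # Hypothetical accepted set: assume only part_of (BFO:0000050) is known.
    accepted = {"BFO:0000050"}
    print(unrecognized_relations(field, accepted))
```

The same parsing works on the label form from the Noctua file, such as `part_of(GO:0045742),directly_negatively_regulates(GO:0060090)`, which makes it easier to compare the labels with the identifiers they were converted to.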

sierra-moxon commented 7 months ago

From the managers' call: perhaps a question for @balhoff: is there a stale mapping somewhere in the ontology build? @ukemi volunteers to discuss with @balhoff. Check ontobio for these IDs (maybe there is a mapping file somewhere that doesn't have these IDs specifically).

MGI's awesome curators found this, but it's not MGI-specific :)

sierra-moxon commented 7 months ago

sierra: follow up with a small group call (David, Li, Sierra, Jim, Kimberly, Seth) to figure out the relations mapping

balhoff commented 7 months ago

@sierra-moxon I'm not sure whether this is exactly the issue, but GPADs out of Noctua use only these relations for extensions: https://github.com/geneontology/minerva/blob/df132bc5f1846c99735936f38058ee49fc339c84/minerva-converter/src/main/resources/org/geneontology/minerva/legacy/sparql/gpad-extensions.rq#L17

That list has been evolving based on feedback ever since it was created, so if there are some that need to be added there, it would be no problem.

pgaudet commented 6 months ago

@sierra-moxon

Li and I are looking into this again. For example for the first pair: GOREL:0000032 exists_during. ---> RO:0002491 (existence starts and ends during)

I don't find GOREL:0000032 in the 'conversion' file @balhoff pasted in the previous comment. We do see RO:0002491.

Do you know where these obsolete relations are coming from, so we can update whatever file is injecting the bad data? (See the first comment for the full list of incorrect relations coming out in the Noctua GPAD export.)

Thanks, Pascale

ukemi commented 6 months ago

From the GOA mouse GAF:
UniProtKB P46935 Nedd4 located_in GO:0043197 PMID:25505317 IDA C E3 ubiquitin-protein ligase NEDD4 Nedd4|Kiaa0093|Nedd-4|Nedd4-1|Nedd4a|Rpf1 protein taxon:10090 20160412 SynGO part_of(CL:0002608),exists_during(GO:0035235)

pgaudet commented 6 months ago

I see, this is a SynGO issue. Are they all from SynGO? Do you know?

ukemi commented 6 months ago

I don't know if they are all SynGO.

LiNiMGI commented 6 months ago

Many of the MGI annotations are not passing, mostly due to RO:0002449 and RO:0002232. See David's example above for RO:0002449 (incorrect translation of 'directly negatively regulates') ...

ukemi commented 6 months ago

Note that we could add those to MGI, but I thought the plan was to replace or obsolete them. That would make adding them a bad idea on the MGI end. @vanaukenk

ukemi commented 6 months ago

We went ahead and added RO:0002232 has_end_location to MGI, since it still seems to be inferred. Its fate is in the hands of the GOC.