geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.
BSD 3-Clause "New" or "Revised" License

Matrix: amino acid metabolism/protein targeting 2 (targeted to peroxisome) #2425

Closed: ValWood closed this issue 5 years ago

ValWood commented 5 years ago

PIPOX | Peroxisomal sarcosine oxidase |   | protein targeting to peroxisome |   | Reactome | Homo sapiens | TAS |   | peroxisomal sarcosine oxidase pthr10961 | protein |   | Reactome:R-HSA-9033241 | 20181121

ValWood commented 5 years ago

and

HMGCL | Hydroxymethylglutaryl-CoA lyase, mitochondrial |   | protein targeting to peroxisome |   | Reactome | Homo sapiens | TAS |   | family not named pthr42738 | protein |   | Reactome:R-HSA-9033241 | 20181121

ValWood commented 5 years ago

and

BAAT | Bile acid-CoA:amino acid N-acyltransferase |   | protein targeting to peroxisome |   | Reactome | Homo sapiens | TAS |   | acyl-coenzyme a thioesterase-related pthr10824 | protein |   | Reactome:R-HSA-9033241 | 20181121

ValWood commented 5 years ago

and

AGXT | Serine--pyruvate aminotransferase |   | protein targeting to peroxisome |   | Reactome | Homo sapiens | TAS |   | aminotransferase class v pthr21152 | protein |   | Reactome:R-HSA-9033241 | 20181121

deustp01 commented 5 years ago

Per GO usage, the entities that mediate protein targeting to the peroxisome should be associated with the protein-targeting GO term, but the entities that are targeted should not be. In fact, both the mediating proteins and the targeted proteins have been associated with it. That's a mistake; no argument. The practical issue here is that we've annotated targeting as a three-step process. First, the mediator protein binds the cytosolic cargo to form a complex. Second, some sort of conformational magic happens such that, third, the complex is oriented so that when it dissociates, the cargo is released into the peroxisomal lumen. Nothing in the annotation distinguishes the mediator role from the cargo role, so there are no clues that would let our GAF-generating script tell the two apart and tag only the mediators with the GO term.

A short-term fix is simply to remove the GO process term from Reactome pathway R-HSA-9033241. Targeting may be mediated by some kind of pore structure, however, and if this is true then it might be possible to re-do the annotations as a transport process rather than as a series of binding reactions. If there are experimental data to support that re-annotation, then distinguishing the mediator and cargo roles is easy.

@valwood, @pgaudet, any advice here?

ValWood commented 5 years ago

I'm not sure of the solution, but annotating target molecules as process participants is quite pervasive. Here is one I saw today:

REAC | P52948 | NUP98 | | GO:0016925 | protein sumoylation | ECO:0000304(TAS) | ECO:0000304 | (TAS) | | Reactome:R-HSA-4655355

and I think there are quite a few similar ones on the tracker. For some of these processes the number of known targets is large, so these annotations really inflate the number of gene products annotated to the process. It worries me that this will mask some enrichments.

So is there no clever way to distinguish targets in Reactome? If there isn't, blocking annotation export for certain pathways seems like a short-term solution. If there is no way to identify these from the Reactome end, I can't think of a way to do it from the GAF. I am currently identifying violations using the matrix tool, but this will only find a small subset of the problem pathways. I guess it's a start....

deustp01 commented 5 years ago

In general we can distinguish targets, e.g., substrates of reactions and cargoes in transport processes. The problem is binding reactions, where we treat all participants as equal: there's no bind-er and bind-ee.

The change to our GAF-generation script that distinguishes catalysts and transporters from their substrates and cargoes, and attaches GO terms only to the catalysts and transporters, was made only about a year ago. So there may well be a backlog of tracker items from the bad old days, which may in fact have been cleaned up already.
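A minimal sketch of the role-based rule described above, for readers following the thread. Everything here is hypothetical: the `Participant` class, the role labels (`"catalyst"`, `"transporter"`, `"substrate"`, `"cargo"`, `"binding_partner"`), the reaction-type strings, and the `proteins_to_tag` function are illustrative assumptions, not Reactome's actual data model or GAF script. It shows why binding reactions are the problem case: with no role to separate mediator from cargo, the only safe choice is to tag nobody, which is what removing the GO term from the pathway achieves.

```python
from dataclasses import dataclass

# Hypothetical role labels; Reactome's real schema is not shown in this thread.
ENABLER_ROLES = frozenset({"catalyst", "transporter"})


@dataclass(frozen=True)
class Participant:
    protein: str
    role: str  # e.g. "catalyst", "transporter", "substrate", "cargo", "binding_partner"


def proteins_to_tag(participants: list[Participant], reaction_type: str) -> set[str]:
    """Return the proteins that should receive the pathway's GO process term (hypothetical rule)."""
    if reaction_type == "binding":
        # Every participant in a binding reaction has the same role, so the
        # mediator cannot be told apart from the cargo; tag nobody rather than everybody.
        return set()
    # Otherwise only enablers (catalysts, transporters) get the process term;
    # substrates and cargoes are left untagged.
    return {p.protein for p in participants if p.role in ENABLER_ROLES}
```

For example, with a hypothetical mediator `MEDIATOR_X` and PIPOX as cargo, a transport-style annotation tags only `MEDIATOR_X`, while the same pair annotated as a binding reaction tags neither. That is the sense in which re-annotating targeting as transport would make the roles easy to distinguish.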

pgaudet commented 5 years ago

@deustp01 when can we check again whether the annotations have been corrected?

deustp01 commented 5 years ago

Protein targeting to peroxisome: that change will require curator work to determine whether events now annotated as binding can be reworked to be annotated as transport, so it is uncertain when a correction will be visible.

The NUP98 annotation to SUMOylation noted above REAC | P52948 | NUP98 | | GO:0016925 | protein sumoylation | ECO:0000304(TAS) | ECO:0000304 | (TAS) | | Reactome:R-HSA-4655355 should already be fixed.

pgaudet commented 5 years ago

OK. Let me know if any mappings should be removed meanwhile.

deustp01 commented 5 years ago

OK. I have removed the "protein targeting to peroxisome" GO_BP annotation from pathway R-HSA-9033241. That will keep the term from propagating onto target proteins like PIPOX, which set off this ticket (and the roles of the proteins that mediate the targeting are well annotated by others, so little or no information is lost from GO). This change will become visible on the Reactome web site and in its GAF with our next release, in mid-September.

deustp01 commented 5 years ago

OK to close this ticket?

ValWood commented 5 years ago

For me it's good to close if it's fixed, rather than waiting. That way, if I come across the same violation again, I will know it's already fixed. Is that OK, @pgaudet?

pgaudet commented 5 years ago

Sorry, I am not clear if I need to change mappings while we wait for the Reactome changes to trickle through.

deustp01 commented 5 years ago

The peroxisome problem is fixed (imperfectly; see my last comment before "OK to close", above). The changes and amputations will not be visible until September. Any reason @pgaudet can't remove mappings now?

pgaudet commented 5 years ago

Guess not! I'll do that.

pgaudet commented 5 years ago

Can't find GO_0006625 in the reactome_xrefs_import.owl file

I do find GO:0016925; should I go ahead and remove that one?

pgaudet commented 5 years ago

@deustp01

I can't find GO_0006625 in the reactome_xrefs_import.owl file

I do find GO:0016925; should I go ahead and remove that one?

deustp01 commented 5 years ago

No. I'll fix that one from our end by removing the GO process term from the SUMOylation pathway. With our next release in September, that will prevent the GO SUMOylation term from propagating onto the proteins that undergo SUMOylation. The SUMOylation function terms will persist, so the proteins that mediate SUMOylation (but not their substrates/targets) will keep their annotations, and their involvement in the SUMOylation process will still be inferable. Bottom line: some information loss, but not too much. Second bottom line: same outcome as for peroxisome import.

Okay?

deustp01 commented 5 years ago

I should have read more of the thread. I have now fixed Reactome as described for SUMO, as for the peroxisome case previously. So do whatever is needed on the GO side.

pgaudet commented 5 years ago

Interesting. I find neither GO term now. Closing.

deustp01 commented 5 years ago

I only touched our internal database, not the public one, and even the public one communicates with GO only via a GAF we generate at the time of each release, so it's hard to see how my edits for peroxisome protein import and SUMOylation could affect what Pascale sees in GO. Hmm