geneontology / go-site

A collection of metadata, tools, and files associated with the Gene Ontology public web presence.
http://geneontology.org
BSD 3-Clause "New" or "Revised" License

reactome2go mapping is not complete, missing expected mappings #1993

Open suzialeksander opened 1 year ago

suzialeksander commented 1 year ago

We have a user that is trying to cross-reference >1000 Reactome terms to GO, and cannot find some of these in the reactome2go: http://current.geneontology.org/ontology/external2go/reactome2go. The list includes

@cmungall found specific cases, like https://reactome.org/content/detail/R-HSA-4641258 which should be GO:0090090 negative regulation of canonical Wnt signaling pathway but R-HSA-4641258 is not in the mapping.

@ukemi added

The mapping is in Reactome, http://noctua.geneontology.org/editor/graph/gomodel:R-HSA-4641258?

But it seems to be missing from GO: http://amigo.geneontology.org/amigo/term/GO:0090090 Click on mappings.

There were a few emails on this last week, will consolidate them in this ticket.

suzialeksander commented 1 year ago

From @deustp01:

The current version of our GAF, which should correspond to the version of Reactome on our web site, has two lines for R-HSA-4641258. However, I have not looked at any GO QA reports on that GAF to see if these lines were accepted to build the Amigo data set.

UniProtKB Q76N89 HECW1_HUMAN involved_in GO:0090090 REACTOME:R-HSA-4641258 TAS P protein taxon:9606 20221119 Reactome

UniProtKB P62877 RBX1_HUMAN involved_in GO:0090090 REACTOME:R-HSA-4641258 TAS P protein taxon:9606 20221119 Reactome
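A minimal sketch of how the two GAF lines above could be reduced to Reactome-to-GO pairs. It assumes the file is tab-separated GAF 2.x with the GO ID in column 5, the reference in column 6, and the aspect in column 9. The sample rows are the lines above with tabs restored, and the empty With/From, name, and synonym columns are an assumption about how they were flattened. The function name `reactome_to_go` is illustrative only.

```python
from collections import defaultdict


def reactome_to_go(gaf_lines):
    """Collect (GO ID, aspect) pairs per Reactome stable ID from GAF rows."""
    mapping = defaultdict(set)
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete annotation row
        go_id, refs, aspect = cols[4], cols[5], cols[8]
        for ref in refs.split("|"):  # column 6 may hold several references
            if ref.startswith("REACTOME:"):
                mapping[ref.split(":", 1)[1]].add((go_id, aspect))
    return mapping


# The two rows quoted above, rebuilt with tabs (empty columns are assumed).
rows = [
    "UniProtKB\tQ76N89\tHECW1_HUMAN\tinvolved_in\tGO:0090090\t"
    "REACTOME:R-HSA-4641258\tTAS\t\tP\t\t\tprotein\ttaxon:9606\t20221119\tReactome",
    "UniProtKB\tP62877\tRBX1_HUMAN\tinvolved_in\tGO:0090090\t"
    "REACTOME:R-HSA-4641258\tTAS\t\tP\t\t\tprotein\ttaxon:9606\t20221119\tReactome",
]

pairs = reactome_to_go(rows)
for stable_id, terms in pairs.items():
    print(stable_id, sorted(terms))
```

Both rows collapse to a single pair because they annotate different proteins with the same GO term from the same Reactome event.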

This user, as she says, is trying to systematically associate GO BP terms to Reactome events.

I suggested that she work on mappings from the Reactome side and pointed her to our “query” API (https://reactome.org/ContentService/#/query/findById) that takes the Reactome stable ID R-HSA-# of an instance as input, and returns all the attributes of that instance, including its GO BP annotation if it has one. I expect that this exercise will turn up additional unexpected Reactome usage of GO BP terms. I will follow up with her to try to get a look at those results to see if we can feed them into the pathways2GO project to improve this part of Reactome : GO alignment.
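A hedged sketch of the lookup described above. It assumes the query endpoint is reachable at `https://reactome.org/ContentService/data/query/<stable ID>` and that the returned JSON carries the GO BP annotation under a `goBiologicalProcess` key with `accession` and `displayName` fields. Those names are assumptions to check against the ContentService documentation, and `fetch_instance` and `go_bp` are illustrative helper names.

```python
import json
from urllib.request import urlopen

# Assumed base URL for the ContentService "query" endpoint.
BASE = "https://reactome.org/ContentService/data/query/"


def fetch_instance(stable_id):
    """Fetch all attributes of one Reactome instance as a dict (network call)."""
    with urlopen(BASE + stable_id, timeout=30) as resp:
        return json.load(resp)


def go_bp(instance):
    """Return (GO ID, label) for the instance's GO BP annotation, or None."""
    term = instance.get("goBiologicalProcess")  # assumed field name
    if not term:
        return None
    acc = str(term.get("accession", ""))
    go_id = acc if acc.startswith("GO:") else "GO:" + acc
    return go_id, term.get("displayName")


if __name__ == "__main__":
    # One ID from this ticket; a real run would loop over the user's full list.
    for stable_id in ["R-HSA-4641258"]:
        print(stable_id, go_bp(fetch_instance(stable_id)))
```

Keeping the parsing in `go_bp` separate from the network call makes it easy to rerun the extraction on saved responses.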

suzialeksander commented 1 year ago

from @cmungall

It looks like it might be a problem on the Reactome side, Peter?

We get our mappings from https://reactome.org/download/current/Reactions2GoTerms_human.txt

But R-HSA-4641258 isn't in here

curl -L -s https://reactome.org/download/current/Reactions2GoTerms_human.txt | grep R-HSA-4641258

(The command prints nothing, i.e. there is no matching line in the file.)

However, the reactome website links R-HSA-4641258 to a GO term

suzialeksander commented 1 year ago

From Adam Wright:

The step in the Jenkins pipeline that generates the file Reactions2GoTerms_human.txt is Data Export: https://github.com/reactome/data-export/blame/main/src/main/java/org/reactome/server/export/tasks/ReactionsGoTerms.java

It looks like the Reactions2GoTerms_human.txt code was written by Antonio about 2 years ago. https://github.com/reactome/data-export

The pipeline that generates the GAF file gene_association.reactome.gz is the Download Directory pipeline. You can read about the step that creates the file here: https://github.com/reactome/release-download-directory/tree/main/src/main/java/org/reactome/release/downloaddirectory/GenerateGOAnnotationFile

It looks like the code was initially written by Justin about 4 years ago and then mainly written by Joel about 2 years ago.

When I do a query for the stId "R-HSA-4641258" I see that it is a pathway (website or Neo4j)

When I look at the code that creates the Reactions2GoTerms file: https://github.com/reactome/data-export/blame/main/src/main/java/org/reactome/server/export/tasks/ReactionsGoTerms.java#L25

I see that the query is only looking for reactions. Therefore I would not expect a pathway such as R-HSA-4641258 to show up in the results. That being said, I don't know the historical context behind this file. If GO is interested in having a similar file that covers anything related to a GO term, I am sure we could add a new file type. Perhaps the Gene Association File is already sufficient for getting the other information.

The Reactions2GoTerms_human.txt file is fairly straightforward. Joel Weiser would be able to answer more with regard to the GAF.

This is a long thread, so I could also be missing something.
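To illustrate the point, here is a toy sketch of a reaction-only filter. It is not the actual export query, which lives in ReactionsGoTerms.java. The set of reaction-like class names is an assumption about the Reactome data model, and the second record is a hypothetical placeholder, not a real Reactome event.

```python
# Assumed names of the reaction-like schema classes in the Reactome data model.
REACTION_LIKE = {
    "Reaction",
    "BlackBoxEvent",
    "Polymerisation",
    "Depolymerisation",
    "FailedReaction",
}

events = [
    # Reported in this thread: a pathway annotated with GO:0090090.
    {"stId": "R-HSA-4641258", "schemaClass": "Pathway", "go_bp": "GO:0090090"},
    # Hypothetical placeholder reaction, for contrast only.
    {"stId": "R-HSA-0000000", "schemaClass": "Reaction", "go_bp": "GO:0000000"},
]


def reactions_to_go(records):
    """Keep only reaction-like events that carry a GO BP term."""
    return [
        (r["stId"], r["go_bp"])
        for r in records
        if r["schemaClass"] in REACTION_LIKE and r.get("go_bp")
    ]


def events_to_go(records):
    """Keep every event (reaction or pathway) that carries a GO BP term."""
    return [(r["stId"], r["go_bp"]) for r in records if r.get("go_bp")]


print(reactions_to_go(events))  # the pathway is filtered out
print(events_to_go(events))     # the pathway is retained
```

A file built like `events_to_go` would be one way to get the "anything that relates to a GO term" export mentioned above.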

deustp01 commented 1 year ago

David and I went through Chris’s list last week and here is what we found (e-mailed to some people but not posted to this ticket - sorry):

escaped filter Reactome:R-HSA-1483063 > GO:cardiolipin synthase (CMP-forming) ; GO:0043337

alien process Reactome:R-HSA-372342.1 > GO:acetyl-CoA biosynthetic process from pyruvate ; GO:0006086

pathway Reactome:R-HSA-446205 > GO:GDP-mannose biosynthetic process from fructose-6-phosphate ; GO:0061729

pathway Reactome:R-HSA-5617833.2 > GO:cilium assembly ; GO:0060271

pathway Reactome:R-HSA-5620912.1 > GO:ciliary basal body-plasma membrane docking ; GO:0097711

pathway Reactome:R-HSA-5620924.2 > GO:intraciliary transport involved in cilium assembly ; GO:0035735

escaped filter Reactome:R-HSA-6787447 > GO:tRNA-5-taurinomethyluridine 2-sulfurtransferase ; GO:0061708

pathway Reactome:R-HSA-70171 > GO:canonical glycolysis ; GO:0061621

pathway Reactome:R-HSA-71336 > GO:pentose-phosphate shunt ; GO:0006098

alien process Reactome:R-HSA-71397.1 > GO:acetyl-CoA biosynthetic process from pyruvate ; GO:0006086

** alien process Reactome:R-HSA-71849.1 > GO:lactate biosynthetic process from pyruvate ; GO:0019244

pathway Reactome:R-HSA-77108 > GO:ketone body catabolic process ; GO:0046952

escaped filter Reactome:R-HSA-9018785 > GO:protein folding chaperone ; GO:0044183

pathway Reactome:R-HSA-947581 > GO:sulfurated eukaryotic molybdenum cofactor(2-) biosynthetic process ; GO:1902756

“alien process”: Three reactions are correctly mapped to their molecular function in Reactome, but the process association shown here, though plausible, is not asserted in Reactome.

“pathway”: Seven pathways sneaked into a reaction-only file.

“escaped filter”: Three GO MF terms escaped Chris’s filter.

So the short-term question for Joel and Adam is, how did pathway instances sneak into a file that should be reactionlike events-only - answered in Adam's comment immediately above this one

And a short-term question for someone (David and I can’t figure out who) is where the alien process items came from. These reactions are not associated with any BP term in our central database or on our public release site, so we don’t see how they could come from Reactome curators. To add to the weirdness, the suggested alien terms make biological sense – those reactions are parts of the suggested processes.

suzialeksander commented 1 year ago

tagging @adamjohnwright

adamjohnwright commented 1 year ago

@deustp01 I have looked for the pathway ID "R-HSA-446205" in the Reactome file Reactions2GoTerms_human.txt for versions 82, 83 and 84. I didn't find it in any of the versions of the file. I believe this is the file you are talking about when you say reaction-only file. Maybe I am not understanding what you said above. If there is an issue with pathway IDs being in reaction-only files, could you specify one or more exact pathway IDs and the exact file you are working with?

adamjohnwright commented 1 year ago

@cmungall I don't see the file reactome2go in our download directory. Is it possible that you have a pipeline on the GO end to generate the file?

kltm commented 1 year ago

@adamjohnwright @deustp01 Just to clarify this so it doesn't get lost. The reactome2go file (e.g. http://current.geneontology.org/ontology/external2go/reactome2go) is a product of the ontology and has its origin here: https://github.com/geneontology/go-ontology/blob/master/src/ontology/external2go/reactome2go . It looks like this file is not being maintained? Tagging @ukemi
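A minimal sketch for checking which Reactome IDs the reactome2go file actually contains. It assumes the external2go line layout seen in the list earlier in this ticket (`Reactome:<id> > GO:<label> ; GO:<id>`, optionally with a name after the external ID) and that comment lines start with `!`. The helper names are illustrative, and the sample lines are copied from that list.

```python
import re


def parse_external2go(lines):
    """Yield (external ID, GO ID, GO label) from external2go-style lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blanks and header comments
        left, sep, right = line.partition(" > ")
        if not sep:
            continue  # not a mapping line
        label, sep2, go_id = right.rpartition(" ; ")
        if not sep2:
            continue  # mapping line without a GO ID
        ext_id = left.split()[0]  # drop any name that follows the external ID
        if label.startswith("GO:"):
            label = label[3:]
        yield ext_id, go_id.strip(), label


def stable_id(ext_id):
    """Strip the 'Reactome:' prefix and any '.N' version suffix."""
    bare = ext_id.split(":", 1)[-1]
    return re.sub(r"\.\d+$", "", bare)


sample = [
    "! sample lines copied from this ticket",
    "Reactome:R-HSA-372342.1 > GO:acetyl-CoA biosynthetic process from pyruvate ; GO:0006086",
    "Reactome:R-HSA-70171 > GO:canonical glycolysis ; GO:0061621",
]

mapped = {stable_id(ext): go for ext, go, _ in parse_external2go(sample)}
for wanted in ["R-HSA-70171", "R-HSA-4641258"]:
    print(wanted, mapped.get(wanted, "not in mapping"))
```

Run against the full file, the same lookup would list every ID from a user's set that has no entry in the mapping.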

adamjohnwright commented 1 year ago

@kltm thanks for pointing this out. It is now on my radar. I am busy with the Reactome SAB next week, but after that I will try to look into what needs to be updated.

kltm commented 1 year ago

(@adamjohnwright noting that I'm @kltm not @ktlm (edited above))

kltm commented 1 year ago

@adamjohnwright No worries--I just wanted to make sure that this didn't get lost in the shuffle!

kltm commented 1 year ago

@deustp01 Could we add this for discussion to the next pathways2go call?

deustp01 commented 1 year ago

@kltm @adamjohnwright @dustine32 @ukemi I'm away and will miss pathways2GO weeds calls until May 15, but I'm not needed for this discussion, I think, so what's the first Monday at 11 AM PDT / 2 PM EDT when everyone else is available? Is there anyone else who should be asked to join?

ukemi commented 1 year ago

@kltm I thought that the reactome2go file is generated from the xrefs in the ontology, but those xrefs were maintained from the Reactome mapping file. IIRC, we decided to do this rather than maintain the ontology xrefs manually. @balhoff is this correct? Are the Reactome xrefs in the ontology populated from the Reactome file as part of the ontology build? If so, then we need to correct the issue in the Reactome file -> build the ontology with the new xrefs -> build the reactome2go file from the ontology. But we need to be careful: I suspect that we put the limits on the Reactome-side mappings that @adamjohnwright describes for a reason. If the pipeline I describe above is correct, we should take a close look at all the steps used to generate the Reactome mappings and determine why restrictions might have been put in place.

kltm commented 1 year ago

@ukemi You are correct: the reactome2go file is derived from the ontology directly (my commit above removes the files accidentally added to the repo). From talking with @cmungall and poking around in the Makefile, there is a mechanism as you outlined, but it might be good to revisit it a little for the sake of clarity and to refresh the tooling.