geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

Determine single function term for YeastPathways reaction activities #141

Closed dustine32 closed 1 year ago

dustine32 commented 2 years ago

Wow, just realized there was no ticket for this! So far, communication has mainly been through emails, Zoom conversations, and comments on other GH tickets. The relevant discussion starts at https://github.com/geneontology/go-ontology/issues/20091#issuecomment-808936628. More details can be found in that ticket, but we can use this new one for further discussion as the new YeastPathways GO-CAMs get tested.

So, a while back, we discovered that some YeastPathways GO-CAM models had multiple function terms annotated to a single OWL activity individual. The example here is from salvage pathways of pyrimidine deoxyribonucleotides:

[image]

This occurred for a couple of reasons:

  1. The YeastPathways BioPAX source files occasionally mapped reactions to multiple EC numbers.
  2. Some EC numbers were not specific "4-digit" ECs (e.g. 2.7.1.145) but rather the more generic "3-digit" (e.g. 2.7.1.-). These generic ECs would then map to multiple GO terms.
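
The distinction between specific and generic ECs can be sketched in a few lines of Python. This is only an illustration, not the pathways2GO code: the function name `is_specific_ec` is made up here, and it assumes that a "4-digit" EC means four numeric fields separated by dots, with anything else (such as a trailing `-`) treated as generic.

```python
import re

# Assumption: a "4-digit" EC has exactly four numeric fields, e.g. "2.7.1.145".
_FULL_EC = re.compile(r"\d+\.\d+\.\d+\.\d+")


def is_specific_ec(ec: str) -> bool:
    """Return True for a fully specified EC, False for generic ones like '2.7.1.-'."""
    return _FULL_EC.fullmatch(ec.strip()) is not None
```

With this check, `is_specific_ec("2.7.1.145")` is true and `is_specific_ec("2.7.1.-")` is false.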

To mitigate these situations, @thomaspd proposed a precedence/fallback method for selecting the best, single function term for a reaction activity.

The reaction-to-term precedence rules:

  1. Check whether the BioPAX file maps the reaction to a single 4-digit EC.
  2. If not, look up the reaction controller GP -> EC.
  3. If there is more than one EC for the GP, or if there is no reaction controller, fall back on using Molecular Event for the activity.

Step 2 above incorporates an SGD GP ID -> EC lookup file provided (and suggested) by @suzialeksander.
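
For anyone testing the models, the precedence rules above can be read roughly as the following Python sketch. It is not the actual pathways2GO implementation (which lives on the `sgd-to-ec-lookup` branch); `select_ec` and its parameters are hypothetical names. It assumes rule 1 requires the reaction to carry exactly one EC and that EC to be 4-digit, and that a GP with no EC in the lookup also falls back to Molecular Event.

```python
import re
from typing import Dict, List, Optional

_FULL_EC = re.compile(r"\d+\.\d+\.\d+\.\d+")


def select_ec(
    reaction_ecs: List[str],
    controller_gp: Optional[str],
    gp_to_ecs: Dict[str, List[str]],
) -> Optional[str]:
    """Return the single EC to use for a reaction, or None for Molecular Event."""
    # Rule 1: the BioPAX reaction itself has a single, fully specified EC.
    if len(reaction_ecs) == 1 and _FULL_EC.fullmatch(reaction_ecs[0]):
        return reaction_ecs[0]

    # Rule 3 (first half): no reaction controller, so nothing to look up.
    if controller_gp is None:
        return None

    # Rule 2: look up the controller GP in the GP -> EC table.
    ecs = gp_to_ecs.get(controller_gp, [])
    if len(ecs) == 1:
        return ecs[0]

    # Rule 3 (second half): more than one EC (or, by assumption, none).
    return None
```

A `None` result means the activity gets Molecular Event; mapping the returned EC to a GO function term is a separate step not shown here.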

cmungall commented 2 years ago

> These generic ECs would then map to multiple GO terms.

We should be assigning these mappings categories - exact, broad, narrow. This will allow us to reliably use these for pathways2go. We can prioritize the ones needed here.

cmungall commented 2 years ago

@dustine32 it looks like you made a commit after writing this ticket. What's the status of implementing the steps above?

Also, can you give a list of all ECs used so we can review the mappings?

dustine32 commented 2 years ago

@cmungall Yep, I committed the latest batch of YeastPathways models to get them into noctua-dev. These were generated using the code on branch sgd-to-ec-lookup. I'll make a pathways2go PR for this real soon.

dustine32 commented 2 years ago

@cmungall Also, yes, I can set up logging of both used and "attempted" (encountered but tossed out due to multiple mappings or not being MF) ECs and send you the list.
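
A rough idea of what such a tally could look like, as a Python sketch. The class `EcUsageLog`, its method names, and the tab-separated report layout are all invented for illustration and do not reflect how pathways2GO actually logs.

```python
from collections import Counter
from typing import List


class EcUsageLog:
    """Hypothetical tally of ECs that were used vs. encountered but discarded."""

    def __init__(self) -> None:
        self.used: Counter = Counter()       # keyed by EC
        self.attempted: Counter = Counter()  # keyed by (EC, reason discarded)

    def record_used(self, ec: str) -> None:
        self.used[ec] += 1

    def record_attempted(self, ec: str, reason: str) -> None:
        self.attempted[(ec, reason)] += 1

    def report(self) -> List[str]:
        """Return tab-separated lines: status, EC, reason, count."""
        lines = [f"used\t{ec}\t-\t{n}" for ec, n in sorted(self.used.items())]
        lines += [
            f"attempted\t{ec}\t{reason}\t{n}"
            for (ec, reason), n in sorted(self.attempted.items())
        ]
        return lines
```

The reason string (for example "multiple GO terms" or "not MF") would make it easier to review which mappings need attention.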

dustine32 commented 2 years ago

@thomaspd found a bug in the https://github.com/geneontology/pathways2GO/commit/be6dccb0e8c78dd78a222e23682ee446c4c877c2 implementation, visible in GO-CAM model YEAST-RIBOSYN-PWY currently in noctua-dev.

[image]

The MF for RIB2 in the GO-CAM model is tRNA pseudouridine synthase, but it should be DRAP deaminase. The cause is a bug in the SGD EC mapping file lookup code, which currently interprets 1:n SGD-to-EC mappings as 1:1 and returns only the last mapping. I need to change the code to bail out of the SGD-to-EC lookup when a GP has multiple ECs. With that fix, the RIB2 MF will actually become Molecular Event, not DRAP deaminase: the BioPAX has no EC number for this reaction (RXN3O-164), so precedence rule 1 does not apply, and the multiple ECs for RIB2 then trigger the rule 3 fallback.

In this example, RIB2 YeastCyc:YOL066C is assigned to both 3.5.4.26 and 5.4.99.28.
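
To illustrate the 1:n problem, here is a minimal Python sketch of a lookup that keeps every EC per GP and bails out when there is more than one. It is not the merged fix; the function names are hypothetical, and the two-column tab-separated layout is an assumption about the mapping file, not its documented format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def load_gp_to_ecs(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect all ECs per GP ID (assumed layout: GP ID <tab> EC per line)."""
    mapping: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) < 2 or parts[0].startswith("#"):
            continue  # skip blank, malformed, or comment lines
        gp_id, ec = parts[0], parts[1]
        # Appending instead of assigning avoids the bug where a plain
        # dict assignment (mapping[gp_id] = ec) keeps only the last row.
        if ec not in mapping[gp_id]:
            mapping[gp_id].append(ec)
    return dict(mapping)


def lookup_single_ec(gp_id: str, mapping: Dict[str, List[str]]) -> Optional[str]:
    """Return the EC only when the GP maps to exactly one; otherwise None."""
    ecs = mapping.get(gp_id, [])
    return ecs[0] if len(ecs) == 1 else None
```

With rows for `YeastCyc:YOL066C` pointing at both 3.5.4.26 and 5.4.99.28, `lookup_single_ec` returns `None`, so the caller falls through to Molecular Event.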

dustine32 commented 1 year ago

Closing since the fix has been merged for a while and is running in the latest batch of YeastPathways models.