Closed dustine32 closed 1 year ago
We should be assigning these mappings categories: exact, broad, or narrow. This will allow us to reliably use these for pathways2go. We can prioritize the ones needed here.
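A rough sketch of what categorized mappings could look like, in case it helps the discussion. Everything here is an assumption for illustration: the `MappingCategory` and `EcMapping` names, the `exact_terms` helper, and the placeholder GO IDs are hypothetical and not taken from the pathways2go code or any real mapping file.

```python
from enum import Enum
from typing import NamedTuple


class MappingCategory(Enum):
    """Categories proposed in this ticket for EC-to-GO mappings."""
    EXACT = "exact"
    BROAD = "broad"
    NARROW = "narrow"


class EcMapping(NamedTuple):
    ec: str
    go_term: str
    category: MappingCategory


def exact_terms(mappings, ec):
    """Return only the GO terms mapped to `ec` with category 'exact'."""
    return [
        m.go_term
        for m in mappings
        if m.ec == ec and m.category is MappingCategory.EXACT
    ]


# Hypothetical mappings; the GO IDs are placeholders, not real terms.
mappings = [
    EcMapping("2.7.1.145", "GO:PLACEHOLDER_A", MappingCategory.EXACT),
    EcMapping("2.7.1.-", "GO:PLACEHOLDER_B", MappingCategory.BROAD),
    EcMapping("2.7.1.-", "GO:PLACEHOLDER_C", MappingCategory.BROAD),
]
```

With something like this, a consumer could accept a mapping only when `exact_terms` returns a single term and treat everything else as needing review.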
@dustine32 it looks like you made a commit after writing this ticket - what's the status of implementing the steps above?
Also, can you give a list of all ECs used so we can review the mappings?
@cmungall Yep, I committed the latest batch of YeastPathways models to get them into noctua-dev. These were generated using the code on branch sgd-to-ec-lookup. I'll make a pathways2go PR for this real soon.
@cmungall Also, yes, I can set up some logging of both used and "attempted" (encountered but tossed out due to multiple or not-MF) ECs and send you the list.
@thomaspd found a bug in the https://github.com/geneontology/pathways2GO/commit/be6dccb0e8c78dd78a222e23682ee446c4c877c2 implementation with GO-CAM model YEAST-RIBOSYN-PWY currently in noctua-dev. The MF for RIB2 in the GO-CAM model is tRNA pseudouridine synthase, but it should be DRAP deaminase. This is due to a bug in the SGD EC mapping file lookup code, which is currently interpreting 1:n SGD-to-EC mappings as 1:1, only returning the last mapping. I need to change the code to bail out of the SGD-to-EC lookup when there are multiple. This fix will actually result in the RIB2 MF becoming Molecular Event, not DRAP deaminase, because there's no EC number for this reaction RXN3O-164 in the BioPAX, which would have been found earlier by precedence rule 1.
In this example, RIB2 YeastCyc:YOL066C is assigned to both 3.5.4.26 and 5.4.99.28.
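To make the bug and the planned fix concrete, here is a minimal sketch. It is not the actual pathways2GO code; the function names and the `YeastCyc:HYPOTHETICAL1` row are made up for illustration, and the real SGD EC mapping file format may differ.

```python
from collections import defaultdict


def load_sgd_to_ec(rows):
    """Build gene product ID -> set of EC numbers from (gp_id, ec) pairs."""
    mapping = defaultdict(set)
    for gp_id, ec in rows:
        mapping[gp_id].add(ec)
    return mapping


def lookup_single_ec(mapping, gp_id):
    """Return the EC only when the gene product maps to exactly one EC.

    Returns None when there is no mapping or more than one, so the
    caller can fall through to the next precedence rule.
    """
    ecs = mapping.get(gp_id, set())
    if len(ecs) == 1:
        return next(iter(ecs))
    return None


rows = [
    ("YeastCyc:YOL066C", "3.5.4.26"),
    ("YeastCyc:YOL066C", "5.4.99.28"),
    ("YeastCyc:HYPOTHETICAL1", "2.7.1.145"),  # hypothetical 1:1 entry
]

# Treating the file as 1:1 (a plain dict) silently keeps the last row.
buggy = dict(rows)

mapping = load_sgd_to_ec(rows)
```

Here `buggy["YeastCyc:YOL066C"]` is `"5.4.99.28"` because the later row overwrites the earlier one, while `lookup_single_ec(mapping, "YeastCyc:YOL066C")` returns `None` and leaves the decision to the remaining rules.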
Closing since the fix has been merged for a while and is running in the latest batch of YeastPathways models.
Wow, just realized there was no ticket for this! So far, communication has mainly been emails, Zoom conversations, and comments on other GH tickets. The relevant discussion starts at https://github.com/geneontology/go-ontology/issues/20091#issuecomment-808936628. More details can be found in that ticket, but we can use this new one to further discuss as the new YeastPathways GO-CAMs get tested.
So, a while back, we discovered some YeastPathways GO-CAM models had multiple function terms annotated to an OWL activity individual. The example here is from salvage pathways of pyrimidine deoxyribonucleotides:

The instances where this occurred were due to a couple of reasons:
2.7.1.145) but rather the more generic "3-digit" (e.g. 2.7.1.-). These generic ECs would then map to multiple GO terms.

To mitigate these situations, @thomaspd proposed a precedence/fallback method for selecting the best, single function term for a reaction activity.
The reaction-to-term precedence rules:
Step 2 above incorporates an SGD GP ID -> EC lookup file provided (and suggested) by @suzialeksander.
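For anyone following along, the precedence/fallback idea can be sketched roughly as below. This only covers what is mentioned in this thread: the reaction's own EC from the BioPAX first, then the SGD GP ID -> EC lookup, then falling back to Molecular Event. It is an illustration under assumptions, not the pathways2GO implementation: the function names, the `"molecular_event"` label, the placeholder GO IDs, and the choice to fall through whenever an EC maps to zero or several GO terms are all hypothetical. The "not-MF" check is left out.

```python
MOLECULAR_EVENT = "molecular_event"  # placeholder label for the fallback


def ec_to_single_term(ec, ec_to_go):
    """Return the GO term for `ec` only if it maps to exactly one term."""
    if ec is None:
        return None
    terms = ec_to_go.get(ec, [])
    return terms[0] if len(terms) == 1 else None


def select_function_term(reaction_ec, gp_id, sgd_to_ec, ec_to_go):
    """Pick a single function term for a reaction activity."""
    # Rule 1: use the EC number attached to the reaction in the BioPAX.
    term = ec_to_single_term(reaction_ec, ec_to_go)
    if term is not None:
        return term

    # Step 2: SGD gene product -> EC lookup, skipped when ambiguous (1:n).
    ecs = sgd_to_ec.get(gp_id, set())
    if len(ecs) == 1:
        term = ec_to_single_term(next(iter(ecs)), ec_to_go)
        if term is not None:
            return term

    # Fallback: no single term could be chosen.
    return MOLECULAR_EVENT


# Hypothetical lookup tables; the GO IDs are placeholders.
ec_to_go = {
    "3.5.4.26": ["GO:PLACEHOLDER_A"],
    "5.4.99.28": ["GO:PLACEHOLDER_B"],
    "2.7.1.-": ["GO:PLACEHOLDER_C", "GO:PLACEHOLDER_D"],
}
sgd_to_ec = {"YeastCyc:YOL066C": {"3.5.4.26", "5.4.99.28"}}

# RIB2 case: no EC on the reaction, two ECs in the SGD file.
rib2_term = select_function_term(None, "YeastCyc:YOL066C", sgd_to_ec, ec_to_go)
```

In this sketch `rib2_term` comes out as the Molecular Event fallback, and a generic EC such as `2.7.1.-` that maps to several terms is skipped rather than producing multiple function terms.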