Closed kltm closed 2 years ago
Possible dupe with https://github.com/geneontology/pipeline/issues/240?
Depending on our tests, we may forward dupe this with https://github.com/geneontology/go-site/issues/1847#issuecomment-1116743052
Hi @kltm
Can you list the problematic ECOs?
Thanks, Pascale
We'll be able to get exhaustive listings (on the other ticket, https://github.com/geneontology/go-site/issues/1847) once we have the fixes from @dustine32 run.
@pgaudet @kltm Grepping noctua_mgi-report.md from the 2022-05-08 snapshot
run, these are the ECO codes that currently do not directly map to a GAF evidence code (according to gaf-eco-mapping.txt):
ECO:0000164
ECO:0000257
ECO:0000279
ECO:0000282
ECO:0001120
ECO:0001152
ECO:0001171
ECO:0001179
ECO:0001183
ECO:0001198
ECO:0001220
ECO:0001225
ECO:0001230
ECO:0001232
ECO:0001242
ECO:0001255
ECO:0005579
ECO:0005581
ECO:0005589
ECO:0005592
ECO:0005593
ECO:0005601
ECO:0005640
ECO:0005644
ECO:0005648
ECO:0005661
ECO:0005801
ECO:0005804
ECO:0005805
ECO:0006003
ECO:0006004
ECO:0006005
ECO:0006006
ECO:0006013
ECO:0006030
ECO:0006039
ECO:0006042
ECO:0006051
ECO:0006052
ECO:0006054
ECO:0006062
ECO:0006063
ECO:0006064
ECO:0006065
ECO:0006067
ECO:0006091
ECO:0007016
ECO:0007054
ECO:0007089
ECO:0007707
However, most of these do indirectly map "up" to a GAF evidence code via gaf-eco-mapping-derived.txt with the following exceptions: ECO:0000164 ECO:0001120 ECO:0005593 ECO:0006065
We could consider using this gaf-eco-mapping-derived.txt
in ontobio to fetch GAF evidence codes for most of these.
Okay, so at least the first concrete step would be to get those four exceptions added to the mapping file by submitting an item to https://github.com/evidenceontology/evidenceontology/issues (or otherwise create guidance or a rule to purge these items).
The second step I think is to remember why we don't already use the derived file. We had no small amount of conversation on and around Nov 11th (DMing with @dustine32 and @sierra-moxon on Slack), but final notes are a bit lacking. Reading through that again, it seems that 1) the loss of the GO_REFs and 2) that it is maybe not automatically generated were considered problems. Do we not already have the ECO closure available in ontobio?
In the SynGO data, these evidence codes should be replaced so that they map up to IDA:
@kltm ontobio
currently just accesses the gaf-eco-mapping.txt as a lookup file as the way to handle ECO. We could re-plumb this EcoMap
class to handle eco.owl like a real ontology, which would include the GO_REFs (stored as termoboInOwl:hasDbXref
) for the full closure set. Downside is it'll likely increase compute time for stuff in ontobio
that uses ECO, but maybe not by that much.
@dustine32 Hm. I briefly touched bases with Chris and he also remembers that we had something a little off with the derived file, but doesn't remember the details either.
If replumbing EcoMap
is "easy", that is a good way forward. That said, I'm a little worried that it might take a bit of time to get it right and we're hoping to get a release out soon.
Another way forward would be to use the derived mapping as a fallback to the non-derived file (or as a replacement for this one use). IIRC, we switched away from using the derived file at some point (no GO_REFs possibly, which we needed?), but I haven't been able to find the documentation for why.
I'm not sure which way would be more expedient, but I think we're leaning towards expedience now and having a ticket opened to give a better or more thorough long-term solution. What are your thoughts on this?
(or as a replacement for this one use)
Yeah, I pretty much already coded this yesterday in order to get those four ECOs not in the derived file. Just added a derived
flag like this:
ecomap.ecoclass_to_coderef(str(assoc.evidence.type), derived=True)
This change would be incorporated into the gpadparser.py
changes made in https://github.com/biolink/ontobio/pull/621.
To repair the downstream GAF lines I can use this derived=True
as a fallback in the GafWriter
. So, any ECO->GAF mapping already in the non-derived file will be sourced the same (GO_REFs returned) and the derived file will only be used/checked if no mapping found.
I can open a ticket in ontobio
for this.
@kltm Also, I'll make a separate ticket for the longer-term solution (possibly "ontologizing" ECO in EcoMap
).
@dustine32 That sounds like a good plan--thank you!
This should test on the snapshot
starting tonight.
Looking at products now; see https://github.com/geneontology/go-site/issues/1847#issuecomment-1126429739, this specific issue may be fixed.
@pgaudet We may have a fix in place for everything except
ECO:0000164
ECO:0001120
ECO:0005593
ECO:0006065
These four may be fixed with https://github.com/geneontology/syngo-go-cams/pull/1, but they are still being tested on noctua-dev
(i.e. https://github.com/geneontology/syngo2lego_data_conversion/issues/5)
So my question is whether to
@kltm Do you know how many annotations we'd loose by doing the release now, compared to the previous release? or were these already being excluded?
Thanks, Pascale
@pgaudet Honestly, it's probably easiest to just try a release
run and check the numbers. I'll go ahead and get that started.
We have actually gained 278 SynGO annotations, so it looks like this is resolved?
Noticed when looking at Xenbase test run data, when converting emitting GAF (from Noctua model sources), some ECO codes do not seem to map up correctly. We'd like to track down the source of this issue (i.e. either GPAD -> GAF conversion issue or direct GAF emission) and fix it.
Given what this is, in either base it seems likely to be related to ontobio (so looping in @dustine32 as well).
Initially reported by @malcolmfisher103
Tagging @vanaukenk to provide some more detail.