Open cmungall opened 8 years ago
+1
A little bit confused: even if we had the links in go-basic, we would still need to materialise the inferences in the GAF, wouldn't we? Tagging @tonysawfordebi for when he's back from vacation.
Why? Queries for a BP will yield annotations to MFs that are part-of that BP
I guess I'm thinking about the QuickGO/AmiGO display, where you want to see all the annotations including those inferred by GOC. I guess they could be retrieved at run time via query, but wouldn't it be easier to just materialize them when the files are loaded?
But the inferences are trivial, you get them anyway in quickgo/amigo. The only reason they were ever materialized was to support legacy software that could not handle inter-ontology links
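To make the "you get them anyway" point concrete, here is a minimal sketch of query-time inference over an inter-ontology link. Everything in it is an assumption for illustration: the term identifiers, the toy graph, the gene names and the function names are hypothetical and do not reflect how QuickGO or AmiGO are actually implemented.

```python
# Hypothetical toy ontology: child term -> list of (relation, parent term).
# The part_of edge from an MF term to a BP term is the inter-ontology link.
EDGES = {
    "MF:cohesin_ATPase_activity": [
        ("is_a", "MF:ATPase_activity"),
        ("part_of", "BP:cohesion"),
    ],
    "BP:cohesion": [("is_a", "BP:cellular_process")],
}

# Hypothetical direct annotations: gene -> set of directly annotated terms.
ANNOTATIONS = {
    "geneA": {"MF:cohesin_ATPase_activity"},
    "geneB": {"MF:ATPase_activity"},
}


def ancestors(term):
    """Return every term reachable from `term` over is_a or part_of edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in EDGES.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_for(term):
    """Return genes annotated to `term` directly or via any descendant."""
    return sorted(
        gene
        for gene, terms in ANNOTATIONS.items()
        if any(term == t or term in ancestors(t) for t in terms)
    )
```

In this toy setup, querying the BP term returns geneA even though geneA only carries an MF annotation, because the traversal follows the part_of edge across ontologies.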
Hi Chris,
If you assume that the only use case for GO is via a query tool, this is fine. However, I think our users will expect to see BP annotations on the gene pages, not just MF annotations.
If I have understood you correctly, take a trivial example of an F-P link: a fission yeast cohesin subunit might have the annotation GO:0061775 cohesin ATPase activity, which should (but currently doesn't) have an F-P link to "cohesion".
Say we were using this term to annotate the MF of a cohesin subunit that had no existing BP annotation to cohesion. On a MOD gene page, our users would also expect to see an explicit annotation to the process (we all arrange our pages by F, P and C, which makes sense to us and to our users).
Imagine if this page http://www.pombase.org/spombe/result/SPBC29A10.04 had an annotation to GO:0061775 cohesin ATPase activity but no cohesion annotation in the BP section.
Don't you think that would be odd?
I guess what I am trying to say is that it doesn't matter what a query would return; we still need to display the BP annotation. If it doesn't come from the pipeline, we would need to make it explicitly (I always do, because we can't guarantee that the F-P link will always be present, or possible).
Val
OK, good point. There are a number of tools or interfaces that stratify the annotations into 3 sections: F, P and C (and some enrichment tools may perform 3 batches of tests).
I agree it would look odd and misleading. So we are back to some kind of materialization of inferences.
I would argue for doing this as far downstream as possible, and letting the interface handle it. This is the most powerful and flexible approach. For example, the interface may also choose to stratify cellular processes (as PAINT does). If we do things upstream, prior to publishing the GAFs, then we set the 3 levels in stone.
However, I also appreciate that this is hard, and for some tools it is out of our hands (although we should be focusing on tools developed by the consortium, e.g. quickgo, amigo, pombase, ...).
Wherever it happens, be it at display time or as a pre-publishing GAF inference, we need to agree upon whether this is the same annotation, just slimmed up the graph (thus preserving evidence codes etc as the GOC pipeline currently does) or a de-novo annotation with a new evidence code (as the EBI pipeline currently does it).
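The two options can be sketched side by side. This is only an illustration under stated assumptions: the `Annotation` record, the link table, the term identifiers and the `INFERRED` placeholder code are all hypothetical, and neither branch is meant to reproduce the actual GOC or EBI pipeline code.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    gene: str
    term: str
    evidence: str


# Hypothetical inter-ontology links: MF term -> BP term it is part_of.
PART_OF_LINKS = {"MF:cohesin_ATPase_activity": "BP:cohesion"}

# Placeholder evidence code for the "de-novo annotation" option.
# The real code to use is exactly what this thread proposes to discuss.
INFERRED_CODE = "INFERRED"


def materialize(annotations, keep_evidence):
    """Return the inferred BP annotations implied by the MF annotations.

    keep_evidence=True  -> same annotation propagated, evidence preserved.
    keep_evidence=False -> new annotation carrying the placeholder code.
    """
    existing = {(a.gene, a.term) for a in annotations}
    inferred = []
    for a in annotations:
        target = PART_OF_LINKS.get(a.term)
        # Skip terms without a link and genes already annotated to the target.
        if target is None or (a.gene, target) in existing:
            continue
        evidence = a.evidence if keep_evidence else INFERRED_CODE
        inferred.append(Annotation(a.gene, target, evidence))
        existing.add((a.gene, target))
    return inferred
```

With `keep_evidence=True` a consumer cannot tell the inferred row from a curated one; with `keep_evidence=False` the provenance is visible, but the original evidence is lost unless it is recorded elsewhere.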
Okay, so following up with Val's post on #1395 and this ticket, shall we slot some time on an annotation call to confirm what evidence code we prefer to use for these inferred annotations?
+1
Oh yes, I think a discussion on an annotation call would be very helpful to all interested parties.
I would actually argue in favour of having the inference happen centrally, instead of letting the tools handle it. Very much like we distribute inferred versions of the ontology, we should distribute inferred versions of the annotation sets for consistency.
I thought about the evidence codes, and I can see the argument for keeping the existing one, as after all the inter-ontology links are logically implicit. However, doing so may obscure the real process: in reality, those annotations were not made manually by curators; they were derived electronically by a reasoning tool. Given this, I like having a specific evidence code reflecting this pipeline, as this allows the greatest flexibility from the user's point of view. Users get the full provenance information and can process it as desired, whereas they can't distinguish inferred from curated annotations if the same evidence code is kept.
Could we discuss both (where should the inference happen, what evidence code should be kept) at an annotation call?
@cmungall What is the action here? Isn't this a pipeline issue?
Thanks, Pascale
@cmungall @vanaukenk is this still a to-do or a to-discuss? Or has this since been addressed through pipeline code updates (https://github.com/geneontology/go-ontology/issues/19954)?
Currently go-basic.obo is designed for lowest common denominator GO tools. Some of these hardcode an assumption that there are no crossings between the 3 ontologies.
As a courtesy, we omit these links from go-basic and provide GAFs with materialized inferences equivalent to what we would get if the links were present.
It is time to stop this. We should add back links, specifically:
And stop the IOL pipeline