lpalbou closed this 3 years ago
Depending on exactly what you have in mind, this may fall more into future stats stuff for @pgaudet? To minimize future regret, though (and because I'm curious), would you mind filling in a few more details? Assuming you've created legacy archives like release.geneontology.org, would your stats scripts be able to operate on them directly, or is there something particularly convenient about having them available locally? Or maybe you're thinking more about replaying the SVN history forward to get the "full" operational history, which would complement the current GitHub file history in go-ontology (which I think became "main" sometime in 2017, though I'm not sure about that).
Off the top of my head, I can't think of any tickets that address this, but I believe certain forms of history were imported into the current GitHub system, at least at the ticket level: https://github.com/geneontology/go-ontology/issues?page=476&q=is%3Aissue+author%3Agocentral+is%3Aclosed and https://github.com/geneontology/go-ontology/issues/71 . It might take a little poking around to remember how we reached those decisions.
@kltm that's correct; go_ontology_changes.py can run out of the box on all those legacy releases to produce those files:
This would produce precise monthly reports for the last 8 years on how each term changed. These reports could then be ingested downstream into GOLr (or an equivalent) to produce something like the QuickGO "change log": https://www.ebi.ac.uk/QuickGO/term/GO:0005634
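To make "how each term changed" concrete, here is a minimal sketch of a diff between two go.obo releases. It is not go_ontology_changes.py (its internals aren't shown in this thread); it is an illustrative stand-in that assumes plain OBO text, tracks only [Term] stanzas, and compares a hypothetical choice of tags (name, def, is_obsolete, is_a). The function names parse_obo_terms and diff_releases are hypothetical, and the two releases at the bottom are toy inputs, not real GO history.

```python
from typing import Dict, List, Tuple

TermTags = Dict[str, List[str]]


def parse_obo_terms(obo_text: str) -> Dict[str, TermTags]:
    """Map each [Term] id to its tag -> values; other stanza types are skipped."""
    terms: Dict[str, TermTags] = {}
    current = None  # tags of the [Term] stanza being read, or None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are tracked.
            current = {} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue
        tag, _, value = line.partition(":")
        # Drop trailing "! label" comments so renaming a parent isn't reported here.
        value = value.split(" ! ", 1)[0].strip()
        current.setdefault(tag, []).append(value)
        if tag == "id":
            terms[value] = current
    return terms


def diff_releases(old_text: str, new_text: str,
                  tags: Tuple[str, ...] = ("name", "def", "is_obsolete", "is_a")) -> dict:
    """Report added/removed term ids and per-tag changes between two releases."""
    old, new = parse_obo_terms(old_text), parse_obo_terms(new_text)
    report = {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": {},
    }
    for term_id in sorted(old.keys() & new.keys()):
        changes = {}
        for tag in tags:
            before, after = old[term_id].get(tag, []), new[term_id].get(tag, [])
            # Compare as sorted lists so a pure reordering is not reported.
            if sorted(before) != sorted(after):
                changes[tag] = {"old": before, "new": after}
        if changes:
            report["changed"][term_id] = changes
    return report


# Toy releases: one term gains a parent, one disappears, one appears.
OLD = """[Term]
id: GO:0000001
name: mitochondrion inheritance
is_a: GO:0048308 ! organelle inheritance

[Term]
id: GO:0000002
name: mitochondrial genome maintenance
"""

NEW = """[Term]
id: GO:0000001
name: mitochondrion inheritance
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Term]
id: GO:0000003
name: reproduction
"""

print(diff_releases(OLD, NEW))
```

Running something like diff_releases on each pair of consecutive monthly releases yields one report per month. The line-based parser keeps the sketch short; a real pipeline would more likely rely on a full OBO parser.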
Note on annotations: we can't directly compute stats for the annotations, as that part requires GOLr. But depending on how much effort we want to put into improving the pipeline, and whether we want to help the pipeline fail fast when it has to, I could rewrite the stats to work directly from the GAFs. Two benefits:
Hmmm... that is food for thought. It sounds like at least the last part might be a chunk of work, though I have been jealous of that QuickGO feature for some time... Anyway, this is more likely to end up in @pgaudet's wheelhouse. Thanks for the explanation.
I would of course love to have the history, both as stats and in AmiGO!
So the first step, which will be quick, is simply to compute the ontology diffs over all the OBO files we have.
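Walking the whole archive is mostly bookkeeping: find every go.obo, order the releases, and compare each one with its predecessor. The sketch below assumes a hypothetical local mirror laid out as `<root>/<YYYY-MM-DD>/ontology/go.obo` and, to stay short, only counts added and removed term ids per release; a richer per-term diff would slot in where the two id sets are compared. The helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, List, Set, Tuple

# Term ids such as "id: GO:0005634"; Typedef ids (e.g. "part_of") and alt_id lines don't match.
TERM_ID = re.compile(r"^id:\s*(GO:\d{7})\s*$", re.MULTILINE)


def find_releases(root: Path) -> List[Tuple[str, Path]]:
    """Return (release_label, go.obo path) pairs in release order.

    Assumes a hypothetical mirror layout <root>/<YYYY-MM-DD>/ontology/go.obo,
    so sorting the ISO-date labels as strings also sorts them chronologically.
    """
    return sorted((p.parent.parent.name, p) for p in root.glob("*/ontology/go.obo"))


def term_ids(path: Path) -> Set[str]:
    """Collect the GO term ids declared in one go.obo file."""
    return set(TERM_ID.findall(path.read_text(encoding="utf-8")))


def summarize_archive(root: Path) -> List[Dict[str, object]]:
    """Compare each release with its predecessor and count added/removed term ids."""
    releases = find_releases(root)
    ids = [term_ids(path) for _, path in releases]  # read each file only once
    summary = []
    for i in range(1, len(releases)):
        before, after = ids[i - 1], ids[i]
        summary.append({
            "from": releases[i - 1][0],
            "to": releases[i][0],
            "added": len(after - before),
            "removed": len(before - after),
        })
    return summary
```

Each release is read once and compared only with its neighbour, so the run time grows linearly with the number of archived releases.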
The second step, making it accessible in AmiGO with something equivalent to the QuickGO "change log", will take a little longer (maybe in the next phase of development?). It will require combining those diffs over time to recreate the history of each term, then storing that per-term history in GOLr (in the ontology document, probably in a field called history to keep it simple).
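The folding step can be sketched as follows, assuming each release's diff is a dict with "added", "removed" and "changed" keys (a hypothetical shape, not a documented output format of go_ontology_changes.py). Events are accumulated per term in release order, and the result is written into a history field using Solr's atomic-update `set` modifier. Storing the history as one JSON string, and the names build_term_history and to_golr_update, are assumptions rather than an existing GOLr schema.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical per-release diff shape:
# {"added": [term ids], "removed": [term ids],
#  "changed": {term_id: {tag: {"old": [...], "new": [...]}}}}
Report = Dict[str, Any]


def build_term_history(diffs: List[Tuple[str, Report]]) -> Dict[str, List[dict]]:
    """Fold per-release diffs into a chronological list of events per term."""
    history: Dict[str, List[dict]] = defaultdict(list)
    # Sorting on ISO-date release labels puts the events in chronological order.
    for release, report in sorted(diffs, key=lambda item: item[0]):
        for term_id in report.get("added", []):
            history[term_id].append({"release": release, "action": "added"})
        for term_id in report.get("removed", []):
            history[term_id].append({"release": release, "action": "removed"})
        for term_id, changes in report.get("changed", {}).items():
            for tag, values in sorted(changes.items()):
                history[term_id].append({
                    "release": release,
                    "action": "changed",
                    "tag": tag,
                    "old": values["old"],
                    "new": values["new"],
                })
    return dict(history)


def to_golr_update(term_id: str, events: List[dict]) -> dict:
    """Hypothetical Solr atomic update setting a JSON "history" field on one document.

    Assumes the ontology document is keyed by "id" and that a stored string
    field named "history" exists; the real GOLr schema would need to be checked.
    """
    return {"id": term_id, "history": {"set": json.dumps(events)}}
```

A single JSON string keeps the schema change to one stored field, at the cost of not being able to search individual events; splitting events into multi-valued fields would be the alternative if AmiGO needs to filter on them.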
This can still be done in a later phase when we want the GO term history; closing for now.
There are probably other tickets on this subject; however, the reconstruction of this archive gives direct access to go.obo starting from 2012. Running the ontology-diffs stats on every go.obo over that period would take a few hours, and the results could be used in end-user UIs to clearly show the history of any term.