kltm opened this issue 1 year ago (status: Open)
Letting @pgaudet and @ukemi know that this is seeded with likely personnel.
Possible order of operations
TODO: add clarification for orthology source and how to process down to positive/negative list
Process documentation folder: https://drive.google.com/drive/folders/17O5e3gj_fkbSv2vscEYNIzpCNLIq3fG2
QC rounds folder with @ukemi and @sierra-moxon: https://drive.google.com/drive/folders/1q_KNRV9iwCndS_tWlvYyx_hddUFfqiDJ
QC rounds show that the Rat ISO load is done. @sierra-moxon will begin working on the human ISO annotations and @ukemi will begin QC on those. Once GPAD specs have been finalized, the GOC will begin providing test files for Lori to load into MGI.
New repository for this project at https://github.com/geneontology/gopreprocess
Noting for @pgaudet that we have hit a couple of slowdown points WRT needing to update some core software to support recent tooling (basically, we need to start updating from some very old Python versions). This will likely result in a small overhead increase for the project and draw in myself and @dustine32 for some tasks.
Note that the rat and human ISO parts of the pipeline are close to completion and we have begun working on mouse annotations from Protein2GO. There is a rate limitation for the completion of this project that is tied to the GOC-wide conversion to the GPAD2.0 format and the generation of the GPAD2.0 files. There are also some issues to be discussed at the GOC-level:
@pgaudet I had a long conversation with @sierra-moxon and have a feel for the position of the work. Basically, in a perfect world, it may be that all direct (i.e. ontobio) software work is done and all that's left is checking, making a GPAD/GPI 2.0 announcement, and running it through the main pipeline. That said, this needs to be confirmed, and running this through a pipeline that is a decent simulation of the final work is hitting most of the same problems we run into when trying to do release pipeline stuff. To get past this, I'll be prioritizing getting it, by whatever methods I can, onto a "close enough" version of the final product so that we can do any final debugging and confirm the output with MGI. Once MGI has given that confirmation, it will be on us to make the final timeline and do the technical stuff. I've assigned myself https://github.com/geneontology/pipeline/issues/325.
It may be that the GPAD/GPI production would be better off as a separate project.
Talking to @pgaudet and @suzialeksander, next concrete steps are
snapshot
and release
(Note: if more work is needed on the MGI/QC side, we are likely to proceed with adding the code sooner anyway.)
@pgaudet Some of the development team took a look at the output from the test pipeline and there are some issues with the data that we want to pin down before passing the results on to MGI, mainly an increase in annotations in one file that we're having a little trouble tracing. This will mean 1) re-running some of the data (about half a day lag, assuming the pipelines are cooperative) and 2) tweaking/checking a GPAD reprocessing step. We will be meeting again mid-week to see where we're at.
Also tagging @sierra-moxon and @dustine32
@pgaudet Changing PO to Li/Pascale
@pgaudet Talking to @sierra-moxon, the remainder of items in https://github.com/orgs/geneontology/projects/136 are MGI bookkeeping items, with all GO-driven items now moved or being re-created for https://github.com/orgs/geneontology/projects/155.
But these are still open items in our tracker. I think one way forward, to prevent confusion, would be to rename the project and project metadata to make clear that this is now an "MGI sub-project" and move it into the external collab category (i.e. no more "GO" resources, beyond communication, unless something bad happens).
@kltm Is https://github.com/geneontology/go-site/issues/2043 an MGI task?
@pgaudet Assuming no answer is needed, as it is now closed.
Project link
https://github.com/orgs/geneontology/projects/136
Project description
Currently, the GOC picks up MGI ortholog and upstream annotation data from MGI. The completion of this project would be that GOC directly pulls in this data, processes it, and adds it to the current data flow. This would remove MGI from the loop of directly processing MGI/mouse function data.
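As a rough illustration of the kind of processing described here, the sketch below transfers annotations from a source species (e.g. rat or human) onto mouse genes via an ortholog map and tags them with the ISO evidence code. It is not the gopreprocess implementation: the `Annotation` class, the `transfer_by_orthology` function, the restriction to experimental evidence codes, and all identifiers in the example are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Assumption: only experimentally supported source annotations are transferred.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


@dataclass(frozen=True)
class Annotation:
    """Hypothetical, heavily simplified annotation record."""
    gene_id: str         # annotated gene (IDs used below are placeholders)
    go_term: str         # GO term ID
    evidence: str        # GO evidence code
    with_from: str = ""  # gene the annotation was inferred from, if any


def transfer_by_orthology(source_annotations, orthologs):
    """Create mouse ISO annotations from source-species annotations.

    `orthologs` maps a source gene ID to a list of mouse gene IDs
    (a hypothetical structure, not a real file format).
    """
    transferred = []
    for ann in source_annotations:
        if ann.evidence not in EXPERIMENTAL_CODES:
            continue  # skip non-experimental source annotations
        for mouse_id in orthologs.get(ann.gene_id, []):
            transferred.append(
                Annotation(
                    gene_id=mouse_id,
                    go_term=ann.go_term,
                    evidence="ISO",         # Inferred from Sequence Orthology
                    with_from=ann.gene_id,  # record the source gene
                )
            )
    return transferred


if __name__ == "__main__":
    # Placeholder IDs, not real database records.
    rat = [Annotation("RGD:1", "GO:0000001", "IDA")]
    print(transfer_by_orthology(rat, {"RGD:1": ["MGI:10"]}))
```

The real pipeline would additionally need to handle the positive/negative orthology lists and GPAD/GPI 2.0 output mentioned elsewhere in this thread; none of that is modeled here.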
PI
Chris
Product owner (PO)
Li/Pascale
Technical lead (TL)
Sierra
Other personnel (OP)
Seth, Dustin, Anushya
Technical specs
While there is new software being written for this project, it is either 1) within the bounds of current technologies and practices or is 2) custom and one-off, not to be reused elsewhere. The needs of the project are described in great detail in the folders listed below; minimally meeting these requirements and rendering them into a pipeline is the entire scope of the project.
Other comments
This is a continuation of:
https://github.com/geneontology/project-management/issues/42 https://github.com/orgs/geneontology/projects/109