geneontology / pipeline

Declarative pipeline for the Gene Ontology.
https://build.geneontology.org/job/geneontology/job/pipeline/
BSD 3-Clause "New" or "Revised" License

Add documentation for MGI upstreams pipeline #326

Closed (kltm closed this issue 2 months ago)

kltm commented 1 year ago

We want to document the process of taking files from the upstream and producing what MGI previously generated. This is currently documented in great detail here: https://drive.google.com/drive/folders/17O5e3gj_fkbSv2vscEYNIzpCNLIq3fG2 (active notes and technical docs for this project).

However, for the long term, we'd like to encode this information in GORULEs and the like.

An initial step could be documenting this under a new GORULE that is taken care of by the new software (and can be added to the current outputs). This can be made more granular as time goes on.

Tagging @sierra-moxon @ukemi @pgaudet

kltm commented 1 year ago

@sierra-moxon We can talk about this elsewhere, but there is no need for GORULEs to be part of ontobio; we would just need documentation at https://github.com/geneontology/go-site/tree/master/metadata/rules

Additionally, we could add stanzas to the goa.yaml metadata to indicate where data is coming from.

@pgaudet There can also be more user/external-facing documentation.

kltm commented 1 year ago

For the GO_REF angle, there is also geneontology/go-site#2019

kltm commented 4 months ago

@pgaudet I think everything we need for the software side is now part of the GORULEs framework. Is there anything you'd like to have documented before we close this out?

pgaudet commented 4 months ago

Hi @kltm Yes, @LiNiMGI and I would like to look at this. Specifically:

Thanks, Pascale

sierra-moxon commented 3 months ago

I tried to capture the "preprocess steps" in this diagram -- feel free to use, edit, or throw it away (it did not take me a long time). Green marks the go preprocess pipeline actions. If this seems useful, we can extend it past the "GAF 2.2 output" in the pink circle to capture what the main GO Pipeline does to transform this file (together with the PAINT and Noctua annotations) into GPAD 2.0 at the end.

Image

pgaudet commented 3 months ago

Hi Sierra,

This looks great!

A couple of questions:

sierra-moxon commented 3 months ago

Second iteration (many more details) including arrows to indicate how "control" moves between the gopreprocess Makefile and the silver-issue-325-gopreprocess pipeline branch. Also included is the location of the necessary files that are used to generate the MGI GAF now, and the timing in the pipeline where it writes intermediate files back to skyhook.

White = files downloaded
Green = silver-issue-325-gopreprocess pipeline
Pink = Makefile in gopreprocess code repo
Orange = skyhook
Blue = gopreprocess logic

To edit this diagram: https://app.diagrams.net/?mode=google#G1wSPfwmIL6e58LJwbHlFS21COLr2aC1u6#%7B%22pageId%22%3A%22b7a7eaba-c6c5-6fbe-34ae-1d3a4219ac39%22%7D

Image

sierra-moxon commented 3 months ago

@pgaudet - I have to go to HGNC in order to use the Alliance orthology file as Alliance uses HGNC ids.
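
To make the indirection through HGNC concrete, here is a minimal sketch of a two-step identifier lookup. It assumes the incoming annotations are keyed by some non-HGNC identifier (UniProtKB is used purely as an assumption) and that the orthology data maps HGNC ids to MGI ids. The function name `project_via_hgnc`, both lookup tables, and every identifier shown are hypothetical; this is not the gopreprocess code.

```python
from typing import Dict, List


def project_via_hgnc(
    source_ids: List[str],
    source_to_hgnc: Dict[str, str],
    hgnc_to_mgi: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Map source gene ids to mouse (MGI) ids by going through HGNC ids.

    Ids with no HGNC mapping, or whose HGNC id has no mouse ortholog,
    are left out of the result.
    """
    projected: Dict[str, List[str]] = {}
    for source_id in source_ids:
        # Step 1: translate the source id into the HGNC id space.
        hgnc_id = source_to_hgnc.get(source_id)
        if hgnc_id is None:
            continue
        # Step 2: look up mouse orthologs, which are keyed by HGNC id.
        mouse_ids = hgnc_to_mgi.get(hgnc_id, [])
        if mouse_ids:
            projected[source_id] = list(mouse_ids)
    return projected


# Made-up example tables (assumptions, not real mappings).
source_to_hgnc = {
    "UniProtKB:X00001": "HGNC:0001",
    "UniProtKB:X00002": "HGNC:0002",
}
hgnc_to_mgi = {
    "HGNC:0001": ["MGI:0000001"],
}

result = project_via_hgnc(
    ["UniProtKB:X00001", "UniProtKB:X00002", "UniProtKB:X00003"],
    source_to_hgnc,
    hgnc_to_mgi,
)
print(result)
```

Only the first id survives both lookups in this toy data; the second has an HGNC id but no ortholog entry, and the third has no HGNC mapping at all.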

pgaudet commented 3 months ago

OK, got it ! thanks :)

kltm commented 2 months ago

@pgaudet If anything more needs to be done here, please re-open.