geneontology / pipeline

Declarative pipeline for the Gene Ontology.
https://build.geneontology.org/job/geneontology/job/pipeline/
BSD 3-Clause "New" or "Revised" License

Upload MGI upstream / "silver" to mirror.geneontology.io, with new filename, and point metadata to it #369

Closed by kltm 2 months ago

kltm commented 3 months ago

Currently, the build process depends on quirks of skyhook. To make this generally usable, we want to upload the MGI upstream file we produce to a stable location (mirror.geneontology.io), with the new filename, and point metadata to it.

From original https://github.com/geneontology/gopreprocess/issues/65

Look at the go-copy-to-mirror pipeline branch to find the S3 bucket.

kltm commented 2 months ago

From @sierra-moxon

this is the current "upstream" for MGI: http://skyhook.berkeleybop.org/silver-issue-325-gopreprocess/products/upstream_and_raw_data/preprocess_raw_files/mgi-merged.gaf

kltm commented 2 months ago

Now available at:

https://mirror.geneontology.io/mgi-p2go-homology.gaf
https://mirror.geneontology.io/mgi-p2go-homology.gaf.gz
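For anyone wanting to confirm that the plain and compressed copies on the mirror carry the same content, a minimal sketch follows. It assumes only the Python standard library; the function names `sha256`, `copies_match`, and `fetch` are hypothetical helpers and not part of the pipeline.

```python
import gzip
import hashlib
import urllib.request


def sha256(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def copies_match(plain: bytes, gzipped: bytes) -> bool:
    """True if the gzip payload decompresses to exactly the plain bytes."""
    return sha256(plain) == sha256(gzip.decompress(gzipped))


def fetch(url: str) -> bytes:
    """Download a URL into memory (hypothetical helper; needs network access)."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()
```

Calling `copies_match(fetch(".../mgi-p2go-homology.gaf"), fetch(".../mgi-p2go-homology.gaf.gz"))` with the two mirror URLs above would report whether the pair is consistent. This reads both files fully into memory, which is a simplification.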

kltm commented 2 months ago

go-site metadata updated in mgi.yaml.

sierra-moxon commented 2 months ago

I made a new branch off of the silver-issue-325-gopreprocess pipeline branch called: p2go-homology-upstream-file-generator. This new branch adds a step to include two new subdirectories and a copy of the final GAF file from the upstreams code base to s3://go-mirror/:

These capture the incremental output of the upstreams code as well as the final GAF file. Each command in the new pipeline branch overwrites the last run's files in the paths above. I looked a tiny bit into versioning; @kltm - do we need to keep versions of this file or the pipeline outputs?

I pushed this branch, and it will try to run on the next repository scan.

kltm commented 2 months ago

@sierra-moxon A quick note that we need the compressed version of the file.

sierra-moxon commented 2 months ago

Fixed to use the .gz version of the file.
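As an illustration of what producing the compressed copy can look like, here is a minimal sketch using Python's standard `gzip` module. The function name `compress_gaf` is hypothetical, and the actual pipeline step may use a shell `gzip` call or something else entirely.

```python
import gzip
import shutil
from pathlib import Path


def compress_gaf(src: Path) -> Path:
    """Write a gzip-compressed copy of src next to it and return the new path."""
    # e.g. mgi-p2go-homology.gaf -> mgi-p2go-homology.gaf.gz
    dest = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dest, "wb") as fout:
        # Stream in chunks so a large GAF is never held fully in memory.
        shutil.copyfileobj(fin, fout)
    return dest
```

The original file is left in place, so both the plain and `.gz` versions remain available for upload.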

kltm commented 2 months ago

@sierra-moxon Sorry to ask, but I don't think the current production metadata points to this yet? Perhaps we should add an item to the top, just so this can be tracked?

kltm commented 2 months ago

Or maybe that's https://github.com/geneontology/go-site/issues/2285 ...in which case I'll put things back the way you had them :)

sierra-moxon commented 2 months ago

Yes, https://github.com/geneontology/go-site/issues/2285 is the one we should use to merge metadata changes in. I have the MGI metadata changes in this branch (where we point to the mirror version of the gopreprocess MGI GAF file, etc.). This branch also has a lot of hacking in it to make my pipeline go fast, so I will cherry-pick the changes into a new branch for merging into master/main.