kltm closed this 2 months ago
From @sierra-moxon
This is the current "upstream" for MGI: http://skyhook.berkeleybop.org/silver-issue-325-gopreprocess/products/upstream_and_raw_data/preprocess_raw_files/mgi-merged.gaf
The go-site metadata has been updated in mgi.yaml.
I made a new branch off of the silver-issue-325-gopreprocess pipeline branch, called p2go-homology-upstream-file-generator. This new branch adds a step that copies two new subdirectories and the final GAF file from the upstreams code base to s3://go-mirror/:
- p2go-homology-upstream-file-generator/preprocess_raw_files/
- p2go-homology-upstream-file-generator/preprocessed_GAF_output/
- s3://go-mirror/mgi-p2go-homology.gaf.gz
The two subdirectories capture the incremental output of the upstreams code, and s3://go-mirror/mgi-p2go-homology.gaf.gz is the final GAF file, added or overwritten on every successful run of this pipeline branch. This is the MGI upstream now; Seth has already changed the go-site metadata to reflect the new name/path. Each command in the new pipeline branch overwrites the previous run's files at the paths above. I looked a little into versioning; @kltm, do we need to keep versions of this file or of the pipeline outputs?
I pushed this branch, and it will try to run on the next repository scan.
@sierra-moxon A quick note that we need the compressed version of the file.
Fixed to use the .gz version of the file.
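A minimal sketch of the upload step described in this thread, assuming the step shells out to the AWS CLI (`aws s3 cp`); the actual pipeline branch may do this differently. The bucket, prefix, subdirectory names, and object name come from the comments above; the function names and the local directory layout passed to `publish()` are hypothetical.

```python
import gzip
import shutil
import subprocess
from pathlib import Path

# Names taken from the thread; the local layout passed to publish() is hypothetical.
BUCKET = "go-mirror"
PREFIX = "p2go-homology-upstream-file-generator"
SUBDIRS = ("preprocess_raw_files", "preprocessed_GAF_output")
OBJECT_NAME = "mgi-p2go-homology.gaf.gz"


def compress_gaf(src: Path, dst: Path) -> Path:
    """Gzip the final GAF, since the mirror needs the compressed version."""
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst


def s3_uri(bucket: str, key: str) -> str:
    """Build an s3:// URI; reusing the same key means each run overwrites it."""
    return f"s3://{bucket}/{key}"


def upload_command(local: Path, uri: str, recursive: bool = False) -> list[str]:
    """AWS CLI call that copies a file (or a whole directory) to S3."""
    cmd = ["aws", "s3", "cp"]
    if recursive:
        cmd.append("--recursive")
    return cmd + [str(local), uri]


def publish(output_root: Path, final_gaf: Path) -> None:
    """Hypothetical pipeline step: upload the intermediate dirs, then the final GAF."""
    for name in SUBDIRS:
        uri = s3_uri(BUCKET, f"{PREFIX}/{name}/")
        subprocess.run(upload_command(output_root / name, uri, recursive=True), check=True)
    gz = compress_gaf(final_gaf, final_gaf.with_name(OBJECT_NAME))
    subprocess.run(upload_command(gz, s3_uri(BUCKET, OBJECT_NAME)), check=True)
```

Running `publish()` requires the AWS CLI and credentials with write access to the bucket. Note that `aws s3 cp --recursive` overwrites objects with the same key but does not delete objects left over from earlier runs; `aws s3 sync --delete` would remove them if an exact mirror of each run is wanted.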
@sierra-moxon Sorry to ask, but I don't think the current production metadata points to this yet? Perhaps we should add an item to the top, just so this can be tracked?
Or maybe that's https://github.com/geneontology/go-site/issues/2285 ...in which case I'll put things back the way you had them :)
Yes, https://github.com/geneontology/go-site/issues/2285 is the one we should use to merge metadata changes in. I have the MGI metadata changes in this branch (where we point to the mirror version of the gopreprocess MGI GAF file, etc.). This branch also has a lot of hacking in it to make my pipeline go fast, so I will cherry-pick changes into a new branch for merging into master/main.
Currently, the build process depends on quirks of skyhook. To make this generally usable, we want to upload the MGI upstream file we produce to a stable location (mirror.geneontology.io), with the new filename, and point metadata to it.
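One way to catch metadata that still depends on skyhook is a small check over the source URLs listed in metadata such as mgi.yaml. This is a hypothetical sketch: it assumes the file is served from a URL on mirror.geneontology.io whose path ends in mgi-p2go-homology.gaf.gz, and it does not parse the go-site YAML schema, so extracting the URLs from mgi.yaml is left out.

```python
from urllib.parse import urlparse

# Host and filename come from this thread; treating any other host
# (e.g. skyhook) as unstable is this sketch's assumption.
STABLE_HOST = "mirror.geneontology.io"
EXPECTED_FILE = "mgi-p2go-homology.gaf.gz"


def points_to_stable_mirror(url: str) -> bool:
    """True if a metadata source URL targets the mirror copy of the MGI upstream."""
    parsed = urlparse(url)
    filename = parsed.path.rsplit("/", 1)[-1]
    return parsed.hostname == STABLE_HOST and filename == EXPECTED_FILE


def unstable_sources(urls: list[str]) -> list[str]:
    """Return the URLs that still point somewhere other than the mirror file."""
    return [u for u in urls if not points_to_stable_mirror(u)]
```

Matching on the filename as well as the host also flags an uncompressed `.gaf` path, since the mirror needs the `.gz` version.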
From the original issue, https://github.com/geneontology/gopreprocess/issues/65:
New filename: mgi-p2go-homology.gaf
Look at the go-copy-to-mirror pipeline branch to find the S3 bucket.