geneontology / pipeline

Declarative pipeline for the Gene Ontology.
https://build.geneontology.org/job/geneontology/job/pipeline/
BSD 3-Clause "New" or "Revised" License

Test EBI-GOA pipeline products through derivatives generation at GO Central #388

Closed: kltm closed this 1 month ago

kltm commented 1 month ago

We are now seeing a first draft of the separated, QCed GAFs. We would like to run them through the "second stage" of the GO pipeline, which produces derivatives, to see how they fare. This test setup is an experiment toward the final joint-pipeline derivatives production; it will be used to supply feedback and to mock up what a run would look like.

https://ftp.ebi.ac.uk/pub/contrib/goa/panther_proteomes/

Tagging @alexsign @pgaudet

kltm commented 1 month ago

The mega-list caused some indigestion. Retrying with a unionized GAF at http://skyhook.berkeleybop.org/confinement-for-pipeline-388/union.gaf.gz .
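For anyone wanting to build a similar union locally, here is a minimal Python sketch. It assumes "unionized" means concatenating the per-proteome gzipped GAFs into one file with a single `!gaf-version` header and exact-duplicate rows dropped; the function name `union_gafs` and its `dedupe` parameter are hypothetical, not taken from the pipeline.

```python
import gzip
from typing import Iterable


def union_gafs(inputs: Iterable[str], output: str, dedupe: bool = True) -> int:
    """Merge gzipped GAF files into one gzipped GAF (hypothetical helper).

    Keeps only the first "!gaf-version" header line, drops all other
    comment lines, and optionally drops exact-duplicate annotation rows.
    Returns the number of annotation rows written.
    """
    seen: set[str] = set()
    written = 0
    header_written = False
    with gzip.open(output, "wt", encoding="utf-8") as out:
        for path in inputs:
            with gzip.open(path, "rt", encoding="utf-8") as fh:
                for line in fh:
                    if not line.endswith("\n"):
                        line += "\n"  # avoid gluing the next file's first row onto this one
                    if line.startswith("!"):
                        if line.startswith("!gaf-version") and not header_written:
                            out.write(line)
                            header_written = True
                        continue  # other header lines are file-specific; drop them
                    if not line.strip():
                        continue  # skip blank lines
                    if dedupe:
                        if line in seen:
                            continue
                        seen.add(line)
                    out.write(line)
                    written += 1
    return written
```

Deduplication keeps every row in memory, which may be impractical at the scale of this test; `dedupe=False` streams a plain concatenation instead.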

kltm commented 1 month ago

Noting that the generated compressed index is 25G, instead of 8.8G. Expanded, we also see about 3x: 312G vs 101G. Generation time is 7.2h vs 5h (so it scales nicely there).
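The ratios above can be checked with a few lines of Python. The figures are copied from the comment; treating "G" as the same unit for both runs is an assumption, and `growth` is a hypothetical helper name.

```python
# (test run, previous run) pairs copied from the comment above.
MEASUREMENTS = {
    "compressed index (G)": (25.0, 8.8),
    "expanded index (G)": (312.0, 101.0),
    "generation time (h)": (7.2, 5.0),
}


def growth(new: float, old: float) -> float:
    """How many times larger the test-run value is than the previous one."""
    return new / old


if __name__ == "__main__":
    for name, (new, old) in MEASUREMENTS.items():
        print(f"{name}: {growth(new, old):.2f}x")
```

This gives about 2.84x for the compressed index, 3.09x expanded, and 1.44x for generation time; presumably "scales nicely" refers to time growing by a much smaller factor than index size.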

From this, there are some points:

kltm commented 1 month ago

@pgaudet I have created a bunch of new (more granular) tickets after this experiment. I will close this one out once the test Solr products are available somewhere.

kltm commented 1 month ago

@pgaudet It's slow and may time out on any action, but we can tentatively examine https://amigo-staging.geneontology.io/amigo/search/annotation

Noting: 15,531,916 annotations, about 2x the count in the last release.
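One way to cross-check such a count against a source file is sketched below. It assumes each non-blank line not starting with `!` is one annotation, as in the GAF 2.x format; the function name is hypothetical, and the number loaded into Solr could legitimately differ from the raw line count if the loader filters or merges rows.

```python
import gzip


def count_gaf_annotations(path: str) -> int:
    """Count annotation rows in a gzipped GAF (hypothetical helper).

    Header/comment lines start with '!'; blank lines are ignored.
    """
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip() and not line.startswith("!"):
                count += 1
    return count
```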

Closing as described above.

kltm commented 1 day ago

Just a note here: as before, there is a "write.lock" file present in the created index that needs to be removed before the Jetty Solr instance can spin up. I'm unsure why it's there; possibly an anomaly during the load left it behind, even though the rest of the process finished. @pgaudet If you notice anything odd or inconsistent, let me know.
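If the stale lock keeps reappearing, the manual cleanup could be scripted along these lines. This is a sketch, not part of the pipeline: it assumes the index lives under a local directory and that no Solr or Lucene process has the index open, since `write.lock` is Lucene's guard against concurrent writers. The helper name is hypothetical.

```python
from pathlib import Path


def remove_stale_write_locks(index_root: str) -> list[Path]:
    """Delete every 'write.lock' file under index_root (hypothetical helper).

    Only safe when nothing has the index open: the lock is what stops two
    writers from modifying the same Lucene index at once.
    """
    removed: list[Path] = []
    # Materialize the matches first so we don't delete while rglob is iterating.
    for lock in sorted(Path(index_root).rglob("write.lock")):
        if lock.is_file():
            lock.unlink()
            removed.append(lock)
    return removed
```

For example, `remove_stale_write_locks("/srv/solr/data")` (a hypothetical path) would run just before starting Jetty.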