Closed kltm closed 1 month ago
The mega-list caused some indigestion. Retrying with the unionized GAF at http://skyhook.berkeleybop.org/confinement-for-pipeline-388/union.gaf.gz .
Noting that the generated compressed index is 25G, instead of 8.8G. Expanded, we also see about 3x: 312G vs 101G. Generation time is 7.2h vs 5h (so it scales nicely there).
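For reference, a quick sketch of the ratios behind those figures. The numbers are the ones quoted above; the `ratio` helper and the dictionary labels are only illustrative names, not part of the pipeline.

```python
def ratio(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old

# (union GAF run, last release) pairs, as quoted in the comment above.
measurements = {
    "compressed index (G)": (25, 8.8),
    "expanded index (G)": (312, 101),
    "generation time (h)": (7.2, 5),
}

for label, (new, old) in measurements.items():
    print(f"{label}: {ratio(new, old):.2f}x")
```

Index size grows by roughly 3x, while generation time grows by well under 2x.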
From this, there are some points:
@pgaudet I have created a bunch of new (more granular) tickets after this experiment. I will close this one out once the test Solr products are available somewhere.
@pgaudet It is slow, and may time out on any action, but we can _tentatively_ examine https://amigo-staging.geneontology.io/amigo/search/annotation
Noting: 15531916 annotations, about 2x over the last release.
Closing as described above.
Just a note here: as before, there is a "write.lock" file present in the created index that needs to be removed before the Jetty Solr instance can spin up. I'm unsure why it's there, but it is possible that an anomaly during the load left it behind, even though the rest of the process finished. @pgaudet If you notice anything odd or inconsistent, let me know.
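A minimal sketch of the cleanup step, for anyone repeating this. It assumes the index has been unpacked to a local directory and that no Solr process is currently using it (Lucene normally holds "write.lock" while an index writer is open, so removing it under a live instance would be unsafe). The function name and the `dry_run` flag are hypothetical helpers, not part of the pipeline.

```python
from pathlib import Path

def remove_stale_locks(index_root: Path, dry_run: bool = True) -> list[Path]:
    """Find leftover write.lock files under index_root.

    With dry_run=True (the default) the files are only reported;
    with dry_run=False they are deleted. Returns the files found.
    """
    found = sorted(p for p in index_root.rglob("write.lock") if p.is_file())
    if not dry_run:
        for lock in found:
            lock.unlink()
    return found

# Example usage (the path is a placeholder, not the real location):
# remove_stale_locks(Path("/path/to/unpacked/solr/index"), dry_run=False)
```

Running it once with the default `dry_run=True` first shows what would be removed before anything is deleted.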
We are now seeing a first draft of the separated QCed GAFs. We would like to run them through the "second stage" of the GO pipeline, producing derivatives, to see how they work end to end. This test setup is the experiment for the final joint pipeline's derivatives production and will be used to supply feedback and to mock what a run would look like.
https://ftp.ebi.ac.uk/pub/contrib/goa/panther_proteomes/
Tagging @alexsign @pgaudet