geneontology / pipeline

Declarative pipeline for the Gene Ontology.
https://build.geneontology.org/job/geneontology/job/pipeline/
BSD 3-Clause "New" or "Revised" License

Explore getting the pipeline completing again by adjusting settings and runtime parameters #349

Status: Open. kltm opened this issue 6 months ago.

kltm commented 6 months ago

We are currently having trouble getting full-data runs out of the pipeline regularly and quickly. This is seriously affecting snapshots and releases.

As a band-aid until longer-term solutions (like pipeline refactoring and hardware purchasing) are in place, we're going to briefly experiment with limiting pipeline bandwidth (the number of "workers") and increasing runtime resources for various parts.

This is a partial response to https://github.com/geneontology/pipeline/issues/316

Sending notice to @mugitty @sierra-moxon @dustine32

Tagging @pgaudet

kltm commented 6 months ago

On a console review, I'm noticing a lot of late errors like:

    03:27:00  + rsync -avz -e ssh -o StrictHostKeyChecking=no -o IdentitiesOnly=true -o IdentityFile=**** /opt/go-site/pipeline/target/blazegraph-production.jnl.gz skyhook@skyhook.berkeleybop.org:/home/skyhook/snapshot/products/blazegraph/
    03:27:00  sending incremental file list
    03:27:15  blazegraph-production.jnl.gz
    03:27:15  deflate on token returned 0 (21379 bytes left)
    03:27:15  rsync error: error in rsync protocol data stream (code 12) at token.c(481) [sender=3.2.7]

I'm going to try scp instead of rsync here for a bit.

(Noting that the internet says things like "need rsync on target", "need full path on target", and "need full path on ssh bin"; none of these really explains why the failure is intermittent.)
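
For reference, a minimal sketch of what the scp swap might look like, reusing the ssh options from the rsync line in the log above. The function names `retry_transfer` and `push_journal` are hypothetical, and the identity file is passed as an argument because its path is masked in the console output. The retry loop is an addition for illustration only (the failure is intermittent, so a simple re-run may help); it is not something already in the pipeline.

```shell
#!/bin/sh
# Sketch only: retry a transfer command a few times before giving up.
# Usage: retry_transfer MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_transfer() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the command exactly as given; succeed as soon as it does.
    if "$@"; then
      return 0
    fi
    echo "transfer attempt ${attempt}/${max_attempts} failed" >&2
    # Wait between attempts, but not after the final one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Hypothetical wrapper: copy one file with scp, using the same ssh options
# that the rsync invocation in the log passes via -e.
# Usage: push_journal IDENTITY_FILE SOURCE_PATH DESTINATION
push_journal() {
  retry_transfer 3 30 \
    scp -o StrictHostKeyChecking=no \
        -o IdentitiesOnly=true \
        -o IdentityFile="$1" \
        "$2" "$3"
}
```

With the paths from the log, a call would look something like `push_journal "$IDENTITY_FILE" /opt/go-site/pipeline/target/blazegraph-production.jnl.gz skyhook@skyhook.berkeleybop.org:/home/skyhook/snapshot/products/blazegraph/`, where `$IDENTITY_FILE` stands in for the masked key path. The attempt count (3) and delay (30 seconds) are arbitrary placeholders.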

kltm commented 6 months ago

Shockingly, got a pass here--not sure whether that was due to the changes or just luck. Will try again with the same settings on release.

kltm commented 6 months ago

The stop issue seems to be continuing. Reduced executors to 5.