dsclassen closed this issue 4 months ago
@shrprabh Can you think of any other ways to trigger the Docker build step for bilbomd-perlmutter-worker on a Perlmutter login node from GitHub Actions? Or maybe a webhook?
We can give this approach a try.
I am making progress on this, but I am running into an issue: the long-running Docker build script times out after 600 seconds:
{
"id": "618465",
"status": "failed",
"result": "\"error: Command '['/usr/bin/ssh', '-q', '-i', '/tmp/sclassen-1722621784-8026', '-oUserKnownHostsFile=/dev/null', '-oStrictHostKeyChecking=no', '-o', 'preferredauthentications=publickey', 'sclassen@perlmutter.nersc.gov', 'bash -c \\\"ENVIRONMENT=development /global/cfs/cdirs/m4659/bilbomd/dev/scripts/build-perlmutter-worker.sh 40782b19-5d9a-4212-ba05-6bed431b88c7 | tee /global/homes/s/sclassen/script-logs/build-perlmutter-worker.sh-2024-08-02T18:03:04.201Z.log 2>&1\\\" &']' timed out after 600 seconds\""
}
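For anyone picking this up, here is a minimal sketch of how a backend service might detect this failure mode from a task response shaped like the one above. The function name `parse_task` and the sample payload are hypothetical (the command text is shortened for readability); the only things taken from the real output are the `id`, `status`, and `result` fields and the "timed out after N seconds" wording. The `result` field looks like a JSON-encoded string nested inside the JSON, so the sketch tries to decode it a second time.

```python
import json
import re


def parse_task(raw: str) -> dict:
    """Summarize a task response and flag command timeouts (hypothetical helper)."""
    task = json.loads(raw)
    result = task.get("result")

    # The result appears to be a JSON string inside JSON; decode again if so.
    if isinstance(result, str):
        try:
            decoded = json.loads(result)
            if isinstance(decoded, str):
                result = decoded
        except json.JSONDecodeError:
            pass  # not double-encoded, keep the raw string

    text = result if isinstance(result, str) else ""
    match = re.search(r"timed out after (\d+) seconds", text)

    return {
        "id": task.get("id"),
        "status": task.get("status"),
        "timed_out": match is not None,
        "timeout_seconds": int(match.group(1)) if match else None,
    }


# Sample payload with the same shape as the output above (command shortened).
sample = json.dumps(
    {
        "id": "618465",
        "status": "failed",
        "result": json.dumps("error: Command '[ssh ...]' timed out after 600 seconds"),
    }
)

summary = parse_task(sample)
print(summary)
```

A caller could use `timed_out` to distinguish "the build itself failed" from "the command runner gave up waiting", which are probably worth handling differently.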
We use a Perlmutter-specific Docker image, bilbomd-perlmutter-worker, to run the bulk of the BilboMD pipeline on Perlmutter. It must be built on a login node and then "migrated" so that it is available on the compute nodes. I am creating this issue as a reminder that this could be implemented with a webhook that calls a backend service running on SPIN, which in turn would use the Superfacility API endpoint utilities/command/{machine} to trigger a bash script that runs the Docker build/migrate... etc.
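As a rough illustration of the backend side of this idea, the sketch below builds the POST request for the utilities/command/{machine} endpoint without sending it. Several things here are assumptions rather than facts from this issue: the base URL `https://api.nersc.gov/api/v1.2`, the `executable` form field, bearer-token authentication, and the helper name `build_command_request`. Check them against the Superfacility API docs before relying on any of it. The script path and the `ENVIRONMENT=development` prefix come from the failed command shown in this thread.

```python
import urllib.parse
import urllib.request

# Assumption: base URL and version of the Superfacility API.
API_BASE = "https://api.nersc.gov/api/v1.2"

# Build script path as it appears in the failed command in this thread.
BUILD_SCRIPT = "/global/cfs/cdirs/m4659/bilbomd/dev/scripts/build-perlmutter-worker.sh"


def build_command_request(machine: str, executable: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to utilities/command/{machine}."""
    url = f"{API_BASE}/utilities/command/{urllib.parse.quote(machine)}"
    # Assumption: the command is passed as a form field named "executable".
    body = urllib.parse.urlencode({"executable": executable}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Assumption: the API accepts an OAuth bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


request = build_command_request(
    machine="perlmutter",
    executable=f"bash -c 'ENVIRONMENT=development {BUILD_SCRIPT}'",
    token="PLACEHOLDER_TOKEN",  # hypothetical; a real token would come from the service's credentials
)
print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` from the webhook handler, then checking the resulting task for completion.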