bl1231 / bilbomd-worker

Processes BilboMD jobs and runs CHARMM, FoXS, and MultiFoXS

Investigate using webhooks to trigger Docker build on Perlmutter #352

Closed: dsclassen closed 4 months ago

dsclassen commented 4 months ago

We use a Perlmutter-specific Docker image, bilbomd-perlmutter-worker, to run the bulk of the BilboMD pipeline on Perlmutter. The image must be built on a login node and then "migrated" so that it is available on the compute nodes. I am creating this issue as a reminder that the build could possibly be triggered by a webhook: a GitHub Action would call a backend service running on SPIN, which in turn would use the Superfacility API (SF-API) to trigger a build script.

  1. A GitHub Action makes an API POST to the SPIN service
  2. The SPIN service uses SF-API /utilities/command/{machine} to trigger a bash script that runs the Docker build, migrate, etc.
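
As a rough illustration of step 2, the SPIN service could assemble the SF-API call along these lines. This is only a sketch under assumptions, not the service's actual code: the base URL `https://api.nersc.gov/api/v1.2`, the form field name `executable`, the bearer-token header, and the helper name `build_command_request` are all assumptions to check against the NERSC Superfacility API documentation.

```python
import urllib.parse
import urllib.request

# Assumed base URL for the Superfacility API; verify the current version in the NERSC docs.
SFAPI_BASE = "https://api.nersc.gov/api/v1.2"


def build_command_request(machine, executable, token, base_url=SFAPI_BASE):
    """Build (but do not send) a POST to /utilities/command/{machine}.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    url = f"{base_url}/utilities/command/{urllib.parse.quote(machine)}"
    # Assumption: the endpoint accepts a form-encoded "executable" field
    # containing the command line to run on the login node.
    body = urllib.parse.urlencode({"executable": executable}).encode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    # Assumption: the service authenticates with an OAuth bearer token.
    request.add_header("Authorization", f"Bearer {token}")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request
```

Passing the returned object to `urllib.request.urlopen` would send the request. Keeping construction separate from sending makes the URL and payload easy to inspect before anything runs on Perlmutter.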

dsclassen commented 4 months ago

@shrprabh Can you think of any other ways to trigger the Docker build step for bilbomd-perlmutter-worker on a Perlmutter login node from GitHub Actions? Or maybe via a webhook?

shrprabh commented 4 months ago

We can give this approach a try.

dsclassen commented 4 months ago

I am making progress on this, but running into an issue with a long-running docker build script timing out:

{
    "id": "618465",
    "status": "failed",
    "result": "\"error: Command '['/usr/bin/ssh', '-q', '-i', '/tmp/sclassen-1722621784-8026', '-oUserKnownHostsFile=/dev/null', '-oStrictHostKeyChecking=no', '-o', 'preferredauthentications=publickey', 'sclassen@perlmutter.nersc.gov', 'bash -c \\\"ENVIRONMENT=development /global/cfs/cdirs/m4659/bilbomd/dev/scripts/build-perlmutter-worker.sh 40782b19-5d9a-4212-ba05-6bed431b88c7 | tee /global/homes/s/sclassen/script-logs/build-perlmutter-worker.sh-2024-08-02T18:03:04.201Z.log 2>&1\\\" &']' timed out after 600 seconds\""
}
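
The thread does not say how the timeout was resolved. One common way to keep a long build from holding the SSH session open, offered here only as a hedged sketch, is to detach the script completely: run it under `nohup`, redirect stdin, stdout, and stderr away from the session, and background it, so the remote shell returns at once while the build continues and writes to a log. The helper name `detach_command` and the paths in the test are hypothetical.

```python
import shlex


def detach_command(command, log_path):
    """Wrap a shell command so it keeps running after the caller disconnects.

    Hypothetical helper: returns a command line string, it does not run anything.
    """
    # Quote both pieces so spaces and shell metacharacters survive intact.
    inner = shlex.quote(command)
    log = shlex.quote(log_path)
    # nohup ignores SIGHUP; redirecting all three standard streams releases the
    # SSH session's file descriptors, and the trailing & backgrounds the job.
    return f"nohup bash -c {inner} > {log} 2>&1 < /dev/null &"
```

With this approach the command submitted through the API would finish in seconds, and a later request could read the log file to find out whether the build succeeded.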