CDCgov / cfa-epinow2-pipeline

https://cdcgov.github.io/cfa-epinow2-pipeline/
Apache License 2.0

Link built pool to image in ACR #59

Open zsusswein opened 1 month ago

zsusswein commented 1 month ago

#43 and #54 create the infrastructure to build the pipeline into an image and push it to ACR. They also create a pool with the same tag. But these images and pools aren't actually linked as far as we can tell. We need some additional configuration to link the pools to the image.

As part of this issue, it would also be good to bring over any missing settings from the existing approach.

cc: @gvegayon please add any missing color here!

Possible solution steps (initial draft by @gvegayon)

jkislin commented 1 month ago

@zsusswein what exactly does 'linking' entail? In other words, what functionality do you want? Is tagging the pools and images with the same hash or id enough, or am I missing something else?

Sorry if this is a really facile question!

zsusswein commented 1 month ago

This is a great q.

We need the nodes in the Batch pool to run our code. George's and your prior work takes the code here, builds the package into a Docker image, and stores the image in ACR.

But I don't think the Batch pools created here specify running that particular Docker image in ACR. They specify:

https://github.com/CDCgov/cfa-epinow2-pipeline/blob/edb4fbd7f405eadaba77f34f9df0dfbecee877fd/.github/workflows/1_pre-Test-Model-Image-Build.yaml#L131

We add some additional keys in our current config to specify the container configuration:

https://github.com/cdcent/cfa-nnh-pipelines/blob/33b4a55daba3479cddc85fe10dac4732b2f2c91b/NHSN/Rt/run_azure_batch/create_expt_pool.py#L68-L78

I believe we need to do something similar here.
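As a rough sketch of what "something similar" could look like, here's the shape of the `containerConfiguration` block a pool spec needs so its nodes pull the image from ACR (registry server, image name, and tag below are all hypothetical placeholders, not our actual values; auth via a managed identity or credentials would still need to be added to `containerRegistries`):

```python
def container_pool_settings(acr_server: str, image: str, tag: str) -> dict:
    """Build the containerConfiguration fragment of a Batch pool spec.

    This mirrors the extra keys the linked create_expt_pool.py script sets;
    all argument values here are illustrative.
    """
    image_ref = f"{acr_server}/{image}:{tag}"
    return {
        "containerConfiguration": {
            # The only container type Batch currently supports
            "type": "dockerCompatible",
            # Images listed here are prefetched onto each node at start-up
            "containerImageNames": [image_ref],
            # The private registry the nodes pull from; real usage also
            # needs credentials or an identityReference here
            "containerRegistries": [{"registryServer": acr_server}],
        }
    }


# Hypothetical values for illustration only:
settings = container_pool_settings(
    "example.azurecr.io", "cfa-epinow2-pipeline", "latest"
)
print(settings["containerConfiguration"]["containerImageNames"])
```

The key point is that the pool spec, not the job or task, is where the ACR image gets referenced, which is what ties a pool to a specific image tag.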

jkislin commented 1 month ago

Got it!

@gvegayon, if you have the cycles (let me know if not), what we essentially need to do here is create a new job within the deployment workflow to submit Azure Batch Jobs and Tasks. The current process lives in cfa-nnh-pipelines, and the syntax is quite nested, quite convoluted, and in Python. What we need instead is a series of shell-based `az batch <...>` commands to replace these leviathans; no Python necessary.

  1. Take a look at the SOP Patrick and Kingsley use - this gives you a sense of the current order of operations, both for setting up pools (something we've already done here, but worth a look to see if we missed any juicy config deets) and for submitting jobs:

  2. Perform some 'code archaeology' in the old cfa-nnh-pipelines repo. The shell scripts that do what we need to replicate are here:
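For a sense of scale, the shell replacement probably boils down to two calls per run: create a job bound to the pool, then add a task to the job. A hedged sketch (pool, job, and task IDs are made up; assumes the pool already exists and `az batch account login` has been run), composed here as Python strings only so the sequence is easy to check without the Azure CLI installed:

```python
import shlex


def batch_submit_commands(pool_id: str, job_id: str, task_cmd: str) -> list[str]:
    """Sketch the az CLI calls that would replace the Python job-submission
    code: create a job on the pool, then add a task that runs inside the
    pool's container image. All IDs are illustrative."""
    return [
        f"az batch job create --id {job_id} --pool-id {pool_id}",
        (
            f"az batch task create --job-id {job_id} "
            f"--task-id run-model --command-line {shlex.quote(task_cmd)}"
        ),
    ]


# Hypothetical IDs and command, for illustration only:
for cmd in batch_submit_commands("cfa-epinow2-pool", "rt-estimation", "Rscript run.R"):
    print(cmd)
```

In the workflow itself these would just be bare `az batch job create` / `az batch task create` lines in a run step, which is where the "no Python necessary" savings come from.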

Some other notes:

zsusswein commented 1 month ago

@gvegayon ping me if helpful to talk through scope here. If this issue seems like it's getting unwieldy, let's split it into chunks of work.

zsusswein commented 1 month ago

Finally found some docs explaining what we want. I don't have time to finish them now, but dumping here to come back for another read.

gvegayon commented 1 month ago

> Finally found some docs explaining what we want. I don't have time to finish them now, but dumping here to come back for another read.

A couple of other references:

gvegayon commented 1 month ago

So, we have reached a roadblock: creating Batch pools with the `--template` argument was retired this September (here); in particular, the az CLI extension that provided that capability. Looking around, I believe the best option would be to use the Python script that already exists for this project (here). We should probably have a chat about this, @zsusswein, @jkislin, @natemcintosh, and @dylanhmorris.

natemcintosh commented 4 weeks ago

To clarify, is the issue here attaching an ACR image to a pool at some arbitrary time, not necessarily at pool build time?

gvegayon commented 4 weeks ago

> To clarify, the issue here is in attaching an ACR image to a pool at some arbitrary time, that is not necessarily pool build time?

I was thinking at build time. But that only attaches the image reference itself; I'm unsure when the image is actually downloaded to the nodes.