Open zsusswein opened 1 month ago
@zsusswein what exactly does 'linking' entail? In other words, what functionality do you want? Is tagging the pools and images with the same hash or id enough, or am I missing something else?
Sorry if this is a really facile question!
This is a great q.
We need the nodes in the Batch pool to run our code. George's and your prior work takes the code here, builds the package in a Docker image, and stores the image in ACR.
But I don't think the Batch pools created here specify running that particular Docker image in ACR. They specify:
We add some additional keys in our current config to specify the container configuration:
I believe we need to do something similar here.
Got it!
@gvegayon, if you have the cycles (let me know if not), what we essentially need to do here is create a new job within the deployment workflow to submit Azure Batch Jobs and Tasks. The current process lives in the cfa-nnh-pipelines repo, and the syntax is quite nested, quite convoluted, and in Python. What we need instead is a series of shell-based az batch <> commands to replace these leviathans, no Python necessary.
Take a look at the SOP Patrick and Kingsley use - this gives you a sense of the current order of operations for both setting up pools (something we've already done here, but worth looking to see if we missed any juicy config deets) and submitting jobs:
Perform some 'code archaeology' in the old cfa-nnh-pipelines repo. The shell scripts that do what we need to replicate are here:
Some other notes:
@gvegayon ping me if helpful to talk through scope here. If this issue seems like it's getting unwieldy, let's split it into chunks of work.
Finally found some docs explaining what we want. I don't have time to finish them now, but dumping here to come back for another read.
A couple of other references:
containerConfiguration class: https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.batch.containerconfiguration?view=azure-dotnet
So, we have reached a roadblock: creating Batch pools with the --template argument was retired this year in September (here). More precisely, the az CLI extension that provided that capability was retired. Looking around, I believe the best option would be to use the Python script that exists for this project (here). We should probably have a chat about this, @zsusswein, @jkislin, @natemcintosh, and @dylanhmorris.
To clarify, the issue here is in attaching an ACR image to a pool at some arbitrary time, that is not necessarily pool build time?
I was thinking during build time. But that's the image itself. I am unsure when the image is actually downloaded.
#43 and #54 create the infrastructure to build the pipeline into an image and push it to ACR. They also create a pool with the same tag. But these images and pools aren't actually linked as far as we can tell. We need some additional configuration to link the pools to the image.
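One way to make "linked" concrete is to check whether a pool definition actually lists the image we pushed. The sketch below is only an illustration: it assumes the pool definition is available as parsed JSON and that the image names sit under containerConfiguration, either nested under deploymentConfiguration or directly under virtualMachineConfiguration. The function names are hypothetical and not part of this repo.

```python
from typing import Any


def find_container_images(pool_spec: dict[str, Any]) -> list[str]:
    """Return the container image names listed in a parsed pool definition."""
    # Assumption: some layouts nest the VM settings under
    # "deploymentConfiguration"; others keep them at the top level.
    deployment = pool_spec.get("deploymentConfiguration") or pool_spec
    vm_config = deployment.get("virtualMachineConfiguration") or {}
    container_config = vm_config.get("containerConfiguration") or {}
    return list(container_config.get("containerImageNames") or [])


def pool_is_linked(pool_spec: dict[str, Any], image: str) -> bool:
    """True if the pool definition names the given image (registry/repo:tag)."""
    return image in find_container_images(pool_spec)
```

A pool that only shares a tag with the image, but has no containerConfiguration entry, would come back as not linked under this check.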
As part of this issue, it would also be good to bring over any missing settings from the existing approach.
cc: @gvegayon please add any missing color here!
Possible solution steps (initial draft by @gvegayon)
1. az batch pool is instantiated using --template, so configuration is passed via a JSON file.
2. Pass containerConfiguration via deploymentConfiguration to the argument --parameters.
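As a rough sketch of the JSON involved in the steps above, the snippet below builds only the container-related fragment of a pool definition and writes it to a file. The nesting follows the deploymentConfiguration path mentioned in the draft; the "dockerCompatible" type value, the omission of registry credentials, and all function names are assumptions for illustration, so check them against the Batch docs before relying on them.

```python
import json
from pathlib import Path


def container_configuration(image: str, registry_server: str) -> dict:
    """Container settings that point pool nodes at one image in a registry."""
    return {
        # Assumption: Docker-compatible container type; casing may differ by API.
        "type": "dockerCompatible",
        "containerImageNames": [image],
        # Registry authentication is deliberately left out of this sketch.
        "containerRegistries": [{"registryServer": registry_server}],
    }


def pool_fragment(image: str, registry_server: str) -> dict:
    """Wrap the container settings in the nesting described in the draft."""
    return {
        "deploymentConfiguration": {
            "virtualMachineConfiguration": {
                "containerConfiguration": container_configuration(
                    image, registry_server
                )
            }
        }
    }


def write_pool_fragment(path: str, image: str, registry_server: str) -> Path:
    """Serialize the fragment to a JSON file and return its path."""
    out = Path(path)
    out.write_text(json.dumps(pool_fragment(image, registry_server), indent=2) + "\n")
    return out
```

The fragment is not a complete pool definition: VM size, node agent SKU, and the base VM image would still have to come from the existing pool settings.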