CDCgov / cfa-epinow2-pipeline

https://cdcgov.github.io/cfa-epinow2-pipeline/
Apache License 2.0
10 stars 2 forks source link

Delete pools and images prior to recreating instead of failing #60

Closed zsusswein closed 1 month ago

zsusswein commented 1 month ago

54 sets up a naming scheme that tags images and pools with the PR name (if a PR) and latest (if main). This strategy fails if the pool already exists.

image

There are a couple of options here:

cc: @gvegayon

zsusswein commented 1 month ago

From @jkislin:

Amit said it makes more sense to have a static name (i.e. cfa-epinow2-pool) and just refresh the pool each run. We can talk on this soon. I told him we're working on cleaning up the old pool with a job, which he suggested is ok... but that just having one name for the pool is probably standard/best practice.

I think this is option (1) above? But not totally sure.

gvegayon commented 1 month ago

My suggestion (following @zsusswein) is to check if the pool exists (one pool per branch). If the pool exists, use something like this to update the container:

az batch pool update \
    --pool-id <your_pool_id> \
    --virtual-machine-configuration image-reference.virtual-machine-image-id <new_container_image_url>

Cleaning the pool should happen either manually or once the branch is deleted.

natemcintosh commented 1 month ago

az batch pool update

I like this idea, but I can't find such a command in the docs.

I think that my preferred method at this point is option 2. I would see if it currently exists with az batch pool list, delete it if it does exist with az batch pool delete, then re-create it with az batch pool create.

I know this could take a while if there are nodes, but if it's running in CI, then I don't think it's that much of an issue. I know from personal experience that this process can be pretty quick (<10s) assuming there are no nodes currently in the pool.

gvegayon commented 1 month ago

You probably didn't find it, @natemcintosh, because AI is hallucinating xD. I'll see if there's anything real about it; if so, we can use it! If not, I agree with you that we need to use those three sets of commands. To avoid breaking things, I would add that we should include a label on the commit message to actively (continuously) delete the pool (of the sort of [skip ci]).

gvegayon commented 1 month ago

PR is now working here: