PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0

Docker agent on Azure cannot pull images from ACR on a VM with a system-managed identity #5467

Closed marvin-robot closed 2 years ago

marvin-robot commented 2 years ago

Opened from the Prefect Public Slack Community

bogdan.bliznyuk: Hi all! We have an issue with the Prefect Docker agent. We're using Azure and ACR with a system-managed identity assigned to the VM. To log in to ACR, we run the following command every 3 hours via a systemd timer:

az acr login

But it seems that the Prefect Docker agent reads the token only at startup and keeps it in memory. Unless we restart the agent, it can't pull flow images after 3 hours (once the ACR token has expired).
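For reference, here is a minimal sketch of what such a refresh job might run, assuming the VM's system-assigned managed identity has been granted pull access to the registry. The registry name `myregistry` and the function name `refresh_acr_login` are placeholders, not taken from the thread. `az login --identity` signs the Azure CLI in with the VM's managed identity, and `az acr login --name` then exchanges that identity for a short-lived ACR token and runs `docker login` with it.

```shell
#!/usr/bin/env bash
# Sketch of a refresh job that a systemd timer could run every few hours.
# ACR_NAME is a placeholder: set it to your registry name.
ACR_NAME="${ACR_NAME:-myregistry}"

refresh_acr_login() {
  # Sign the Azure CLI in with the VM's system-assigned managed identity.
  az login --identity --output none || return 1
  # Exchange that identity for a short-lived ACR token and `docker login` with it.
  az acr login --name "$ACR_NAME" || return 1
}

# Run only when executed directly, not when sourced by another script.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  refresh_acr_login
fi
```

The exit status of the script is that of `refresh_acr_login`, so systemd can report a failed refresh. As the thread goes on to show, a fresh login alone does not help an agent process that is already running.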

anna: Are you running the Docker agent and your flow registration process on the same Azure VM (asking in case you are using Docker storage)? Can you share your flow storage and run config?

Based on these docs (https://docs.microsoft.com/en-us/azure/container-registry/container-registry-authentication-managed-identity), the managed identity assigned to this VM should solve the issue. If it doesn't, something went wrong when assigning those permissions.

To figure out which process is at fault, can you try pulling your ACR image from that VM without logging in via the CLI (az acr login)? If the identity is set, you shouldn't need that extra CLI step, and you should be able to run:

docker pull yourCustomACRimage

bogdan.bliznyuk: Yes, we're running the agent on the same VM, and we've verified that a manual docker pull works.

It seems that Prefect uses the Python Docker API client, and it doesn't refresh its in-memory credentials.

bogdan.bliznyuk: In other words, it seems that the Prefect Docker agent only reads the token during startup.

If we restart the Docker agent, it's able to pull images again.
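Because a restart makes the agent pick up the refreshed credentials, one stopgap (not something proposed in the thread) is to restart the agent right after each successful refresh. A minimal sketch, assuming the agent runs as a systemd unit, that this job has permission to restart it, and that `az acr login` runs as the same user as the agent (Docker's client config, which holds the login, is per user: `~/.docker/config.json` by default). The unit name `prefect-docker-agent.service` and the registry name `myregistry` are hypothetical.

```shell
#!/usr/bin/env bash
# Stopgap sketch: refresh the ACR login, then restart the agent so its
# Docker client re-reads the stored credentials at startup.
# Both names below are hypothetical placeholders.
ACR_NAME="${ACR_NAME:-myregistry}"
AGENT_UNIT="${AGENT_UNIT:-prefect-docker-agent.service}"

refresh_and_restart_agent() {
  # Fetch a fresh ACR token via the VM's managed identity.
  az login --identity --output none || return 1
  az acr login --name "$ACR_NAME" || return 1
  # Restart only after a successful login, so a failed refresh
  # leaves the running agent untouched.
  systemctl restart "$AGENT_UNIT"
}

# Run only when executed directly, not when sourced by another script.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  refresh_and_restart_agent
fi
```

This treats the symptom rather than the cause; the long-lived service principal credentials described further down avoid the restarts altogether.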

anna: I'm no Azure expert, but when you use a managed identity (aka an IAM role?), you shouldn't need to authenticate with a token every 3 hours.

Do you happen to have Azure support?

bogdan.bliznyuk: Yes, it's exactly like an IAM role attached to an EC2 instance.

And the token does have a 3-hour lifetime. We refresh it every 3 hours with a separate script, so you can run docker pull on each VM without authenticating manually.

But the Prefect agent doesn't pick up the refreshed token.

bogdan.bliznyuk: You can probably reproduce it in any environment (it's not specific to Azure):

  1. Start the Prefect Docker agent.
  2. Log in to a private registry after the agent has started.
  3. Try to run a flow that uses the private registry as storage.
  4. It should fail.

anna: If this worked like an IAM role attached to EC2, it wouldn't matter when you started the agent: the permission is set for the machine.

So you shouldn't have to refresh the token every 3 hours. If the permissions are set properly for the VM, the IAM role should be all you need. If you look at this page (https://docs.microsoft.com/en-us/azure/container-registry/container-registry-best-practices#authentication-and-authorization), you're currently using option 1, for an individual identity, while for the IAM-style setup it seems you should use the service principal option. Can you try this tutorial (https://docs.microsoft.com/en-us/azure/container-registry/container-registry-auth-service-principal)?

I'll probably open an issue and see if some Azure pro can chime in and help.

anna: <@ULVA73B9P> open "Docker agent on Azure cannot pull images from ACR on a VM with a system-managed identity"

bogdan.bliznyuk: *after 3 hours

Original thread can be found here.

anna-geller commented 2 years ago

Update:

The Azure Marketplace Docker agent hasn't been updated yet, but for now we've published a full walkthrough on how to set up a new VM on Azure and spin up a Docker agent in a robust way, including:

Let us know if something is unclear or doesn’t work for you: https://discourse.prefect.io/t/how-to-spin-up-a-docker-agent-on-azure-vm-a-full-walkthrough/407

anna-geller commented 2 years ago

The user confirmed the solution linked above is working for them. Closing the issue.

Here is a copy:

In order to generate long-lived permissions for your Docker agent, you will need to create a service principal, which is Azure’s fancy way of saying: a username and password.

Follow the instructions from this documentation page to create the credentials.

Modify the bash script below to use your ACR registry name (replace my prefectcommunity registry name with yours). I recommend keeping the acrpush role so that you can also register your flows and push new images to ACR with the same set of credentials, but feel free to configure it to suit your needs.

Then, run those commands in your terminal. All the script really does is generate the long-lived username and password, store them in shell variables, and print them to the console:

#!/bin/bash
# This script requires Azure CLI version 2.25.0 or later. Check version with `az --version`.

# Modify for your environment.
# ACR_NAME: The name of your Azure Container Registry
# SERVICE_PRINCIPAL_NAME: Must be unique within your AD tenant
ACR_NAME=prefectcommunity
SERVICE_PRINCIPAL_NAME=acr-service-principal-prefect-docker-agent-demo

# Obtain the full registry ID for subsequent command args
ACR_REGISTRY_ID=$(az acr show --name "$ACR_NAME" --query "id" --output tsv)

# Create the service principal with rights scoped to the registry.
# This example assigns the acrpush role (push and pull). Modify the '--role'
# argument value as desired:
# acrpull:     pull only
# acrpush:     push and pull
# owner:       push, pull, and assign roles
PASSWORD=$(az ad sp create-for-rbac --name "$SERVICE_PRINCIPAL_NAME" --scopes "$ACR_REGISTRY_ID" --role acrpush --query "password" --output tsv)
USER_NAME=$(az ad sp list --display-name "$SERVICE_PRINCIPAL_NAME" --query "[].appId" --output tsv)

# Output the service principal's credentials; use these in your services and
# applications to authenticate to the container registry.
echo "Service principal ID: $USER_NAME"
echo "Service principal password: $PASSWORD"

This will print the username and password.

Then, use those long-lived credentials to sign in to ACR so that your Docker agent can pull Docker images from ACR (and you can push them). Do this before starting the agent, since the agent reads registry credentials at startup:

docker login prefectcommunity.azurecr.io -u $USER_NAME -p $PASSWORD

Replace prefectcommunity with your ACR registry name.
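If you'd rather not pass the password as a command-line argument (where it can end up in shell history and the process list), `docker login` also accepts it on standard input via `--password-stdin`. A minimal sketch, assuming `USER_NAME` and `PASSWORD` are still set from the script above and the Azure CLI is still signed in; the helper name `acr_docker_login` is hypothetical. Instead of hard-coding the hostname, it asks `az acr show` for the registry's `loginServer`.

```shell
#!/usr/bin/env bash
# Sketch: log Docker in to ACR with the service principal, reading the
# password from stdin instead of passing it as a command-line argument.
# Assumes USER_NAME and PASSWORD hold the service principal credentials.
ACR_NAME="${ACR_NAME:-prefectcommunity}"   # replace with your registry name

acr_docker_login() {
  local login_server
  # Ask Azure for the registry's login server, e.g. <name>.azurecr.io.
  login_server=$(az acr show --name "$ACR_NAME" --query loginServer --output tsv) || return 1
  # Pipe the password to docker so it never appears in the argument list.
  printf '%s' "$PASSWORD" | docker login "$login_server" --username "$USER_NAME" --password-stdin
}

# Run only when executed directly, not when sourced by another script.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  acr_docker_login
fi
```

Because the shell variables from the earlier script only exist in the session where it ran, source this file (or paste the function) into that same session rather than running it in a new shell.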