Azure Devops Run-Once Self-hosted Agents in AKS - CrashLoopBackOff

amgmdz commented 2 years ago

We're running Linux-based (Ubuntu) ADO agents hosted on AKS clusters. The agents were created by following steps in this doc: https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/docker?view=azure-devops#create-and-build-the-dockerfile-1

In our agent pools, the entrypoint script "start.sh" is modified so that the agent runs once, which ensures that agents are refreshed after every run:

./run-docker.sh interactive --once && config.sh remove & wait $! cleanup

When the agents finish the job and exit, they exit with code 0. Regardless, Kubernetes treats this as a "crash" and applies an exponential back-off policy on restarts.
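For context, the Kubernetes documentation describes the kubelet's restart delay as starting at 10 seconds and doubling on each restart, up to a five-minute cap. A minimal sketch of that schedule follows; the helper name `backoff_delay` is hypothetical and exists only for illustration, and the real kubelet also resets the timer after a container has run cleanly for a while, which this sketch does not model.

```shell
# Illustrative only: approximates the documented restart back-off
# schedule (10s, 20s, 40s, ... capped at 300s).
# backoff_delay is a hypothetical helper, not part of any Kubernetes tool.
backoff_delay() {
  restarts=$1   # number of restarts that have already happened
  delay=10      # initial delay in seconds
  cap=300       # five-minute cap
  i=0
  while [ "$i" -lt "$restarts" ]; do
    delay=$((delay * 2))
    if [ "$delay" -ge "$cap" ]; then
      delay=$cap
      break
    fi
    i=$((i + 1))
  done
  echo "$delay"
}

# Print the approximate wait before each of the first few restarts.
for n in 0 1 2 3 4 5 6; do
  echo "restart $n: wait $(backoff_delay "$n")s"
done
```

Under these assumptions, a run-once agent that exits after every job reaches the five-minute cap after a handful of pipeline runs, which matches the symptom described above.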

I understand this is a K8S problem that may be beyond ADO's control, because there is no way to tune the BackOff / RestartPolicy. But we were wondering if there could be any workarounds for this issue.

This problem arises because Azure DevOps cannot communicate with private AKS clusters and other private resources, which forces the use of self-hosted agents.

Document Details

⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.

ID: b31d4f6c-d16d-2ba1-1294-6aa94e328046
Version Independent ID: 88aec599-4410-697d-1002-1c41b1c17bb6
Content: Run a self-hosted agent in Docker - Azure Pipelines
Content Source: docs/pipelines/agents/docker.md
Product: devops
Technology: devops-cicd-agents
GitHub Login: @steved0x
Microsoft Alias: sdanie

MarileeTurscak-MSFT commented 2 years ago

@amgmdz Thanks for your feedback! We will investigate and update as appropriate.

kohithms commented 1 year ago

Hi @MarileeTurscak-MSFT / @Karishma-Tiwari-MSFT , Any update on this issue?

amgmdz commented 1 year ago

Hello @kohithms! I have applied a temporary workaround that solves the problem. I installed the OpenShift CLI in the agent base image and added the following line at the end of the start.sh script. I used the token of the default service account.

oc login --token=$TOKEN --server=$APIAKS && kubectl delete pod $HOSTNAME -n $NAMESPACE

At the end of each pipeline run, this deletes the pod, and a new one is created in its place.
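The same idea can be sketched as a small shell function. This is only an illustration under stated assumptions: `delete_own_pod`, `SA_DIR`, and `DRY_RUN` are hypothetical names; `kubectl` is assumed to be installed in the image; the pod's service account is assumed to have RBAC permission to delete pods in its own namespace; and `kubectl` is assumed to pick up the mounted service account credentials when no kubeconfig is present, so no explicit login step is shown.

```shell
# Hypothetical helper: delete the pod this script is running in.
# Assumes kubectl is available and the service account may delete pods
# in its own namespace (RBAC setup not shown).
SA_DIR="${SA_DIR:-/var/run/secrets/kubernetes.io/serviceaccount}"

delete_own_pod() {
  # Namespace: use $NAMESPACE if set, else the file mounted into the pod.
  namespace=${NAMESPACE:-$(cat "$SA_DIR/namespace")}
  # Pod name: by default the container hostname is the pod name.
  pod=${HOSTNAME:-$(hostname)}
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Dry run: only print the command that would be executed.
    echo "kubectl delete pod $pod -n $namespace --wait=false"
  else
    kubectl delete pod "$pod" -n "$namespace" --wait=false
  fi
}
```

Calling `delete_own_pod` as the last line of start.sh would mirror the workaround above; setting `DRY_RUN=1` first prints the command instead of running it, which is useful for checking the pod name and namespace it resolves.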

ccoder83 commented 1 year ago

@amgmdz That seems like quite a neat solution that could work for me too, as I'm facing similar issues.

I have one question: why did you use the OpenShift CLI rather than the Azure CLI? As I understand it, you are using ADO agents hosted on AKS clusters.

MicrosoftDocs / azure-docs

Azure Devops Run-Once Self-hosted Agents in AKS - CrashLoopBackOff #95961
