amgmdz opened this issue 2 years ago
@amgmdz Thanks for your feedback! We will investigate and update as appropriate.
Hi @MarileeTurscak-MSFT / @Karishma-Tiwari-MSFT, any update on this issue?
Hello @kohithms! I have applied a temporary solution that solves the problem. In the agent base image I installed the OpenShift CLI and added the following line at the end of the start.sh script, using the token of the default service account:
oc login --token=$TOKEN --server=$APIAKS && kubectl delete pod $HOSTNAME -n $NAMESPACE
At the end of each pipeline run this deletes the agent pod, and Kubernetes schedules a fresh one in its place.
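For anyone wanting to try the same approach, the tail of the modified start.sh might look roughly like this. This is a sketch, not the exact script from the thread: the TOKEN, APIAKS, and NAMESPACE variables are assumed to be supplied to the pod (the thread only says the default service-account token was used), and the fallback paths shown are the standard in-cluster service-account mounts.

```shell
# Sketch of the end of start.sh (illustrative; variable names other than
# TOKEN/APIAKS/HOSTNAME/NAMESPACE are assumptions, not from the thread).

# Fall back to the mounted service-account token if TOKEN was not injected.
SA_DIR=/var/run/secrets/kubernetes.io/serviceaccount
TOKEN="${TOKEN:-$(cat "$SA_DIR/token")}"
NAMESPACE="${NAMESPACE:-$(cat "$SA_DIR/namespace")}"

# Run the agent for a single job, then deregister it.
./run-docker.sh interactive --once
./config.sh remove

# Log in and delete this pod so the Deployment replaces it with a fresh one.
oc login --token="$TOKEN" --server="$APIAKS" \
  && kubectl delete pod "$HOSTNAME" -n "$NAMESPACE"
```

Note that for `kubectl delete pod` to succeed, the service account needs RBAC permission to delete pods in its namespace.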
@amgmdz That seems like quite a neat solution that could work for me too, as I'm facing similar issues.
I have one question: why did you use the OpenShift CLI rather than the Azure CLI? As I understand it, you are using ADO agents hosted on AKS clusters.
We're running Linux-based (Ubuntu) ADO agents hosted on AKS clusters. The agents were created by following steps in this doc: https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/docker?view=azure-devops#create-and-build-the-dockerfile-1
Our agent pools' entrypoint "start.sh" script is modified so that the agent runs only once, ensuring agents are refreshed after every run:
./run-docker.sh interactive --once && ./config.sh remove & wait $!; cleanup
When the agents finish a job and exit, they exit with code 0, but Kubernetes still treats this as a crash and applies an exponential back-off policy to restarts.
I understand this is a Kubernetes problem that may be beyond ADO's control, because there is no way to tune the back-off delay or the restartPolicy for a Deployment. But we were wondering whether there are any workarounds for this issue.
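For context on why the back-off happens even on a clean exit: a Deployment's pod template only accepts restartPolicy: Always, so any container termination, exit code 0 included, counts toward the kubelet's restart back-off (which roughly doubles per restart, capped at five minutes, and only resets after the container has run for a while without exiting). A minimal illustrative manifest (names and image are placeholders, not from the thread):

```yaml
# Illustrative Deployment for a self-hosted ADO agent; all names/images
# are placeholders. A Deployment forces restartPolicy: Always, so even a
# clean exit (code 0) is restarted with exponential back-off.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ado-agent
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ado-agent
  template:
    metadata:
      labels:
        app: ado-agent
    spec:
      restartPolicy: Always          # the only value a Deployment accepts
      containers:
      - name: agent
        image: myregistry.azurecr.io/ado-agent:latest   # placeholder
```

This is why the pod-self-deletion workaround above helps: deleting the pod makes the ReplicaSet create a brand-new pod, which starts immediately instead of waiting out the back-off of a restarted container.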
This problem arises because Azure DevOps cannot communicate with private AKS clusters and other private resources, which forces the use of self-hosted agents.