scbedd closed this 1 year ago
Talked with @weshaggard a bit today. Our AKS cluster runs on virtual machine scale sets that are automatically managed by the cluster. We opened those up, then checked under `identity`.
The core of the issue for `PPE` is that the agent pools had lost the `agentpool` identity that was supposed to be assigned to them. Because of this, the pools couldn't talk to the ACR while spinning up. Re-adding the identity, then waiting out a brief crash loop, got everything working again.
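The re-add above was done by hand. As a rough sketch of how the same check and repair could be scripted with the Azure CLI, the script below looks up the cluster's node resource group and kubelet identity, then re-assigns that identity to any scale set that no longer lists it. It assumes a managed-identity cluster whose kubelet identity is the user-assigned `agentpool` identity, and that you are logged in with the right subscription selected; `contains_identity` and `reattach_kubelet_identity` are hypothetical helper names. AKS generally does not support direct changes to resources in the node resource group, so treat this as a break-glass step rather than routine maintenance.

```shell
#!/bin/sh
# Hypothetical sketch: re-attach the AKS kubelet ("agentpool") identity to any
# node-pool scale set that has lost it. Assumes `az login` has been done and the
# subscription holding the cluster is selected. RG/CLUSTER default to the PPE
# values from this issue and can be overridden via environment variables.
RG="${RG:-openapi-ppe}"
CLUSTER="${CLUSTER:-liveValidatePPE}"

# contains_identity LISTING RESOURCE_ID
# Succeeds if LISTING (the scale set's identity JSON) mentions RESOURCE_ID.
# Compared case-insensitively because ARM resource IDs are case-insensitive
# and different APIs may return them with different casing.
contains_identity() {
  [ -n "$2" ] || return 1
  printf '%s\n' "$1" | grep -qiF -- "$2"
}

reattach_kubelet_identity() {
  # Node resource group (MC_...) holding the AKS-managed scale sets.
  node_rg=$(az aks show -g "$RG" -n "$CLUSTER" --query nodeResourceGroup -o tsv) || return 1
  # Resource ID of the user-assigned identity the kubelet uses for ACR pulls.
  kubelet_id=$(az aks show -g "$RG" -n "$CLUSTER" \
    --query identityProfile.kubeletidentity.resourceId -o tsv) || return 1
  if [ -z "$kubelet_id" ]; then
    echo "No kubelet identity found; the cluster may use a service principal." >&2
    return 1
  fi

  for vmss in $(az vmss list -g "$node_rg" --query "[].name" -o tsv); do
    # userAssignedIdentities is a map keyed by identity resource ID (null if none).
    current=$(az vmss identity show -g "$node_rg" -n "$vmss" \
      --query userAssignedIdentities -o json 2>/dev/null)
    if contains_identity "$current" "$kubelet_id"; then
      echo "$vmss: kubelet identity present"
    else
      echo "$vmss: kubelet identity missing, re-assigning"
      az vmss identity assign -g "$node_rg" -n "$vmss" --identities "$kubelet_id"
    fi
  done
}
```

Usage: source the file (`. ./reattach.sh`), then call `reattach_kubelet_identity`. The script deliberately avoids `set -e` so that sourcing it does not change the behavior of your interactive shell.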
The resource in question
The symptom that an identity or associated credential has expired is that the cluster can't pull images to spin up new pods.
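To confirm that symptom from the cluster side once `kubectl` has credentials (see the `get-credentials` step below), a quick check could list pods stuck in `ErrImagePull` or `ImagePullBackOff` together with the kubelet's `Failed` events. This is a minimal sketch: `filter_pull_failures` and `show_pull_failures` are hypothetical names, and it assumes the default `kubectl get pods -A` column layout, where STATUS is the fourth column.

```shell
#!/bin/sh
# Sketch: surface pods that cannot pull their images.
# Assumes kubectl is already pointed at the cluster.

# filter_pull_failures: read `kubectl get pods -A --no-headers` output on stdin
# and keep rows whose STATUS column (4th with -A) shows an image-pull failure.
filter_pull_failures() {
  awk '$4 == "ErrImagePull" || $4 == "ImagePullBackOff"'
}

# show_pull_failures: print the stuck pods, then the "Failed" events,
# whose messages describe why each pull failed.
show_pull_failures() {
  kubectl get pods -A --no-headers | filter_pull_failures
  kubectl get events -A --field-selector reason=Failed
}
```

Splitting the parsing into `filter_pull_failures` keeps the column logic testable without a live cluster.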
- Add the `az aks` extension to your `az` instance, then set context to the subscription containing the PPE k8s instance.
- Run `az aks get-credentials --resource-group openapi-ppe --name liveValidatePPE` to enable `kubectl` commands.
- Run `az aks show --resource-group openapi-ppe --name liveValidatePPE --query servicePrincipalProfile.clientId -o tsv`. This returns `msi`, which indicates the cluster uses a managed service identity rather than a service principal.

Need to investigate why we can't pull when none of the credentials have expired.
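Since the cluster reports `msi`, one possible next step for that investigation is to check the kubelet identity's `AcrPull` role assignment on the registry and run the CLI's built-in `az aks check-acr` validation. This is a hedged sketch: the registry name is not given in this issue, so `ACR_NAME` is a placeholder you must set, and `is_msi_cluster` / `check_acr_pull_path` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical follow-up sketch for a managed-identity (MSI) cluster: check
# whether the kubelet identity can still pull from the registry.
RG="${RG:-openapi-ppe}"
CLUSTER="${CLUSTER:-liveValidatePPE}"
# Placeholder: the registry name is not stated in this issue.
ACR_NAME="${ACR_NAME:-your-acr-name}"

# is_msi_cluster CLIENT_ID
# servicePrincipalProfile.clientId is the literal string "msi" when the
# cluster uses a managed identity instead of a service principal.
is_msi_cluster() {
  [ "$1" = "msi" ]
}

check_acr_pull_path() {
  sp_client_id=$(az aks show -g "$RG" -n "$CLUSTER" \
    --query servicePrincipalProfile.clientId -o tsv) || return 1
  if ! is_msi_cluster "$sp_client_id"; then
    echo "Cluster uses service principal $sp_client_id; check its secret expiry instead."
    return 0
  fi

  # Client ID of the kubelet ("agentpool") identity used for image pulls.
  kubelet_client_id=$(az aks show -g "$RG" -n "$CLUSTER" \
    --query identityProfile.kubeletidentity.clientId -o tsv) || return 1
  acr_id=$(az acr show -n "$ACR_NAME" --query id -o tsv) || return 1

  # AcrPull may be granted on the registry itself or on a parent scope.
  echo "AcrPull assignments for $kubelet_client_id on $ACR_NAME:"
  az role assignment list --assignee "$kubelet_client_id" --scope "$acr_id" \
    --include-inherited --query "[?roleDefinitionName=='AcrPull']" -o table

  # Built-in end-to-end check that the cluster can reach and authenticate to the ACR.
  az aks check-acr -g "$RG" -n "$CLUSTER" --acr "$ACR_NAME.azurecr.io"
}
```

If the role assignment is present and `check-acr` still fails, the problem is more likely the identity binding on the scale sets than an expired credential, which would match what we saw in PPE.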