Azure / AKS-Edge

Welcome to the Azure Kubernetes Service (AKS) Edge repo.

[BUG] Unable to enable Azure Arc Monitoring due to image pull issues from aksiotdevacr.azurecr.io ACR #156

Closed · gshiva closed this issue 6 months ago

gshiva commented 8 months ago

Describe the bug

I enabled Azure Arc Monitoring integration via the Azure Portal. I see hundreds of pods in various error states. I recreated a pod definition and launched it to debug the issue. It is failing because it is unable to pull the image from aksiotdevacr.azurecr.io.

To Reproduce

Steps to reproduce the behavior:

  1. Download test-aks-pull.json (a sketch of such a manifest is shown after these steps)
  2. Run kubectl apply -f .\test-aks-pull.json
  3. Run kubectl get events
  4. See error
3s          Warning   FailedCreatePodSandBox   pod/resource-sync-agent-test-aks-pull   Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "aksiotdevacr.azurecr.io/pause:3.9": failed to pull image "aksiotdevacr.azurecr.io/pause:3.9": failed to pull and unpack image "aksiotdevacr.azurecr.io/pause:3.9": failed to resolve reference "aksiotdevacr.azurecr.io/pause:3.9": failed to authorize: failed to fetch anonymous token: unexpected status: 401 Unauthorized
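For reference, the actual test-aks-pull.json attachment is not reproduced in this thread; a minimal pod manifest along the following lines should hit the same failure, since the error concerns the sandbox (pause) image that containerd pulls for every pod rather than the pod's own container image. The pod name is taken from the event above, while the container name and image are illustrative placeholders.

{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    "name": "resource-sync-agent-test-aks-pull"
  },
  "spec": {
    "restartPolicy": "Never",
    "containers": [
      {
        "name": "pull-test",
        "image": "aksiotdevacr.azurecr.io/pause:3.9"
      }
    ]
  }
}

Applying this with kubectl apply -f .\test-aks-pull.json and then running kubectl get events should surface the same FailedCreatePodSandBox warning on an affected node.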

Expected behavior

The pod should launch without any errors.

Environment (please complete the following information):

Additional context

I see the same error in multiple azure-arc pods; the pause image is just one example. The machine is behind a corporate proxy, and I am not sure whether that is a factor.

PS C:\Aks_Edge_Essentials> Test-AksEdgeArcConnection
[11/27/2023 17:56:23] Exception Caught!!!

 - Could not run kubectl on Windows node - node may not be reachable or cluster may be in bad state. Error was: ssh  failed to execute [Error from server (Forbidden): namespaces is forbidden: User "system:node:win-cvqbhhj1265-wedge" cannot list resource "namespaces" in API group "" at the cluster scope] (AksEdge.psm1: line 9019)
False
gshiva commented 7 months ago

I recreated the cluster and it is working now. One difference is that I configured monitoring as soon as I created the cluster instead of waiting several days. Another change is that I gave the Linux node 8GB of memory instead of the default 4GB. I am not sure whether either of those was the cause.

I will close this in a week if there is no response from the MS team.

Vicent8899 commented 6 months ago

Issues pulling images from aksiotdevacr.azurecr.io can indicate resource pressure on the cluster, especially if the Linux node's resources are thin.

We believe it can be reproduced by allocating the default memory and storage (4GB and 10GB respectively) to the Linux node and then turning on Azure Arc monitoring. With monitoring enabled, we see 18GB of disk space and 5.4GB of memory in use, well beyond those defaults.

Suggestions:

  - Allocate more memory and storage to the Linux node than the defaults before enabling Azure Arc monitoring.

Once the resource constraints are removed, we no longer see errors pulling images from aksiotdevacr.azurecr.io.
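For context, the Linux node's memory and data disk sizes are set at deployment time in the AKS Edge Essentials configuration JSON passed to New-AksEdgeDeployment. The fragment below is a sketch of a larger allocation, not an exact configuration: the field names and values reflect our reading of the schema, so verify them against your own aksedge-config.json.

{
  "Machines": [
    {
      "LinuxNode": {
        "CpuCount": 4,
        "MemoryInMB": 8192,
        "DataSizeInGB": 30
      }
    }
  ]
}

Here 8192MB matches the 8GB allocation that worked earlier in this thread, and DataSizeInGB is sized to stay above the roughly 18GB of disk usage observed once monitoring is enabled.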

rcheeran commented 6 months ago

Thanks for the update. Yes, if you need to use Arc and other Arc extensions, the Linux node needs a minimum of 8GB of memory. See this

We are working on reducing this footprint.