Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

Unable to check ACR status in AzureUSGovernment #3201

Open CrispySipp opened 2 years ago

CrispySipp commented 2 years ago

From the Azure CLI, we are unable to check on the status of an ACR integration using the az aks check-acr utility in the AzureUSGovernment environment because the ACR FQDN is incorrectly appended with azurecr.io when calling the --acr switch, as in the following command:

az aks check-acr --acr exampleAcrName.azurecr.us --name exampleAksName -g exampleAksRG

results in:

Checking host name resolution (exampleAcrName.azurecr.us.azurecr.io): FAILED
Failed to resolve specified fqdn exampleAcrName.azurecr.us.azurecr.io: lookup exampleAcrName.azurecr.us.azurecr.io on 10.0.X.X: no such host

az aks check-acr --acr exampleAcrName --name exampleAksName -g exampleAksRG

results in:

Checking host name resolution (exampleAcrName.azurecr.io): FAILED
Failed to resolve specified fqdn exampleAcrName.azurecr.io: lookup exampleAcrName.azurecr.io on 10.0.X.X: no such host

No matter the value entered for the --acr switch, azurecr.io is appended to the FQDN and causes resolution failure as a result.
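To make the expected behavior concrete, here is a minimal sketch of cloud-aware FQDN construction. It is not the Azure CLI's actual code: the `ACR_SUFFIX_BY_CLOUD` table and the `acr_fqdn` helper are hypothetical names introduced for illustration, and the only suffixes used are the two that appear in this report.

```python
# Hypothetical sketch, NOT the Azure CLI implementation.
# ".azurecr.us" comes from the registry name in this report;
# ".azurecr.io" is the suffix the tool currently appends.
ACR_SUFFIX_BY_CLOUD = {
    "AzureCloud": ".azurecr.io",
    "AzureUSGovernment": ".azurecr.us",
}


def acr_fqdn(acr: str, cloud: str) -> str:
    """Return the registry FQDN for `acr`, given a bare name or a full FQDN."""
    suffix = ACR_SUFFIX_BY_CLOUD[cloud]
    name = acr.strip()
    # Already fully qualified for this cloud: do not append the suffix again.
    if name.lower().endswith(suffix):
        return name
    # Bare registry name: append the suffix of the active cloud.
    return name + suffix
```

With this logic, both `--acr exampleAcrName.azurecr.us` and `--acr exampleAcrName` would yield `exampleAcrName.azurecr.us` when the active cloud is AzureUSGovernment, instead of a name ending in `azurecr.io`.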

Steps to reproduce the behavior:

  1. Log in to Azure US Government subscription: az login
  2. Set environment: az cloud set --name AzureUSGovernment
  3. Set subscription: az account set --subscription <govSubID>
  4. Log in to AKS: az aks get-credentials -n exampleAksName -g exampleAksRG
  5. Check ACR status: az aks check-acr --acr exampleAcrName.azurecr.us --name exampleAksName -g exampleAksRG

Expected behavior: A valid status of the ACR integration with the AKS cluster.

Screenshots: Not possible in our government environment.

Environment (please complete the following information):

Additional context: Applicable only to AzureUSGovernment, at least as it pertains to our current scope.

carvido1 commented 2 years ago

Hello @CrispySipp

On 30 August, Azure experienced a DNS resolution problem affecting a specific Ubuntu version.

I raised a case with Azure Support, and the recommended solution was to upgrade the node image or restart the nodes.

Have you tried this approach?

In our case, simply restarting the nodes (in an orderly fashion) fixed the problem.

BR.

CrispySipp commented 2 years ago

Carvido-

Thank you for this suggestion, will test this out and follow up!

ghost commented 2 years ago

Action required from @Azure/aks-pm

CrispySipp commented 2 years ago

This bug is persisting. Anyone from Microsoft interested in suggestions?

CrispySipp commented 2 years ago

Still an issue after upgrading the AKS control plane and node pools to 1.23.8 and the Ubuntu version to the latest 20.04 stable.

ghost commented 1 year ago

Issue needing attention of @Azure/aks-leads

CrispySipp commented 1 year ago

still a problem

carvido1 commented 1 year ago

Hello @CrispySipp.

I see that the IP address that gets resolved for the ACR FQDN is a private IP address. Do you know whether a private endpoint is configured on the ACR? You can check this under Container registry -> Settings -> Networking, on the Private access tab.

BR

CrispySipp commented 1 year ago

BR-

Yes we do have a private endpoint configured.

-Chris

carvido1 commented 1 year ago

Hello @CrispySipp

Going back to the logs you provided, this looks like it could be a DNS resolution problem.

Failed to resolve specified fqdn exampleAcrName.azurecr.us.azurecr.io: lookup exampleAcrName.azurecr.us.azurecr.io on 10.0.X.X: no such host

Do you have a private DNS zone for the private endpoint that was created for that Azure Container Registry? If so, is this private DNS zone linked to the same VNet as the AKS cluster?

These are the two conditions to check. Resolving the Azure Container Registry inside Azure requires either a private DNS zone or a manual DNS record (for example, in CoreDNS). If a private DNS zone was created, it must be linked to the VNet in which the cluster is deployed; otherwise it won't be used to resolve the ACR.
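The two conditions above can be spot-checked with a short script. This is a minimal sketch that uses only the Python standard library and whatever resolver the machine is configured with; the helper names `resolve`, `is_private`, and `describe` are hypothetical, and the result is only meaningful when run from a host that uses the same DNS path as the cluster.

```python
from __future__ import annotations

import ipaddress
import socket


def resolve(fqdn: str) -> list[str]:
    """Return the unique IPv4 addresses the local resolver gives for fqdn, or [] on failure."""
    try:
        infos = socket.getaddrinfo(fqdn, 443, family=socket.AF_INET, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Covers the "no such host" case seen in the logs.
        return []
    return sorted({info[4][0] for info in infos})


def is_private(ip: str) -> bool:
    """True if ip falls in a private range, as a private endpoint address would."""
    return ipaddress.ip_address(ip).is_private


def describe(fqdn: str, addresses: list[str]) -> str:
    """Summarize a resolution result as one line of text."""
    if not addresses:
        return f"{fqdn}: did not resolve"
    parts = [f"{ip} ({'private' if is_private(ip) else 'public'})" for ip in addresses]
    return f"{fqdn}: " + ", ".join(parts)
```

For example, `print(describe("exampleAcrName.azurecr.us", resolve("exampleAcrName.azurecr.us")))` reports whether the name resolves at all and whether it points at a private address. If it does not resolve, or resolves to a public address, the private DNS zone or its VNet link is the likely place to look.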

Best regards.

ghost commented 1 year ago

Issue needing attention of @Azure/aks-leads

CrispySipp commented 1 year ago

It appears we have a private endpoint with two custom DNS settings in its configuration. However, it does not appear to have its own private DNS zone linked to the VNet. I did manually add a record to the hosts file of the resource I am using to run the check against the ACR, and the FQDN still had azurecr.io (incorrectly) appended.

carvido1 commented 1 year ago

Hello @CrispySipp.

When a private endpoint is created for an Azure resource that has a public FQDN, Azure DNS internally resolves the public FQDN to the private IP address created with the private endpoint (this happens only inside Azure; it won't happen locally because you are not using Azure DNS servers). To use the private endpoint, the NIC that gets created must be in a subnet reachable from your AKS cluster, and for DNS resolution to work you need to create a private DNS zone linked to that subnet's VNet. If you don't create a private DNS zone, you will need another mechanism that resolves the ACR FQDN to the private endpoint's private IP address manually (remember to choose a static IP address for the private endpoint).

BR

ghost commented 1 year ago

Issue needing attention of @Azure/aks-leads
