Open CrispySipp opened 2 years ago
Hello @CrispySipp
On 30 August, Azure experienced a DNS resolution problem affecting a specific Ubuntu version.
I raised a case with Azure Support, and the recommended solution was to upgrade the node image or restart the nodes.
Have you tried this approach?
In our case, simply restarting the nodes (one at a time, in order) fixed the problem.
BR.
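A minimal sketch of what an ordered restart could look like, assuming kubectl access to the cluster. The restart_node function is a hypothetical hook, not a real command: replace it with whatever reboots the underlying VM in your environment. The wait step is a simplification, since a node can keep reporting Ready for a short while after a reboot starts.

```shell
# Hypothetical hook: replace with the command that reboots the VM behind a node.
restart_node() {
  echo "restart_node is not implemented for node $1" >&2
  return 1
}

# Restart nodes one at a time: drain, restart, wait for Ready, uncordon.
restart_nodes_in_order() {
  local node
  for node in "$@"; do
    # drain also cordons the node, so no new pods are scheduled onto it
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
    restart_node "$node" || return 1
    kubectl wait --for=condition=Ready "node/$node" --timeout=10m || return 1
    kubectl uncordon "$node" || return 1
  done
}
```

Usage would be along the lines of `restart_nodes_in_order $(kubectl get nodes -o name | sed 's|^node/||')`, after defining restart_node.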
Carvido-
Thank you for this suggestion, will test this out and follow up!
Action required from @Azure/aks-pm
This bug is persisting. Anyone from Microsoft interested in suggestions?
Still an issue after upgrading the AKS control plane and node pools to 1.23.8 and the Ubuntu versions to the latest 20.04 stable.
Issue needing attention of @Azure/aks-leads
still a problem
Hello @CrispySipp.
I see that the IP address resolved for the ACR FQDN is a private IP address. Do you know whether a private endpoint is configured on the ACR? You can check this under Container registry -> Settings -> Networking, on the Private access tab.
BR
BR-
Yes we do have a private endpoint configured.
-Chris
Hello @CrispySipp
Going back to the logs you provided, it seems that this can be a DNS resolution problem.
Failed to resolve specified fqdn exampleAcrName.azurecr.us.azurecr.io: lookup exampleAcrName.azurecr.us.azurecr.io on 10.0.X.X: no such host
Do you have a private DNS zone for the private endpoint that was created for that Azure Container Registry? If so, is this private DNS zone linked to the same VNET as the AKS cluster?
These are the two conditions to check. To resolve the Azure Container Registry inside Azure, we need either a private DNS zone or a manual DNS record entry (for example, with CoreDNS). If the private DNS zone exists, it must be linked to the VNET where the cluster is deployed; otherwise it won't be used to resolve the ACR.
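The two conditions above can be checked from the Azure CLI. This is a rough sketch, not a definitive procedure: the function name and its arguments are made up for illustration, and the default zone name privatelink.azurecr.io is the usual one for ACR in the public cloud. In AzureUSGovernment the zone is likely privatelink.azurecr.us, so verify the name in your environment.

```shell
# Sketch: list the private DNS zones in a resource group, then the VNET links
# of the zone expected to serve the ACR private endpoint.
# Arguments: resource group, zone name (assumed default shown below).
check_acr_private_dns() {
  local rg="$1"
  local zone="${2:-privatelink.azurecr.io}"
  # Condition 1: does a private DNS zone for the registry exist?
  az network private-dns zone list --resource-group "$rg" --output table
  # Condition 2: is that zone linked to the VNET the AKS cluster uses?
  az network private-dns link vnet list --resource-group "$rg" --zone-name "$zone" --output table
}
```

For example, `check_acr_private_dns exampleAksRG privatelink.azurecr.us` would list the zones and links in the resource group from the report, assuming the zone lives there.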
Best regards.
It appears we have a private endpoint with two custom DNS settings in its configuration. It does not appear to have its own private DNS zone linked to the VNet, however. I also manually added a record to the hosts file of the machine I run the ACR check from, and the FQDN was still (incorrectly) suffixed with azurecr.io.
Hello @CrispySipp .
When a private endpoint is created for an Azure resource that has a public FQDN, Azure DNS internally resolves the public FQDN to the private IP address created with the private endpoint (this only happens inside Azure; it won't happen locally, because you are not using the Azure DNS servers). To use the private endpoint, the NIC that gets created must be in a subnet reachable from your AKS cluster, and for DNS resolution to work you need a private DNS zone linked to the VNET that contains that subnet. If you don't create a private DNS zone, you need another mechanism that resolves the ACR FQDN to the private endpoint's private IP address manually (remember to choose a static IP address for the private endpoint).
BR
From the Azure CLI, we are unable to check on the status of an ACR integration using the az aks check-acr utility in the AzureUSGovernment environment because the ACR FQDN is incorrectly appended with azurecr.io when calling the --acr switch, as in the following command:
az aks check-acr --acr exampleAcrName.azurecr.us --name exampleAksName -g exampleAksRG
results in:
az aks check-acr --acr exampleAcrName --name exampleAksName -g exampleAksRG
results in:
No matter the value entered for the --acr switch, azurecr.io is appended to the FQDN and causes resolution failure as a result.
Steps to reproduce the behavior:
az login
az cloud set --name AzureUSGovernment
az account set --subscription <govSubID>
az aks get-credentials --name exampleAksName -g exampleAksRG
az aks check-acr --acr exampleAcrName.azurecr.us --name exampleAksName -g exampleAksRG
Expected behavior: A valid status of the ACR integration with the AKS cluster
Screenshots: Not possible in our government environment
Environment (please complete the following information):
Additional context: Applicable only to AzureUSGovernment, at least as it pertains to our current scope
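To make the expected name handling concrete, here is a minimal sketch. It is not the actual az CLI implementation: acr_login_server is a hypothetical helper, and the assumption is that the registry suffix should come from the active cloud rather than being fixed to .azurecr.io.

```shell
# Hypothetical helper, not the real az CLI code: build the ACR login server
# from the --acr value and the registry suffix of the active cloud.
# Arguments: registry name or FQDN, suffix (defaults to the public cloud's).
acr_login_server() {
  local acr="$1"
  local suffix="${2:-.azurecr.io}"   # e.g. .azurecr.us for AzureUSGovernment
  case "$acr" in
    # Value is already a full FQDN for this cloud: leave it unchanged.
    *"$suffix") printf '%s\n' "$acr" ;;
    # Bare registry name: append the cloud-specific suffix once.
    *) printf '%s\n' "$acr$suffix" ;;
  esac
}
```

With the suffix .azurecr.us, both exampleAcrName and exampleAcrName.azurecr.us map to exampleAcrName.azurecr.us, instead of the exampleAcrName.azurecr.us.azurecr.io seen in the logs. The suffix itself can probably be read with `az cloud show --query suffixes.acrLoginServerEndpoint -o tsv`, assuming the active cloud exposes that field.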