MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

Name resolution fails in Windows containers #32209

Closed artisticcheese closed 5 years ago

artisticcheese commented 5 years ago

Repro steps (including mandatory 30s wait)

New-AzContainerGroup -ResourceGroupName ACI -Name mycontainer -Image mcr.microsoft.com/windows/servercore:ltsc2019 -OsType Windows -DnsNameLabel aci-dns -Command 'powershell -Command "Start-Sleep 30; Resolve-DNSNAme www.google.com"'

Result

PS Azure:\> Get-AzContainerInstanceLog -ResourceGroupName ACI -Name mycontainer -ContainerGroupName mycontainer
Resolve-DNSNAme : www.google.com : This operation returned because the timeout
period expired
At line:1 char:18
+ Start-Sleep 30;  Resolve-DNSNAme www.google.com
+                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationTimeout: (www.google.com:String) [Resol
   ve-DnsName], Win32Exception
    + FullyQualifiedErrorId : ERROR_TIMEOUT,Microsoft.DnsClient.Commands.Resol
   veDnsName

jakaruna-MSFT commented 5 years ago

@artisticcheese There is a similar issue on MSDN: https://social.msdn.microsoft.com/Forums/en-US/6acd118f-7491-4b65-81d6-371f612088e8/problem-accessing-standard-table-from-a-windows-nanoserver-azure-container-instance?forum=AzureContainerServices

Can you try the command below and check whether it resolves correctly in your container?

Resolve-DnsName -Name www.bing.com -Type A -Server 8.8.8.8
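
To narrow down whether the container's default resolver is the failing part, a sketch along these lines may help. It lists the DNS servers configured on each interface, then queries www.bing.com once through the default resolver and once through 8.8.8.8. The host name and the 8.8.8.8 address are just the ones used in this thread; the loop and the output strings are illustrative and not part of any official diagnostic.

```powershell
# Show which DNS servers each interface is configured with (IPv4 only).
Get-DnsClientServerAddress -AddressFamily IPv4 |
    Select-Object InterfaceAlias, ServerAddresses

# Query once through the default resolver ($null) and once through an
# explicit server, and report which of the two succeeds.
foreach ($server in @($null, '8.8.8.8')) {
    $params = @{ Name = 'www.bing.com'; Type = 'A'; ErrorAction = 'Stop' }
    if ($server) { $params.Server = $server }
    $label = if ($server) { $server } else { 'default resolver' }
    try {
        Resolve-DnsName @params | Out-Null
        "OK via $label"
    } catch {
        "FAILED via ${label}: $($_.Exception.Message)"
    }
}
```

If the default resolver fails while the explicit server succeeds, that points at the DNS server assigned to the container rather than at outbound connectivity in general.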

artisticcheese commented 5 years ago

Yes, explicitly specifying the resolver IP address works right away.

New-AzContainerGroup -ResourceGroupName ACI -Name mycontainer -Image mcr.microsoft.com/windows/servercore:ltsc2019 -OsType Windows -DnsNameLabel aci-dns -Command "powershell -Command Resolve-DnsName -Name www.bing.com -Type A -Server 8.8.8.8"

PS Azure:\> Get-AzContainerInstanceLog -ResourceGroupName ACI -Name mycontainer -ContainerGroupName mycontainer
Name                           Type   TTL   Section    NameHost
----                           ----   ---   -------    --------
www.bing.com                   CNAME  28    Answer     a-0001.a-afdentry.net.tr
                                                       afficmanager.net
a-0001.a-afdentry.net.trafficm CNAME  28    Answer     a-0001.a-msedge.net
anager.net

Name       : a-0001.a-msedge.net
QueryType  : A
TTL        : 47
Section    : Answer
IP4Address : 204.79.197.200

Name       : a-0001.a-msedge.net
QueryType  : A
TTL        : 47
Section    : Answer
IP4Address : 13.107.21.200

jakaruna-MSFT commented 5 years ago

OK. As noted in the related MSDN thread, this has been reported to the product team. The current workaround is to use this image: mcr.microsoft.com/windows/servercore:1607

jawiz commented 5 years ago

We are seeing the same issue with servercore on AKS:

Image: 4.7.2-windowsservercore-ltsc2019

No default DNS server is set:

PS C:\> nslookup
DNS request timed out.
    timeout was 2 seconds.
Default Server:  UnKnown
Address:  10.0.0.10

This can be reproduced by starting an interactive pod from this image with the following command on an AKS cluster that has a Windows node:

kubectl run netruntime-472 -it powershell --image=mcr.microsoft.com/dotnet/framework/runtime:4.7.2 --restart=Never --overrides='{ "apiVersion": "v1", "spec": { "template": { "spec": { "nodeSelector": { "beta.kubernetes.io/os": "windows" }, "tolerations": [ { "effect": "NoSchedule", "key": "os", "operator": "Equal", "value": "Win2019" }, { "effect": "NoExecute", "key": "node.kubernetes.io/not-ready", "operator": "Exists", "tolerationSeconds": 300 }, { "effect": "NoExecute", "key": "node.kubernetes.io/unreachable", "operator": "Exists", "tolerationSeconds": 300 } ] } } } }'

PS C:\> nslookup
DNS request timed out.
    timeout was 2 seconds.
Default Server:  UnKnown
Address:  10.0.0.10

> google.com
Server:  UnKnown
Address:  10.0.0.10

DNS request timed out.
    timeout was 2 seconds.
DNS request timed out.
    timeout was 2 seconds.
DNS request timed out.
    timeout was 2 seconds.
DNS request timed out.
    timeout was 2 seconds.
*** Request to UnKnown timed-out

jakaruna-MSFT commented 5 years ago

This is a known issue for the image mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019 in ACI. All Windows Server 2019 images may be affected. It happens because the DNS server is not set properly in the container.

The current workaround is either to run the commands below at container startup to set the DNS server, or to use Windows Server 2016 based images (tags 1607, ltsc2016, etc.):

$nic = Get-NetAdapter
Set-DnsClientServerAddress -InterfaceIndex $nic.IfIndex -ServerAddresses ('8.8.8.8')
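
If the container has more than one network adapter, a slightly more defensive variant of the two commands above might look like the sketch below. It applies the DNS server only to adapters that report as up, then runs a lookup to confirm the change took effect. The 8.8.8.8 address and www.bing.com are simply the values already used in this thread, and the `Status -eq 'Up'` filter is an assumption about which adapters matter; adjust both for your environment.

```powershell
# Sketch only: set the DNS server on every adapter that is currently up.
# 8.8.8.8 is the resolver used earlier in this thread; substitute your own.
$dnsServers = @('8.8.8.8')

Get-NetAdapter |
    Where-Object Status -eq 'Up' |
    ForEach-Object {
        Set-DnsClientServerAddress -InterfaceIndex $_.ifIndex -ServerAddresses $dnsServers
    }

# Confirm that the default resolver now answers.
Resolve-DnsName -Name www.bing.com -Type A
```

This would need to run at container startup, before the application makes its first DNS query.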

artisticcheese commented 5 years ago

Is it fixed in 1903?

brobichaud commented 5 years ago

> Is it fixed in 1903?

Personally, I doubt this is an 1809 thing. It's much more likely to be related to the Azure networking integration with k8s. I have an 1809 cluster built with AKS-Engine that does not exhibit this problem at all, but my AKS cluster does.

That said, this is a huge issue that needs to be resolved before anyone is going to be happy with Windows on AKS.

artisticcheese commented 5 years ago

If it were an AKS issue, I would expect it to be broken with earlier images as well. So far it is inconclusive which part is failing, since the evidence suggests the problem only shows up with 1809 on managed AKS.

brobichaud commented 5 years ago

Yeah, but remember there are many moving parts in k8s. This preview of Windows support in AKS uses a specific point-in-time snapshot of the various k8s components, and it's very possible there is a significant bug in the networking stack of the k8s plug-ins that has nothing to do with 1809 or any other Windows image. That's where my bet would go. The upside, if this is true, is that it should be much easier to get a fix in place than if it were a raw Windows problem (where fixes take muuuuch longer).

jakaruna-MSFT commented 5 years ago

@brobichaud DNS server mapping to the container in ACI is not working (or the wrong IP is mapped) for the latest images. The team has been informed about this bug. I will post updates here.

mimckitt commented 5 years ago

We are tracking this bug here: https://github.com/Azure/AKS/issues/1029

Please follow that issue for any updates around this bug.