Azure / acs-engine

WE HAVE MOVED: Please join us at Azure/aks-engine!
https://github.com/Azure/aks-engine

The cluster-internal DNS server cannot be used from Windows containers #2027

Closed chweidling closed 5 years ago

chweidling commented 6 years ago

Is this a request for help?: NO


Is this an ISSUE or FEATURE REQUEST? (choose one): ISSUE


What version of acs-engine?: canary, GitCommit 8fd4ac4267c29370091d98d80c3046bed517dd8c


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm) kubernetes 1.8.6

What happened:

I deployed a simple cluster with one master node and two Windows nodes. In this deployment, requests to the cluster's own DNS server (kubedns) time out. Requests to external DNS servers work.

Remark: This issue is related to #558 and #1949. Those issues suggest that the DNS problems are connected to the Windows dnscache service or to the custom VNET feature, but the following description points in a different direction.

What you expected to happen: Requests to the internal DNS server should not time out.

Steps to reproduce:

Deploy a simple kubernetes cluster with one master node and two Windows nodes with the following api model:

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "kubernetesConfig": {
        "networkPolicy": "none"
      },
      "orchestratorRelease": "1.8"
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "---",
      "vmSize": "Standard_D4s_v3"
    },
    "agentPoolProfiles": [
      {
        "name": "backend",
        "count": 2,
        "osType": "Windows",
        "vmSize": "Standard_D4s_v3",
        "availabilityProfile": "AvailabilitySet"
      }      
    ],
    "windowsProfile": {
      "adminUsername": "---",
      "adminPassword": "---"
    },
    "linuxProfile": {
      "adminUsername": "weidling",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "ssh-rsa ---"
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "---",
      "secret": "---"
    }
  }
}

Then run a Windows container. I used the following command: kubectl run mycore --image microsoft/windowsservercore:1709 -it powershell

Then run the following nslookup session, where you try to resolve a DNS entry with the default (internal) DNS server and then with Google's DNS server:

PS C:\> nslookup
DNS request timed out.
    timeout was 2 seconds.
Default Server:  UnKnown
Address:  10.0.0.10

> github.com
Server:  UnKnown
Address:  10.0.0.10

DNS request timed out.
    timeout was 2 seconds. 
(repeats 3 more times)
*** Request to UnKnown timed-out

> server 8.8.8.8
DNS request timed out.
    timeout was 2 seconds.
Default Server:  [8.8.8.8]
Address:  8.8.8.8

> github.com
Server:  [8.8.8.8]
Address:  8.8.8.8

Non-authoritative answer:
Name:    github.com
Addresses:  192.30.253.113
          192.30.253.112

> exit

Anything else we need to know: As suggested in #558, the problem should vanish 15 minutes after a pod has started. In my deployment, the problem does not disappear even after one hour.
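
To check the resolver without nslookup's retry behaviour, the two lookups above can be reproduced with a single UDP query per server. This is only a sketch and not part of the original report: it assumes Python 3 is available in the container, and the helper names build_query and probe are made up for illustration.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet for an A record (RFC 1035)."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack(">HH", 1, 1)


def probe(server: str, name: str, timeout: float = 2.0) -> str:
    """Send one UDP query to server:53 and report what came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            data, _ = sock.recvfrom(512)
        except socket.timeout:
            return "timeout"
        except OSError as exc:
            return f"error ({exc})"
    if len(data) < 4:
        return "short reply"
    # The low 4 bits of the flags word carry the response code (0 = no error).
    rcode = struct.unpack(">H", data[2:4])[0] & 0x000F
    return f"answered (rcode={rcode})"
```

From inside the pod, probe("10.0.0.10", "github.com") and probe("8.8.8.8", "github.com") would be expected to mirror the session above (a timeout for the internal server, an answer from 8.8.8.8) if the problem lies in reaching the server rather than in nslookup itself.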

I observed this behavior regardless of the values of the networkPolicy (none, azure) and orchestratorRelease (1.7, 1.8, 1.9) properties in the api model. With the model above, I get the following network configuration inside the Windows pod:

PS C:\> ipconfig /all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : mycore-96fdd75dc-8g5kd
   Primary Dns Suffix  . . . . . . . :
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No

Ethernet adapter vEthernet (9519cc22abb5ef39c786c5fbdce98c6a23be5ff1dced650ed9e338509db1eb35_l2bridge):

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter #3
   Physical Address. . . . . . . . . : 00-15-5D-87-0F-CC
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::a58c:aaf:c12b:d82c%21(Preferred)
   IPv4 Address. . . . . . . . . . . : 10.244.2.92(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 10.240.0.1
   DNS Servers . . . . . . . . . . . : 10.0.0.10
   NetBIOS over Tcpip. . . . . . . . : Disabled
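
One detail in this output can be checked mechanically: whether the default gateway and the DNS server lie inside the pod's own subnet. The sketch below uses Python's standard ipaddress module with the values copied from the ipconfig output; the function name on_link is hypothetical and not part of any tool mentioned in this issue.

```python
import ipaddress


def on_link(pod_ip: str, netmask: str, target: str) -> bool:
    """Return True if target lies inside the pod's own subnet."""
    # strict=False lets us pass a host address together with its netmask.
    network = ipaddress.ip_network(f"{pod_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(target) in network


# Values copied from the ipconfig /all output above.
pod_ip, netmask = "10.244.2.92", "255.255.255.0"
for label, addr in (("gateway", "10.240.0.1"), ("dns", "10.0.0.10")):
    state = "on-link" if on_link(pod_ip, netmask, addr) else "off-link"
    print(label, addr, state)
```

Both addresses fall outside 10.244.2.0/24, so DNS traffic cannot be delivered on the local segment and depends on something routing it onward. Whether that forwarding is what fails here is not established by this output alone.
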
jbiel commented 6 years ago

@JiangtianLi thanks. We are not using acs-engine but encountered this bug nonetheless, so we appreciate the upstream/public fixes.

4c74356b41 commented 6 years ago
I can successfully create a windows container using microsoft/windowsservercore:1709_KB4074588

hm, are there dotnet and aspnet images with that fix?

yuedai commented 6 years ago

@4c74356b41 maybe you can try "docker pull" to refresh the base image before building your windows container.

4c74356b41 commented 6 years ago

@yuedai that wouldn't help unless they updated the images with this fix

4c74356b41 commented 6 years ago

@JiangtianLi is this working already? can we recreate the cluster? thanks!

4c74356b41 commented 6 years ago

@msorby where have you got that image from? are they being published somewhere, can you provide a link? thanks.

JiangtianLi commented 6 years ago

@4c74356b41 The Feb Windows update/docker image is already out, so it should fix the DNS configuration issue. I will update here after I confirm on a Windows cluster on my side.

4c74356b41 commented 6 years ago

@JiangtianLi do you know if/when MS will release the new image for Windows hosts (in Azure)? I checked today and the latest image for 1709 was from December.

msorby commented 6 years ago

@4c74356b41 using acs-engine 0.13.0, it has the hotfix for the windows host. Then I use this microsoft/windowsservercore:1709_KB4074588 docker image for my core container.

So no need to look for images in Azure, acs-engine 0.13.0 patches the host.

patrick-motard commented 6 years ago

I also was able to get internal DNS working using acs-engine 0.13.0 and k8s 1.8.4. But I'm not able to get external DNS working -_-

JiangtianLi commented 6 years ago

@patrick-motard what DNS server is the container using? what is output of ipconfig /all inside container?

patrick-motard commented 6 years ago

@JiangtianLi

PS C:\> ipconfig /all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : my-app-798c67b4db-gm2tn
   Primary Dns Suffix  . . . . . . . : 
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No

Ethernet adapter vEthernet (5376f59639304101ffa730ac1d398c1b34f83c602036910eb1257c957800ab24_l2bridge):

   Connection-specific DNS Suffix  . : 
   Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter #4
   Physical Address. . . . . . . . . : 00-15-5D-06-98-42
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::8510:ea5e:29f6:efe7%25(Preferred) 
   IPv4 Address. . . . . . . . . . . : 10.244.4.241(Preferred) 
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 10.240.0.1
   DNS Servers . . . . . . . . . . . : 10.0.0.10
   NetBIOS over Tcpip. . . . . . . . : Disabled
patrick-motard commented 6 years ago

I have a Windows server within the same subnet called "my-sql-server". From the Windows node I can curl the SQL server using curl my-sql-server and get a 200 response. From inside the container on the node I cannot.

ams0 commented 6 years ago

Got it to work today, using acs-engine (commit ba48383a, I build it daily here). What got me really confused is that ping doesn't work from within the container, but Invoke-Webrequest does:

PS C:\> Invoke-webrequest -UseBasicParsing https://google.com

StatusCode        : 200
StatusDescription : OK

Notably, I'm pulling microsoft/windowsservercore:1709 image.

cpunella commented 6 years ago

Hi,

I've just deployed a cluster with acs-engine 0.13.1 and using microsoft/windowsservercore:1709_KB4074588 as base image for my containers but external dns resolution doesn't work.

IpConfig /all result is the same as @patrick-motard

I've installed all windows updates on win node.

JiangtianLi commented 6 years ago

@patrick-motard @cpunella what is output of resolve-dnsname www.bing.com inside the container?

patrick-motard commented 6 years ago
PS C:\> resolve-dnsname www.bing.com

Name                           Type   TTL   Section    NameHost                         
----                           ----   ---   -------    --------                         
www.bing.com                   CNAME  60    Answer     www-bing-com.a-0001.a-msedge.net 
www-bing-com.a-0001.a-msedge.n CNAME  60    Answer     a-0001.dc-msedge.net             
et                                                                                      

Name       : a-0001.dc-msedge.net
QueryType  : A
TTL        : 60
Section    : Answer
IP4Address : 131.253.33.200

Name       : a-0001.dc-msedge.net
QueryType  : A
TTL        : 60
Section    : Answer
IP4Address : 13.107.22.200
patrick-motard commented 6 years ago
PS C:\> Invoke-WebRequest -UseBasicParsing https://google.com

StatusCode        : 200
StatusDescription : OK
Content           : <!doctype html><html itemscope="" 
                    itemtype="http://schema.org/WebPage" lang="en"><head><meta 
                    content="Search the world's information, including webpages, 
                    images, videos and more. Google has many speci...
RawContent        : HTTP/1.1 200 OK
                    X-XSS-Protection: 1; mode=block
                    X-Frame-Options: SAMEORIGIN
                    Cache-Control: private, max-age=0
                    Content-Type: text/html; charset=UTF-8
                    Date: Fri, 02 Mar 2018 17:35:53 GMT
                    Expires: ...
Forms             : 
Headers           : {[X-XSS-Protection, 1; mode=block], [X-Frame-Options, SAMEORIGIN], 
                    [Cache-Control, private, max-age=0], [Content-Type, text/html; 
                    charset=UTF-8]...}
Images            : {@{outerHTML=<img alt="Holi 2018" border="0" height="220" 
                    src="/logos/doodles/2018/holi-2018-5209035568578560-l.png" 
                    title="Holi 2018" width="550" id="hplogo" 
                    onload="window.lol&&lol()">; tagName=IMG; alt=Holi 2018; border=0; 
                    height=220; 
                    src=/logos/doodles/2018/holi-2018-5209035568578560-l.png; 
                    title=Holi 2018; width=550; id=hplogo; onload=window.lol&&lol()}}
InputFields       : {}
Links             : {@{outerHTML=<a onclick=gbar.logger.il(1,{t:1}); class="gbzt gbz0l 
                    gbp1" id=gb_1 href="https://www.google.com/webhp?tab=ww"><span 
                    class=gbtb2></span><span class=gbts>Search</span></a>; tagName=A; 
                    onclick=gbar.logger.il(1,{t:1});; class=gbzt gbz0l gbp1; id=gb_1; 
                    href=https://www.google.com/webhp?tab=ww}, @{outerHTML=<a 
                    onclick=gbar.logger.il(1,{t:2}); class=gbzt id=gb_2 
                    href="https://www.google.com/imghp?hl=en&tab=wi"><span 
                    class=gbtb2></span><span class=gbts>Images</span></a>; tagName=A; 
                    onclick=gbar.logger.il(1,{t:2});; class=gbzt; id=gb_2; 
                    href=https://www.google.com/imghp?hl=en&tab=wi}, @{outerHTML=<a 
                    onclick=gbar.logger.il(1,{t:8}); class=gbzt id=gb_8 
                    href="https://maps.google.com/maps?hl=en&tab=wl"><span 
                    class=gbtb2></span><span class=gbts>Maps</span></a>; tagName=A; 
                    onclick=gbar.logger.il(1,{t:8});; class=gbzt; id=gb_8; 
                    href=https://maps.google.com/maps?hl=en&tab=wl}, @{outerHTML=<a 
                    onclick=gbar.logger.il(1,{t:78}); class=gbzt id=gb_78 
                    href="https://play.google.com/?hl=en&tab=w8"><span 
                    class=gbtb2></span><span class=gbts>Play</span></a>; tagName=A; 
                    onclick=gbar.logger.il(1,{t:78});; class=gbzt; id=gb_78; 
                    href=https://play.google.com/?hl=en&tab=w8}...}
ParsedHtml        : 
RawContentLength  : 47295
4c74356b41 commented 6 years ago

@JiangtianLi I can confirm it works for me, but if I don't add start-sleep 5 to my init script, it sometimes crashes.

patrick-motard commented 6 years ago

Both google and bing work. Can't hit a server in the same vnet though. I have a server called 'my-server'. I can curl it and get a 200 back from the node itself but not from the container on the node.

PS C:\> curl -UseBasicParsing my-server
curl : The remote name could not be resolved: 'my-server'
At line:1 char:1
+ curl -UseBasicParsing my-server
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (System.Net.HttpWebRequest:HttpWebReque 
   st) [Invoke-WebRequest], WebException
    + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Comman 
   ds.InvokeWebRequestCommand
patrick-motard commented 6 years ago

Okay.. I'm a little confused. After some testing I am seeing something different than i thought i saw the other day. I cannot curl the server in the vnet using the servers name from linux containers, nor linux nodes, nor windows nodes. I could have sworn i could use both the name and the IP the other day from all of those locations except the windows container.. I'm not going to be able to look at this again until monday. I'll try to reproduce all of this again and give a more detailed explanation then.

cpunella commented 6 years ago

@JiangtianLi this is the output

PS C:\app> resolve-dnsname www.bing.com
resolve-dnsname : www.bing.com : This operation returned because the timeout period expired
At line:1 char:1

4c74356b41 commented 6 years ago

I'm seeing really weird behavior after updating to the latest acs-engine and k8s 1.9.3: some of the pods consistently fail to resolve DNS, while others work.

upd: it's gone on its own after 3 hours. no idea.

4c74356b41 commented 6 years ago

I take it back. Networking is extremely unreliable at startup. It's just unreliable, under no particular conditions.

4c74356b41 commented 6 years ago

Ok, more findings: acs-engine 0.13.1 doesn't install the 2018-02 Cumulative Update for Windows 10 Version 1709 for x64-based Systems (KB4074588) on the Windows nodes. Is this expected? After installing that update and rebooting, internet is gone :)

qmalexander commented 6 years ago

Is there a way to get the internal traffic to work? I run: acs-engine v0.13.1 and Kubernetes 1.9.1. The external traffic works.

msorby commented 6 years ago

Right, I thought that I had it working: acs-engine 0.13.0 and K8s 1.9.3, using this image as the basis for my container: microsoft/windowsservercore:1709_KB4074588. But I'm experiencing the same as @4c74356b41, it's just not reliable. It was working for a bit, but then it stopped and after that it's a no go. This is for external resources.

4c74356b41 commented 6 years ago

@msorby my containers lose internet after node reboot :) tested on 3 clusters built from scratch ;)

qmalexander commented 6 years ago

@JiangtianLi any ideas? :blush:

JiangtianLi commented 6 years ago

@4c74356b41 Regarding the issue with reboot, there is a PR to fix it: https://github.com/Azure/acs-engine/pull/2378. acs-engine doesn't choose the Windows version; it always uses the latest from Azure. @qmalexander For internal traffic, do kube-dns and nslookup kubernetes work on your end? Does internal traffic work on a Linux node?

JiangtianLi commented 6 years ago

@madhanrm In case these issues are known to the Windows team.

4c74356b41 commented 6 years ago

@JiangtianLi so should I install that kb on agent nodes or not? I thought you said the feb update is required for cluster endpoints to work?

JiangtianLi commented 6 years ago

@4c74356b41 You should not need to install anything manually. acs-engine already applies the patch package in case the Feb update is not out. With the Feb update, there is no action needed from you either. What is the version on your Windows node? What is the output of the following:

reg query "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion" /v BuildLab
Get-HotFix

4c74356b41 commented 6 years ago

@JiangtianLi 16299.rs3_release.170928-1534

KB123456
KB4087256
KB999999
KB4056892

the 4056892 is listed as installed by me. 4087256 as system, but 3 days after cluster provision

do you know if it is possible to just delete windows nodes only and redeploy same definition into the same resource group (but built from the pr you mentioned)? or will it crap out? Also, should I use cni or not? I'm asking in terms of stability only. which is more stable at the moment? because for me both do not really work (without that PR at least) :(

msorby commented 6 years ago

@4c74356b41 I lost external dns resolution without node rebooting. But once it's gone it's gone for all pods created.

JiangtianLi commented 6 years ago

@4c74356b41 Azure CNI is not the default networkPolicy and it is also in beta stage. If you have any issue with Azure CNI, please report it with detailed repro steps and I will loop in the Azure Networking folks.

JiangtianLi commented 6 years ago

@msorby What is the output of ipconfig /all in your container? Can you reach kube-dns from the container? Is it only external names (www.bing.com) that fail to resolve, or internal names (kubernetes or other k8s services) too?

4c74356b41 commented 6 years ago
Ethernet adapter vEthernet (be1b1dcbfdb5d5238c2680576bdd9d30864cb7e20f639310695879f2b4138d51_l2bridge):

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter #3
   Physical Address. . . . . . . . . : 00-15-5D-D4-9E-2D
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::462:a0ff:77d2:68a0%21(Preferred)
   IPv4 Address. . . . . . . . . . . : 10.244.5.168(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 10.240.0.1
   DNS Servers . . . . . . . . . . . : 10.0.0.10
   NetBIOS over Tcpip. . . . . . . . : Disabled

I cant resolve anything.

PS C:\> telnet 10.0.0.10 53
Connecting To 10.0.0.10... Could not open connection to the host, on port 53: Connect failed

Get-HNSNetwork | ? name -eq 'l2bridge' | Remove-HNSNetwork does not help.

regarding my previous post. does that look ok? or is something missing from the node?

JiangtianLi commented 6 years ago

@4c74356b41 It appears your kube-dns is not reachable. Can you get kube-dns's status and logs? Does DNS query work from linux node?

4c74356b41 commented 6 years ago

@JiangtianLi yes, resolution works from linux node\containers. this is from linux container:

sh-4.2# getent hosts ya.ru
2a02:6b8::2:242 ya.ru
204.79.197.200  bing.com
13.107.21.200   bing.com
sh-4.2# getent hosts google.tt
2a00:1450:4009:80b::2003 google.tt

how to get kube-dns status? here's the logs from kubedns:

I0302 07:54:36.261100       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0302 07:54:36.736185       1 dns.go:170] Initialized services and endpoints from apiserver
I0302 07:54:36.736197       1 server.go:135] Setting up Healthz Handler (/readiness)
I0302 07:54:36.736203       1 server.go:140] Setting up cache handler (/cache)
I0302 07:54:36.736210       1 server.go:126] Status HTTP port 8081
I0305 09:07:05.154478       1 logs.go:41] skydns: failure to forward request "read udp 10.244.0.5:54747->168.63.129.16:53: i/o timeout"
I0305 09:07:05.155134       1 logs.go:41] skydns: failure to forward request "read udp 10.244.0.5:54747->168.63.129.16:53: i/o timeout"

nothing valuable before that. other one:

I0302 07:54:26.301287       1 dns.go:146] Starting endpointsController
I0302 07:54:26.301291       1 dns.go:149] Starting serviceController
I0302 07:54:26.301407       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0302 07:54:26.301415       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0302 07:54:26.804931       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
XXXX redacted XXXX
I0302 07:54:34.804100       1 dns.go:170] Initialized services and endpoints from apiserver
I0302 07:54:34.804116       1 server.go:135] Setting up Healthz Handler (/readiness)
I0302 07:54:34.804123       1 server.go:140] Setting up cache handler (/cache)
I0302 07:54:34.804132       1 server.go:126] Status HTTP port 8081
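
The only errors in the first log are the two skydns forwarding failures on 03/05. To pull such lines out of a longer kube-dns log, a small filter along these lines may help. It is a sketch: the regular expressions are written against the lines shown above only, and forward_failures is a hypothetical helper name.

```python
import re

# glog-style line: severity + MMDD, time, thread id, file:line], message.
LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>[\d:.]+)\s+\d+ "
    r"(?P<src>[\w.]+:\d+)\] (?P<msg>.*)$"
)
# Captures the upstream resolver address from a skydns forwarding failure.
FORWARD_FAIL = re.compile(
    r"failure to forward request .*->(?P<upstream>[\d.]+):53"
)


def forward_failures(log_text: str):
    """Yield (date, time, upstream) for each skydns forwarding failure."""
    for line in log_text.splitlines():
        parsed = LINE.match(line)
        if not parsed:
            continue  # skip redaction markers and anything not in glog format
        failure = FORWARD_FAIL.search(parsed.group("msg"))
        if failure:
            yield (
                parsed.group("date"),
                parsed.group("time"),
                failure.group("upstream"),
            )


# Two lines copied from the kubedns log above.
sample = (
    "I0302 07:54:36.736210       1 server.go:126] Status HTTP port 8081\n"
    "I0305 09:07:05.154478       1 logs.go:41] skydns: failure to forward "
    'request "read udp 10.244.0.5:54747->168.63.129.16:53: i/o timeout"'
)

for hit in forward_failures(sample):
    print(hit)
```

For the sample it reports a single failure with upstream 168.63.129.16. That failure is in kube-dns forwarding to its upstream resolver, which is a different path from a Windows pod reaching 10.0.0.10, so it may or may not be related to the timeouts seen in the container.
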
JiangtianLi commented 6 years ago

@4c74356b41 kubectl get po -n kube-system -o wide

4c74356b41 commented 6 years ago

@JiangtianLi ah, that status. all running.

kube-dns-v20-597689868c-ftpcv                   3/3       Running   0          4d        10.244.0.6     k8s-linpul-39524942-0
kube-dns-v20-597689868c-m7vps                   3/3       Running   0          4d        10.244.0.5     k8s-linpul-39524942-0
JiangtianLi commented 6 years ago

@4c74356b41 Also what is kubectl get no -o wide output?

4c74356b41 commented 6 years ago

why is kube dns not on master nodes? i would assume it belongs there. but i dont really know k8s all that good :(

NAME                    STATUS     ROLES    AGE   VERSION   EXTERNAL-IP     OS-IMAGE                       KERNEL-VERSION      CONTAINER-RUNTIME
39524k8s9000            Ready      <none>   4d    v1.9.3    51.141.90.143   Windows Server Datacenter      10.0.16299.192      docker://17.6.2
39524k8s9001            NotReady   <none>   4d    v1.9.3    <none>          Windows Server Datacenter      10.0.16299.192      docker://17.6.2
k8s-linpul-39524942-0   Ready      agent    4d    v1.9.3    <none>          Debian GNU/Linux 9 (stretch)   4.13.0-1007-azure   docker://1.13.1
k8s-master-39524942-0   Ready      master   4d    v1.9.3    <none>          Debian GNU/Linux 9 (stretch)   4.13.0-1007-azure   docker://1.13.1
k8s-master-39524942-1   Ready      master   4d    v1.9.3    <none>          Debian GNU/Linux 9 (stretch)   4.13.0-1007-azure   docker://1.13.1
k8s-master-39524942-2   Ready      master   4d    v1.9.3    <none>          Debian GNU/Linux 9 (stretch)   4.13.0-1007-azure   docker://1.13.1

one windows node is shutdown by me to save costs (since its not working anyway).

JiangtianLi commented 6 years ago

@4c74356b41 kube-dns is an add-on pod and can be scheduled on an agent node. Can you share the output of the following in your container?

Test-NetConnection 10.0.0.10 -port 53
Test-NetConnection 10.244.0.5 -port 53
Resolve-DnsName www.bing.com

Also on the Windows node:

Test-NetConnection 10.244.0.5 -port 53

4c74356b41 commented 6 years ago

agent node:

PS C:\>   Test-NetConnection 10.244.0.5 -port 53

ComputerName     : 10.244.0.5
RemoteAddress    : 10.244.0.5
RemotePort       : 53
InterfaceAlias   : vEthernet (Ethernet 2)
SourceAddress    : 10.240.0.4
TcpTestSucceeded : True

container:

PS C:\> Test-NetConnection 10.0.0.10 -port 53
WARNING: TCP connect to (10.0.0.10 : 53) failed
WARNING: Ping to 10.0.0.10 failed with status: TimedOut

ComputerName           : 10.0.0.10
RemoteAddress          : 10.0.0.10
RemotePort             : 53
InterfaceAlias         : vEthernet (be1b1dcbfdb5d5238c2680576bdd9d30864cb7e20f639310695879f2b4138d51_l2bridge)
SourceAddress          : 10.244.5.168
PingSucceeded          : False
PingReplyDetails (RTT) : 0 ms
TcpTestSucceeded       : False

PS C:\> resolve-dnsname bing.com
resolve-dnsname : bing.com : This operation returned because the timeout period expired
At line:1 char:1
+ resolve-dnsname bing.com
+ ~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationTimeout: (bing.com:String) [Resolve-DnsName], Win32Exception
    + FullyQualifiedErrorId : ERROR_TIMEOUT,Microsoft.DnsClient.Commands.ResolveDnsName

PS C:\> Test-NetConnection 10.244.0.5 -port 53
WARNING: TCP connect to (10.244.0.5 : 53) failed
WARNING: Ping to 10.244.0.5 failed with status: TimedOut

ComputerName           : 10.244.0.5
RemoteAddress          : 10.244.0.5
RemotePort             : 53
InterfaceAlias         : vEthernet (be1b1dcbfdb5d5238c2680576bdd9d30864cb7e20f639310695879f2b4138d51_l2bridge)
SourceAddress          : 10.244.5.168
PingSucceeded          : False
PingReplyDetails (RTT) : 0 ms
TcpTestSucceeded       : False
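
For pods where Test-NetConnection is not available, the same TCP check can be approximated with a few lines of Python. This is a sketch assuming Python 3 is present in the container; tcp_test is a hypothetical helper, and like Test-NetConnection -port it only exercises TCP, while ordinary DNS lookups go over UDP.

```python
import socket


def tcp_test(host: str, port: int = 53, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False
```

tcp_test("10.0.0.10") and tcp_test("10.244.0.5") correspond to the two container checks above; running tcp_test("10.244.0.5") on the node corresponds to the agent-node check.
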
JiangtianLi commented 6 years ago

@4c74356b41 Sorry, updated the commands. Can you retry?

JiangtianLi commented 6 years ago

@4c74356b41 Can you use https://github.com/Microsoft/SDN/blob/master/Kubernetes/windows/hns.psm1 to run

Get-HnsEndpoints | ConvertTo-Json -depth 10
Get-HnsPolicyLists | ConvertTo-Json -depth 10

on windows node? Is kube-proxy running on windows node? sc query kubeproxy

4c74356b41 commented 6 years ago

endpoints: https://paste.ee/p/gqMaE policy lists: https://paste.ee/p/rUJFn

PS C:\> get-service kube*

Status   Name               DisplayName
------   ----               -----------
Running  Kubelet            Kubelet
Running  Kubeproxy          Kubeproxy

but sc query kubeproxy returns nothing

mingw2358 commented 6 years ago

I'm also seeing the same issue as @4c74356b41 on my newly provisioned hybrid cluster with Windows containers only. No internal or external DNS resolution. The output of @JiangtianLi's commands on my cluster is similar to @4c74356b41's.