Azure / aks-engine

AKS Engine: legacy tool for Kubernetes on Azure (see status)
https://github.com/Azure/aks-engine

DNS not working in West-Europe when using AKS >= 0.35.1 #1444

Closed · ghost closed this issue 4 years ago

ghost commented 5 years ago

Describe the bug
After upgrading to AKS-Engine 0.35.1, DNS lookups from pods are broken in the Kubernetes cluster (the cluster runs in the West-Europe DC).

Steps To Reproduce
Deploy the attached ARM templates:

Generated with AKS-engine 0.31.1 and deployed in WestEurope (used K8s-version is 1.13.3): ExportedTemplate-k8s-aks0311weu-20190607114252-9700.zip

Generated with AKS-engine 0.35.1 and deployed in WestEurope (used K8s version is 1.13.4): ExportedTemplate-k8s-aks0351weu-20190607112724-6873.zip

Test DNS lookup (we executed nslookup in a busybox container):

# Install busybox
$> cat busybox.yml
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: busybox:1.28
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always

$> kubectl --kubeconfig=aks0311weu.json apply -f busybox.yml
pod/busybox created

$> kubectl --kubeconfig=aks0351weu.json apply -f busybox.yml
pod/busybox created

# DNS lookup
$> kubectl --kubeconfig=aks0311weu.json exec busybox -- nslookup www.microsoft.com
Server:    10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

Name:      www.microsoft.com
Address 1: 2a02:26f0:f4:28e::356e g2a02-26f0-00f4-028e-0000-0000-0000-356e.deploy.static.akamaitechnologies.com
Address 2: 2a02:26f0:f4:288::356e g2a02-26f0-00f4-0288-0000-0000-0000-356e.deploy.static.akamaitechnologies.com
Address 3: 2a02:26f0:f4:281::356e g2a02-26f0-00f4-0281-0000-0000-0000-356e.deploy.static.akamaitechnologies.com
Address 4: 104.73.152.80 a104-73-152-80.deploy.static.akamaitechnologies.com

$> kubectl --kubeconfig=aks0351weu.json exec busybox -- nslookup www.microsoft.com
Server:    10.0.0.10
Address 1: 10.0.0.10

nslookup: can't resolve 'www.microsoft.com'

Expected behavior
Both deployments should produce a Kubernetes cluster with working DNS for pods.

AKS Engine version
The DNS issue occurred for K8s clusters generated with AKS-engine 0.35.1.

Kubernetes version
1.13.4

Additional context
Until yesterday (2019-06-06), we were able to deploy a 1.13.4 Kubernetes cluster with AKS-engine 0.35.1 in a South-Brazil DC with a working DNS setup. But after a retest today, the DNS for this generated Kubernetes cluster is now also broken. It seems that something in the South-Brazil DC changed between yesterday and today.

welcome[bot] commented 5 years ago

👋 Thanks for opening your first issue here! If you're reporting a 🐞 bug, please make sure you include steps to reproduce it.

ghost commented 5 years ago

JFYI: we repeated the test today using AKS-engine 0.36.4, but the DNS issue still exists in the West-Europe DC.

ghost commented 5 years ago

JFYI: we repeated the test today using AKS-engine 0.36.4, but the DNS issue still exists in the West-Europe DC.

I triggered another test a few minutes ago and created one more Kubernetes cluster in the South-Brazil DC using AKS-engine 0.35.1. This time the DNS setup of the cluster works!! These ARM files were used:

ExportedTemplate-k8s-aks0351sbr-20190607121739-407.zip

I applied the same steps to verify the DNS:

$> kubectl --kubeconfig=aks0351sbr.json apply -f busybox.yml
pod/busybox created

$> kubectl --kubeconfig=aks0351sbr.json exec busybox -- nslookup www.microsoft.com
Server:    10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

Name:      www.microsoft.com
Address 1: 2600:1419:d000:38a::356e g2600-1419-d000-038a-0000-0000-0000-356e.deploy.static.akamaitechnologies.com
Address 2: 2600:1419:d000:3b0::356e g2600-1419-d000-03b0-0000-0000-0000-356e.deploy.static.akamaitechnologies.com
Address 3: 2600:1419:d000:389::356e g2600-1419-d000-0389-0000-0000-0000-356e.deploy.static.akamaitechnologies.com
Address 4: 184.51.194.68 a184-51-194-68.deploy.static.akamaitechnologies.com

It seems that there are some inconsistencies between the South-Brazil and West-Europe DCs which interfere with the K8s cluster DNS.

jackfrancis commented 5 years ago

@tsc-nemeses Can you provide the api model JSON (without private credentials) that you're using to build these clusters?

ghost commented 5 years ago

Hi @jackfrancis, thanks for your support!

Here are the API model files. Please don't be surprised that the cluster name in these files is a bit different from the names used in the attached "ExportedTemplate-..." files. This is the only difference - the remaining content is exactly the same as used for the ARM files attached above.

apimodels-aks0311weu_and_aks0351weu.zip

jackfrancis commented 5 years ago

Hi @tsc-nemeses, I've spent time today trying to repro, and have been unable to do so thus far. I'm using this api model, which should be close enough to yours to engage the same cluster configuration implementations:

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.13",
      "kubernetesConfig": {
        "clusterSubnet": "10.244.0.0/16",
        "networkPlugin": "kubenet",
        "networkPolicy": "calico",
        "enableRbac": true,
        "enableAggregatedAPIs": true,
        "addons": [
          {
            "name": "cluster-autoscaler",
            "enabled": true,
            "config": {
              "min-nodes": "1",
              "max-nodes": "10",
              "scan-interval": "30s"
            }
          },
          {
            "name": "tiller",
            "enabled": false
          },
          {
            "name": "kubernetes-dashboard",
            "enabled": false
          },
          {
            "name": "blobfuse-flexvolume",
            "enabled": false
          },
          {
            "name": "keyvault-flexvolume",
            "enabled": false
          }
        ]
      }
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "",
      "vmSize": "Standard_D2_v3",
      "vnetSubnetID": "/subscriptions/<sub id>/resourceGroups/kubernetes-westus2-96769/providers/Microsoft.Network/virtualNetworks/kubernetes-westus2-96769CustomVnet/subnets/kubernetes-westus2-96769CustomSubnetMaster",
      "storageProfile": "ManagedDisks",
      "firstConsecutiveStaticIP": "10.239.255.239",
      "vnetCidr": "10.239.0.0/16"
    },
    "agentPoolProfiles": [
      {
        "availabilityProfile": "VirtualMachineScaleSets",
        "name": "agent1",
        "count": 1,
        "vmSize": "Standard_D2_v3",
        "vnetSubnetID": "/subscriptions/<sub id>/resourceGroups/kubernetes-westus2-96769/providers/Microsoft.Network/virtualNetworks/kubernetes-westus2-96769CustomVnet/subnets/agent1CustomSubnet",
        "storageProfile": "ManagedDisks",
        "osType": "Linux"
      }
    ],
    "linuxProfile": {
      "adminUsername": "azureuser",
      "ssh": {
        "publicKeys": [
          {
            "keyData": ""
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "",
      "secret": ""
    }
  }
}

I'll test in westeurope, though these cluster configs should be re-usable across all regions.

I did notice a couple undesirable symptoms from this config:

$ kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
$ kubectl get pods cluster-autoscaler-884965997-q79mj -n kube-system
NAME                                 READY   STATUS             RESTARTS   AGE
cluster-autoscaler-884965997-q79mj   0/1     CrashLoopBackOff   6          13m

Not to be a distraction, but have you encountered any of the above errors?

tl;dr so far I am able to validate container DNS

jackfrancis commented 5 years ago

I'm able to repro failing container DNS in westeurope. I can repro in both v0.31.1 and v0.35.1 versions of AKS Engine :(

So this is (probably) not an issue with the ARM deployment definition changing for the worse.

jackfrancis commented 5 years ago

Confirmed that removing calico from the cluster configuration restores container DNS. Again: only repro'd in westeurope so far (that is odd, looking further into that).

jackfrancis commented 5 years ago

Confirmed that calico + Azure CNI does not exhibit these symptoms in westeurope, so the calico + kubenet scenario is the symptomatic one.

jackfrancis commented 5 years ago

@song-jiang Is there anything obvious we would investigate to root cause calico + kubenet cluster configurations breaking container DNS?

Superficially all the calico componentry looks fine on my repro cluster:

FrancisBookMS:aks-engine jackfrancis$ k get CustomResourceDefinition --all-namespaces -o wide
NAME                                          CREATED AT
bgpconfigurations.crd.projectcalico.org       2019-06-11T23:44:17Z
clusterinformations.crd.projectcalico.org     2019-06-11T23:44:17Z
felixconfigurations.crd.projectcalico.org     2019-06-11T23:44:17Z
globalnetworkpolicies.crd.projectcalico.org   2019-06-11T23:44:17Z
globalnetworksets.crd.projectcalico.org       2019-06-11T23:44:17Z
hostendpoints.crd.projectcalico.org           2019-06-11T23:44:17Z
ippools.crd.projectcalico.org                 2019-06-11T23:44:17Z
networkpolicies.crd.projectcalico.org         2019-06-11T23:44:17Z
networksets.crd.projectcalico.org             2019-06-11T23:44:17Z
FrancisBookMS:aks-engine jackfrancis$ k get svc --all-namespaces -o wide
NAMESPACE     NAME             TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE     SELECTOR
default       kubernetes       ClusterIP   10.0.0.1      <none>        443/TCP                  9m45s   <none>
kube-system   calico-typha     ClusterIP   10.0.72.230   <none>        5473/TCP                 9m15s   k8s-app=calico-typha
kube-system   kube-dns         ClusterIP   10.0.0.10     <none>        53/UDP,53/TCP,9153/TCP   9m19s   k8s-app=kube-dns
kube-system   metrics-server   ClusterIP   10.0.95.120   <none>        443/TCP                  9m19s   k8s-app=metrics-server
FrancisBookMS:aks-engine jackfrancis$ k get deployments --all-namespaces -o wide
NAMESPACE     NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS           IMAGES                                                      SELECTOR
kube-system   calico-typha                         1/1     1            1           10m   calico-typha         calico/typha:v3.7.2                                         k8s-app=calico-typha
kube-system   calico-typha-horizontal-autoscaler   1/1     1            1           10m   autoscaler           k8s.gcr.io/cluster-proportional-autoscaler-amd64:1.1.2-r2   k8s-app=calico-typha-autoscaler
kube-system   cluster-autoscaler                   1/1     1            1           10m   cluster-autoscaler   k8s.gcr.io/cluster-autoscaler:v1.13.4                       app=cluster-autoscaler
kube-system   coredns                              1/1     1            1           10m   coredns              k8s.gcr.io/coredns:1.5.0                                    k8s-app=kube-dns
kube-system   metrics-server                       1/1     1            1           10m   metrics-server       k8s.gcr.io/metrics-server-amd64:v0.2.1                      k8s-app=metrics-server
FrancisBookMS:aks-engine jackfrancis$ k get daemonsets --all-namespaces -o wide
NAMESPACE     NAME                  DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE   CONTAINERS            IMAGES                                  SELECTOR
kube-system   azure-ip-masq-agent   2         2         2       2            2           beta.kubernetes.io/os=linux   10m   azure-ip-masq-agent   k8s.gcr.io/ip-masq-agent-amd64:v2.0.0   k8s-app=azure-ip-masq-agent,tier=node
kube-system   calico-node           2         2         2       2            2           beta.kubernetes.io/os=linux   10m   calico-node           calico/node:v3.7.2                      k8s-app=calico-node
kube-system   kube-proxy            2         2         2       2            2           beta.kubernetes.io/os=linux   10m   kube-proxy            k8s.gcr.io/hyperkube-amd64:v1.13.7      component=kube-proxy,tier=node

Our DNS validation E2E is telling us that container resolution is broken:

$ k logs validate-dns-linux-22nmh -c validate-bing -n default
completed in 1.370103197s
2019/06/11 16:47:08 
;; connection timed out; no servers could be reached

waiting for DNS resolution
;; connection timed out; no servers could be reached

@feiskyer Have you ever observed this scenario?

jackfrancis commented 5 years ago

@tsc-nemeses Also worth checking, did you manually add a route table attachment to your custom VNET after creating the cluster as described here?

https://github.com/Azure/aks-engine/blob/master/docs/tutorials/custom-vnet.md#post-deployment-attach-cluster-route-table-to-vnet

The above step is required for kubenet.
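
For reference, the manual attachment can be done with a single az CLI call (a rough sketch, not the exact commands from this issue; the resource group, VNET, subnet, and route table names are placeholders):

# Find the route table that AKS Engine created for the cluster
$ az network route-table list -g <resource-group> -o table

# Associate it with the custom VNET subnet used by the nodes
$ az network vnet subnet update -g <resource-group> --vnet-name <vnet-name> -n <subnet-name> --route-table <route-table-name>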

feiskyer commented 5 years ago

@feiskyer Have you ever observed this scenario?

Not yet. Are there any errors in the DNS pod?

ghost commented 5 years ago

@tsc-nemeses Also worth checking, did you manually add a route table attachment to your custom VNET after creating the cluster as described here?

https://github.com/Azure/aks-engine/blob/master/docs/tutorials/custom-vnet.md#post-deployment-attach-cluster-route-table-to-vnet

The above step is required for kubenet.

Hi @jackfrancis, yes, we always attach the subnet to the route table (this happens as part of our provisioning logic). Below is a screenshot of one of the route tables after the provisioning is completed:

(screenshot: routetable_vnet)

I also verified that the VMs have no DNS resolution issues.
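
A check like that can be as simple as running the lookup directly on a node over SSH (a sketch; the admin user and master address below are placeholders):

# Resolve an external name from the VM itself, bypassing cluster DNS entirely
$ ssh azureuser@<master-address> nslookup www.microsoft.com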

ghost commented 5 years ago

Hi @jackfrancis, today I created two new K8s clusters in West-Europe and DNS was working on both... wtf!?! It seems the issue disappeared magically overnight. Can you please confirm that DNS now also works for you with calico + kubenet? Thanks a lot.

song-jiang commented 5 years ago

@jackfrancis @tsc-nemeses In your repro cluster, could you verify whether DNS traffic between busybox and coreDNS is healthy by running nslookup kubernetes.default.svc.cluster.local? This will help us isolate the problem.

ghost commented 5 years ago

Hi @song-jiang, I have now created one more K8s cluster in West-Europe and this time the DNS is failing again (see my previous post; the clusters I created this morning didn't have DNS issues... now the problem is back :( ).

Anyway, I deployed a busybox pod in the cluster and got this result:

$> kubectl --kubeconfig=testweu.json exec busybox -- nslookup kubernetes.default.svc.cluster.local
Server:    10.0.0.10
Address 1: 10.0.0.10

nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
command terminated with exit code 1

# check resolv.conf of busybox
$>  kubectl --kubeconfig=testweu.json exec busybox -- cat /etc/resolv.conf
nameserver 10.0.0.10
search default.svc.cluster.local svc.cluster.local cluster.local rellexsk44hungjgqsg2vkbwnh.ax.internal.cloudapp.net
options ndots:5

# DNS service is configured
$ kubectl --kubeconfig=testweu.json get svc --all-namespaces
NAMESPACE     NAME             TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)         AGE
default       kubernetes       ClusterIP   10.0.0.1      <none>        443/TCP         50m
kube-system   calico-typha     ClusterIP   10.0.68.179   <none>        5473/TCP        50m
kube-system   kube-dns         ClusterIP   10.0.0.10     <none>        53/UDP,53/TCP   49m
kube-system   metrics-server   ClusterIP   10.0.23.233   <none>        443/TCP         50m

# Show endpoints of dns-svc
$> kubectl --kubeconfig=testweu.json describe svc kube-dns -n kube-system
Name:              kube-dns
Namespace:         kube-system
Labels:            addonmanager.kubernetes.io/mode=Reconcile
                   k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=CoreDNS
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"prometheus.io/port":"9153","prometheus.io/scrape":"true"},"labels":{"addon...
                   prometheus.io/port: 9153
                   prometheus.io/scrape: true
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP:                10.0.0.10
Port:              dns  53/UDP
TargetPort:        53/UDP
Endpoints:         10.244.0.3:53
Port:              dns-tcp  53/TCP
TargetPort:        53/TCP
Endpoints:         10.244.0.3:53
Session Affinity:  None
Events:            <none>

# CoreDNS pod is running
$> kubectl --kubeconfig=testweu.json get po -n kube-system -o custom-columns=NAME:.metadata.name,STATUS:.status.phase,IP:.status.podIP
NAME                                                  STATUS    IP
azure-ip-masq-agent-8jhn4                             Running   172.16.0.4
azure-ip-masq-agent-tlqt9                             Running   172.16.0.200
calico-node-4nqkv                                     Running   172.16.0.200
calico-node-fn7wn                                     Running   172.16.0.4
calico-typha-5b9c58cb4c-b9dsj                         Running   172.16.0.4
calico-typha-horizontal-autoscaler-847fc7bc8d-687dx   Running   10.244.1.3
cluster-autoscaler-f6689bc54-8s2r9                    Running   10.244.0.2
coredns-59b998c9dd-fs85c                              Running   10.244.0.3
kube-addon-manager-k8s-master-32653971-0              Running   172.16.0.200
kube-apiserver-k8s-master-32653971-0                  Running   172.16.0.200
kube-controller-manager-k8s-master-32653971-0         Running   172.16.0.200
kube-proxy-gx9bf                                      Running   172.16.0.200
kube-proxy-w5wj4                                      Running   172.16.0.4
kube-scheduler-k8s-master-32653971-0                  Running   172.16.0.200
metrics-server-69b44566d5-vcs7s                       Running   10.244.1.2

song-jiang commented 5 years ago

@tsc-nemeses Thanks for the details. Just to confirm: are you running a cluster with one agent node, with all pods including busybox and coreDNS scheduled on the same agent node?

Could you create an nginx pod and check the traffic between busybox and nginx? Also, what happens if you ping the coreDNS pod IP directly from busybox?

ghost commented 5 years ago

Hi @song-jiang, thanks a lot for your help! Yes, I can confirm that we run 1 master node and 1 agent node (for our developer setup, at least).

$> kubectl --kubeconfig=testweu.json get nodes -o wide
NAME                                   STATUS   ROLES    AGE    VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
k8s-master-32653971-0                  Ready    master   4h1m   v1.13.4   172.16.0.200   <none>        Ubuntu 16.04.6 LTS   4.15.0-1042-azure   docker://3.0.4
k8s-mttestweunod-32653971-vmss000000   Ready    agent    4h     v1.13.4   172.16.0.4     <none>        Ubuntu 16.04.6 LTS   4.15.0-1042-azure   docker://3.0.4

Here are the tests with NGINX:

# Deploy NGINX
$ kubectl --kubeconfig=testweu.json create deployment nginx --image=nginx

# Check nginx pod is up
$ kubectl --kubeconfig=testweu.json get po -o wide
NAME                   READY   STATUS    RESTARTS   AGE     IP           NODE                                   NOMINATED NODE   READINESS GATES
busybox                1/1     Running   3          3h19m   10.244.1.4   k8s-mttestweunod-32653971-vmss000000   <none>           <none>
nginx-5c7588df-5qdlj   1/1     Running   0          35s     10.244.1.5   k8s-mttestweunod-32653971-vmss000000   <none>           <none>

# request NGINX from busybox
$ kubectl --kubeconfig=testweu.json exec busybox -- wget  http://10.244.1.5 -O -
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Connecting to 10.244.1.5 (10.244.1.5:80)
-                    100% |*******************************|   612   0:00:00 ETA

# Get CoreDNS IP
$ kubectl --kubeconfig=testweu.json get po -n kube-system -o wide | grep coredns
coredns-59b998c9dd-fs85c                              1/1     Running   0          4h8m   10.244.0.3     k8s-master-32653971-0                  <none>           <none>

# Ping from busybox pod
kubectl --kubeconfig=testweu.json exec -ti busybox -- sh
## Ping coreDNS
/ # ping 10.244.0.3
PING 10.244.0.3 (10.244.0.3): 56 data bytes
^C
--- 10.244.0.3 ping statistics ---
5 packets transmitted, 0 packets received, 100% packet loss

## Ping Google DNS
/ # ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
^C
--- 8.8.8.8 ping statistics ---
7 packets transmitted, 0 packets received, 100% packet loss

## Ping metric-server pod
/ # ping 10.244.1.2
PING 10.244.1.2 (10.244.1.2): 56 data bytes
64 bytes from 10.244.1.2: seq=0 ttl=63 time=0.086 ms
64 bytes from 10.244.1.2: seq=1 ttl=63 time=0.112 ms
64 bytes from 10.244.1.2: seq=2 ttl=63 time=0.101 ms
64 bytes from 10.244.1.2: seq=3 ttl=63 time=0.106 ms
64 bytes from 10.244.1.2: seq=4 ttl=63 time=0.080 ms
^C
--- 10.244.1.2 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 0.080/0.097/0.112 ms

## Ping WORKER node
/ # ping 172.16.0.4
PING 172.16.0.4 (172.16.0.4): 56 data bytes
64 bytes from 172.16.0.4: seq=0 ttl=64 time=0.058 ms
64 bytes from 172.16.0.4: seq=1 ttl=64 time=0.057 ms
64 bytes from 172.16.0.4: seq=2 ttl=64 time=0.075 ms
^C
--- 172.16.0.4 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.057/0.063/0.075 ms

## Ping MASTER node
/ # ping 172.16.0.200
PING 172.16.0.200 (172.16.0.200): 56 data bytes
64 bytes from 172.16.0.200: seq=0 ttl=63 time=0.818 ms
64 bytes from 172.16.0.200: seq=1 ttl=63 time=0.968 ms
^C
--- 172.16.0.200 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss

song-jiang commented 5 years ago

@tsc-nemeses No problem. From your log, I can see that coreDNS is located on the master node while nginx and metrics-server are located on the same node as busybox. That suggests connection issues from agent to master.
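
One way to dig into that with kubenet is to check whether the Azure route table has a route for each node's pod CIDR and is attached to the right subnets (a sketch with the az CLI; the resource group and route table names are placeholders):

# Per-node pod CIDR routes that AKS Engine programs into the cluster route table
$ az network route-table route list -g <resource-group> --route-table-name <route-table-name> -o table

# Subnets the route table is actually associated with (should include the node subnets)
$ az network route-table show -g <resource-group> -n <route-table-name> --query "subnets[].id" -o tsv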

jackfrancis commented 5 years ago

That would also explain the randomness of the errors: when a coredns pod gets scheduled onto the same node, you don't have to leave the VM to get to the resolver.

jackfrancis commented 5 years ago

Unable to repro this morning; I manually deleted the coredns pod until it was rescheduled onto the master:

$ k get pods --all-namespaces -o wide
NAMESPACE     NAME                                                  READY   STATUS    RESTARTS   AGE   IP               NODE                             NOMINATED NODE   READINESS GATES
kube-system   azure-ip-masq-agent-2wzk6                             1/1     Running   0          44m   10.239.1.4       k8s-agent1-11094137-vmss000000   <none>           <none>
kube-system   azure-ip-masq-agent-pfk6p                             1/1     Running   0          44m   10.239.255.239   k8s-master-11094137-0            <none>           <none>
kube-system   calico-node-grs8m                                     1/1     Running   0          44m   10.239.255.239   k8s-master-11094137-0            <none>           <none>
kube-system   calico-node-md8wb                                     1/1     Running   0          44m   10.239.1.4       k8s-agent1-11094137-vmss000000   <none>           <none>
kube-system   calico-typha-6fdb448858-qr5px                         1/1     Running   0          44m   10.239.1.4       k8s-agent1-11094137-vmss000000   <none>           <none>
kube-system   calico-typha-horizontal-autoscaler-847fc7bc8d-xbh79   1/1     Running   0          44m   10.244.1.4       k8s-agent1-11094137-vmss000000   <none>           <none>
kube-system   cluster-autoscaler-85ffdb4687-vg6w5                   1/1     Running   1          45m   10.244.0.2       k8s-master-11094137-0            <none>           <none>
kube-system   coredns-5b6f59676b-cczzw                              1/1     Running   0          34m   10.244.0.3       k8s-master-11094137-0            <none>           <none>
kube-system   kube-addon-manager-k8s-master-11094137-0              1/1     Running   0          44m   10.239.255.239   k8s-master-11094137-0            <none>           <none>
kube-system   kube-apiserver-k8s-master-11094137-0                  1/1     Running   0          44m   10.239.255.239   k8s-master-11094137-0            <none>           <none>
kube-system   kube-controller-manager-k8s-master-11094137-0         1/1     Running   0          44m   10.239.255.239   k8s-master-11094137-0            <none>           <none>
kube-system   kube-proxy-zlg87                                      1/1     Running   0          44m   10.239.255.239   k8s-master-11094137-0            <none>           <none>
kube-system   kube-proxy-zxjj5                                      1/1     Running   0          44m   10.239.1.4       k8s-agent1-11094137-vmss000000   <none>           <none>
kube-system   kube-scheduler-k8s-master-11094137-0                  1/1     Running   0          44m   10.239.255.239   k8s-master-11094137-0            <none>           <none>
kube-system   metrics-server-69b44566d5-qsjvx                       1/1     Running   0          44m   10.244.1.2       k8s-agent1-11094137-vmss000000   <none>           <none>

Container DNS is working fine.
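
For anyone who wants to repeat that experiment, deleting the coredns pod by its label and checking where the replacement lands is enough (a sketch; the k8s-app=kube-dns label is the one shown in the outputs above):

# Delete the coredns pod; its ReplicaSet immediately creates a replacement
$ kubectl delete pod -n kube-system -l k8s-app=kube-dns

# See which node the new pod landed on; repeat the delete until it ends up on the master
$ kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide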

ghost commented 5 years ago

Hi guys, thanks for the deeper investigation. I can confirm that the issue seems to be caused by the coreDNS pod not being reachable when it is running on a different VM (if it is running on the master node, busybox cannot reach the coreDNS pod; when it is running on the same node, it works).

This issue is obviously DC specific and occurs in the West-Europe DC, but cannot be reproduced in the South-Brazil DC (just as an example).

Evidence for the West-Europe-DC-specific K8s DNS issue when using AKS-engine 0.35.1 (with K8s 1.13.4):

# WEST-EUROPE system pods (here with coreDNS running on master node):
$> kubectl --kubeconfig=mtd826weu.json get po -o wide -n kube-system
NAME                                                  READY   STATUS    RESTARTS   AGE     IP             NODE                                   NOMINATED NODE   READINESS GATES
azure-ip-masq-agent-m286g                             1/1     Running   0          9m6s    172.16.0.4     k8s-mtmtd826weun-40425019-vmss000000   <none>           <none>
azure-ip-masq-agent-ptnt8                             1/1     Running   0          9m6s    172.16.0.200   k8s-master-40425019-0                  <none>           <none>
calico-node-fxpxd                                     1/1     Running   0          9m11s   172.16.0.4     k8s-mtmtd826weun-40425019-vmss000000   <none>           <none>
calico-node-mkflw                                     1/1     Running   0          9m11s   172.16.0.200   k8s-master-40425019-0                  <none>           <none>
calico-typha-5b9c58cb4c-cgfj4                         1/1     Running   0          9m11s   172.16.0.4     k8s-mtmtd826weun-40425019-vmss000000   <none>           <none>
calico-typha-horizontal-autoscaler-847fc7bc8d-lkw5m   1/1     Running   0          9m6s    10.244.1.3     k8s-mtmtd826weun-40425019-vmss000000   <none>           <none>
cluster-autoscaler-6459599fc-nqpjl                    1/1     Running   0          9m11s   10.244.0.3     k8s-master-40425019-0                  <none>           <none>
coredns-59b998c9dd-88s9g                              1/1     Running   0          9m5s    10.244.0.2     k8s-master-40425019-0                  <none>           <none>
kube-addon-manager-k8s-master-40425019-0              1/1     Running   0          8m44s   172.16.0.200   k8s-master-40425019-0                  <none>           <none>
kube-apiserver-k8s-master-40425019-0                  1/1     Running   0          8m41s   172.16.0.200   k8s-master-40425019-0                  <none>           <none>
kube-controller-manager-k8s-master-40425019-0         1/1     Running   0          8m34s   172.16.0.200   k8s-master-40425019-0                  <none>           <none>
kube-proxy-2qgjm                                      1/1     Running   0          9m6s    172.16.0.4     k8s-mtmtd826weun-40425019-vmss000000   <none>           <none>
kube-proxy-tmdmn                                      1/1     Running   0          9m6s    172.16.0.200   k8s-master-40425019-0                  <none>           <none>
kube-scheduler-k8s-master-40425019-0                  1/1     Running   0          8m41s   172.16.0.200   k8s-master-40425019-0                  <none>           <none>
metrics-server-69b44566d5-r7dmp                       1/1     Running   0          9m6s    10.244.1.2     k8s-mtmtd826weun-40425019-vmss000000   <none>           <none>

# Verify that busybox is running on the worker node
$> kubectl --kubeconfig=mtd826weu.json get pods -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP           NODE                                   NOMINATED NODE   READINESS GATES
busybox   1/1     Running   0          48m   10.244.1.4   k8s-mtmtd826weun-40425019-vmss000000   <none>           <none>

# DNS lookup fails as long the coreDNS pod is not running on the same node as the busybox pod is running:
$> kubectl --kubeconfig=mtd826weu.json exec busybox -- nslookup www.microsoft.com
nslookup: can't resolve 'www.microsoft.com'
Server:    10.0.0.10
Address 1: 10.0.0.10

command terminated with exit code 1

# Re-schedule coreDNS pod on worker node
$> kubectl --kubeconfig=mtd826weu.json delete pod coredns-59b998c9dd-88s9g -n kube-system
pod "coredns-59b998c9dd-88s9g" deleted
$> kubectl --kubeconfig=mtd826weu.json get po -o wide -n kube-system | grep coredns
coredns-59b998c9dd-ws5gc                              1/1     Running   0          31s   10.244.1.5     k8s-mtmtd826weun-40425019-vmss000000   <none>           <none>

# Re-test DNS lookup from busybox pod (works now because coreDNS is running on the same node)
$ kubectl --kubeconfig=mtd826weu.json exec busybox -- nslookup www.microsoft.com
Server:    10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

Name:      www.microsoft.com
Address 1: 2a02:26f0:f4:295::356e g2a02-26f0-00f4-0295-0000-0000-0000-356e.deploy.static.akamaitechnologies.com
Address 2: 2a02:26f0:f4:28c::356e g2a02-26f0-00f4-028c-0000-0000-0000-356e.deploy.static.akamaitechnologies.com
Address 3: 2a02:26f0:f4:29d::356e g2a02-26f0-00f4-029d-0000-0000-0000-356e.deploy.static.akamaitechnologies.com
Address 4: 23.208.77.128 a23-208-77-128.deploy.static.akamaitechnologies.com

################ RE-TEST WITH SOUTH-BRAZIL DC #############

# SOUTH-BRAZIL system pods (DNS is running on master node here!)
$> kubectl --kubeconfig=mtd826sbr.json get po -o wide -n kube-system
NAME                                                  READY   STATUS    RESTARTS   AGE     IP             NODE                                   NOMINATED NODE   READINESS GATES
azure-ip-masq-agent-gcr8j                             1/1     Running   0          7m54s   172.16.0.4     k8s-mtmtd826sbrn-40511682-vmss000000   <none>           <none>
azure-ip-masq-agent-pps7b                             1/1     Running   0          7m54s   172.16.0.200   k8s-master-40511682-0                  <none>           <none>
calico-node-fsj9l                                     1/1     Running   0          7m58s   172.16.0.4     k8s-mtmtd826sbrn-40511682-vmss000000   <none>           <none>
calico-node-lqhzl                                     1/1     Running   0          7m58s   172.16.0.200   k8s-master-40511682-0                  <none>           <none>
calico-typha-5b9c58cb4c-k8869                         1/1     Running   0          7m58s   172.16.0.4     k8s-mtmtd826sbrn-40511682-vmss000000   <none>           <none>
calico-typha-horizontal-autoscaler-847fc7bc8d-dn4zp   1/1     Running   0          7m54s   10.244.0.3     k8s-mtmtd826sbrn-40511682-vmss000000   <none>           <none>
cluster-autoscaler-9f6cd9f57-k8lld                    1/1     Running   0          7m59s   10.244.1.2     k8s-master-40511682-0                  <none>           <none>
coredns-59b998c9dd-5nlhn                              1/1     Running   0          7m53s   10.244.1.3     k8s-master-40511682-0                  <none>           <none>
kube-addon-manager-k8s-master-40511682-0              1/1     Running   0          7m32s   172.16.0.200   k8s-master-40511682-0                  <none>           <none>
kube-apiserver-k8s-master-40511682-0                  1/1     Running   0          7m19s   172.16.0.200   k8s-master-40511682-0                  <none>           <none>
kube-controller-manager-k8s-master-40511682-0         1/1     Running   0          7m14s   172.16.0.200   k8s-master-40511682-0                  <none>           <none>
kube-proxy-cf4d8                                      1/1     Running   0          7m54s   172.16.0.200   k8s-master-40511682-0                  <none>           <none>
kube-proxy-n9xnv                                      1/1     Running   0          7m54s   172.16.0.4     k8s-mtmtd826sbrn-40511682-vmss000000   <none>           <none>
kube-scheduler-k8s-master-40511682-0                  1/1     Running   0          7m32s   172.16.0.200   k8s-master-40511682-0                  <none>           <none>
metrics-server-69b44566d5-6dqd7                       1/1     Running   0          7m54s   10.244.0.2     k8s-mtmtd826sbrn-40511682-vmss000000   <none>           <none>

## Verify that busybox is running on the worker node
$> kubectl --kubeconfig=mtd826sbr.json get pods -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP           NODE                                   NOMINATED NODE   READINESS GATES
busybox   1/1     Running   0          46m   10.244.0.4   k8s-mtmtd826sbrn-40511682-vmss000000   <none>           <none>

$> kubectl --kubeconfig=mtd826sbr.json exec busybox -- nslookup www.microsoft.com
Server:    10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

Name:      www.microsoft.com
Address 1: 2600:1419:d000:38a::356e g2600-1419-d000-038a-0000-0000-0000-356e.deploy.static.akamaitechnologies.com
Address 2: 2600:1419:d000:386::356e g2600-1419-d000-0386-0000-0000-0000-356e.deploy.static.akamaitechnologies.com
Address 3: 184.51.132.196 a184-51-132-196.deploy.static.akamaitechnologies.com
macpro-tobi:tmp t$

I also compared how the coreDNS pod is scheduled when using AKS-engine 0.31.1 (this was the version we used previously, and we hadn't noticed DNS issues with it). There, the pod is consistently scheduled on the worker node. This explains why we hadn't noticed this connectivity issue before.

ghost commented 5 years ago

So, I think this needs to be handed back to Azure support for further investigation. Thanks all for your help!

piotrgwiazda commented 5 years ago

Hi. What was the outcome? We are having the same issue in UK South. We tried Kubernetes 1.12.8 and 1.13.5 with kubenet. It works fine for a day or two, and then DNS lookups become slower until they time out.

//EDIT: It turned out that with the Azure CNI plugin everything works fine. It seems that it was a kubenet problem.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

marcinma commented 5 years ago

A possibly similar issue is happening on AKS clusters created with az (random DNS resolution issues).

ghost commented 5 years ago

Together with Azure support, we figured out that the root cause was a race condition.

Our use case was:

  1. We deployed our VNet + subnets with ARM.
  2. We deployed K8s with ARM (unfortunately, this deployment doesn't automatically add the association between the subnet and the route table).
  3. To close this gap, we executed another VNet ARM deployment after the K8s ARM deployment was finished. But this second VNet ARM deployment was the cause of the trouble! Some of its changes collided with still-ongoing changes from the previous K8s ARM deployment. The API didn't report any errors, and the Azure Portal also showed the association between the subnet and the route table - so for us everything looked fine, but in reality the association didn't exist and traffic wasn't routed properly between K8s nodes!

We found three options to fix this:

jackfrancis commented 5 years ago

Thanks for reporting back, @tsc-nemeses. So it sounds like the root cause is that the 2nd ARM deployment returns a successful state, but the deployment is not actually in a terminal state (i.e., it is not really finished)?

That sounds like an issue we might report to the ARM team, if so.

Just curious, why not use option 2 and just do a vanilla Azure API call?

ghost commented 5 years ago

Exactly, the 2nd ARM deployment isn't really finished. We are going with option 2 on our side now ;)
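
For anyone hitting the same symptom: it may be worth verifying after provisioning that the association really exists rather than trusting the portal view (a sketch with the az CLI; the resource group, VNET, and subnet names are placeholders):

# Print the route table actually associated with the subnet (empty output means no association)
$ az network vnet subnet show -g <resource-group> --vnet-name <vnet-name> -n <subnet-name> --query routeTable.id -o tsv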

jackfrancis commented 5 years ago

Thanks again for sharing all this detailed info to help other customers who are having similar problems with similar workflows.

Should we rename this issue to better reflect what we've learned (so that other folks can more easily find it), and then close it?

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.