Azure / acs-engine

WE HAVE MOVED: Please join us at Azure/aks-engine!
https://github.com/Azure/aks-engine
MIT License

Custom clusterSubnet unable to access pods [ Error from server: error dialing backend: dial tcp: lookup i/o timeout] #2186

Closed: VipinPS closed this issue 6 years ago

VipinPS commented 6 years ago

What happened?

Our goal was to create an internal Kubernetes cluster on our internal VNET, which has ExpressRoute connectivity. The internal VNET resides in a separate resource group, so deploying such a custom cluster was not possible with ACS or AKS. We therefore chose ACS-Engine, which lets us bring our own VNET for an internal-facing Kubernetes cluster. The largest IP block we can assign to a single Kubernetes cluster is a /28 (another limitation). Only the master, the minions, and the Kubernetes LoadBalancer services need to use our internal blocks.

We deployed a cluster using the following ARM template. Our internal IP block is set as vnetCidr, which covers the master and agent nodes, and a separate address range is set as clusterSubnet. The cluster was created successfully. Initially the kube-dashboard and DNS pods were failing; this was solved by adding a route table to our internal subnet. We were able to deploy apps and reach them through an internal Kubernetes LoadBalancer service, but we were unable to access pods on the separate clusterSubnet using kubectl.
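
To make the address layout easier to follow, here is a minimal sketch that sorts addresses from this report into the two ranges. The CIDR values and IPs are copied from the report; the `classify` helper and the constant names are hypothetical and only used for illustration.

```python
import ipaddress

# Ranges copied from the cluster definition in this report.
VNET_CIDR = ipaddress.ip_network("10.246.65.48/28")     # master and agent nodes
CLUSTER_SUBNET = ipaddress.ip_network("10.244.0.0/16")  # pod addresses


def classify(ip: str) -> str:
    """Return which of the two configured ranges (if any) contains ip."""
    addr = ipaddress.ip_address(ip)
    if addr in VNET_CIDR:
        return "vnetCidr"
    if addr in CLUSTER_SUBNET:
        return "clusterSubnet"
    return "outside both"


if __name__ == "__main__":
    # A /28 holds 16 addresses in total.
    print("addresses in vnetCidr:", VNET_CIDR.num_addresses)
    # Master, agent, an nginx pod, and the DNS server named in the error.
    for ip in ("10.246.65.55", "10.246.65.53", "10.244.1.9", "10.165.65.8"):
        print(ip, "->", classify(ip))
```

The node IPs fall inside vnetCidr and the pod IPs inside clusterSubnet, while the DNS server named in the error message lies outside both ranges.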

We get the following errors while accessing pods:

:~$ kubectl exec -it nginx-31893996-8mxj0 bash
Error from server: error dialing backend: dial tcp: lookup k8s-acusnlpk8s-34362440-0 on 10.165.65.8:53: read udp 10.246.65.55:35783->10.165.65.8:53: i/o timeout
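
The error string packs several facts into one line: the hostname being looked up, the DNS server that was queried, and the source address of the query. A minimal sketch for pulling these apart is below. The regular expression and the `parse_lookup_error` helper are assumptions written for this one message, not part of kubectl or any library.

```python
import re

# Error text copied from the kubectl output in this report.
ERROR = (
    "Error from server: error dialing backend: dial tcp: lookup "
    "k8s-acusnlpk8s-34362440-0 on 10.165.65.8:53: read udp "
    "10.246.65.55:35783->10.165.65.8:53: i/o timeout"
)

# Hypothetical pattern matching the "lookup HOST on SERVER:PORT: read udp SRC->DST: REASON" shape.
PATTERN = re.compile(
    r"lookup (?P<host>\S+) on (?P<resolver>[\d.]+):(?P<port>\d+): "
    r"read udp (?P<src>[\d.]+):\d+->(?P<dst>[\d.]+):\d+: (?P<reason>.+)$"
)


def parse_lookup_error(message: str) -> dict:
    """Split a DNS lookup failure message into its named parts."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("message does not look like a DNS lookup failure")
    return match.groupdict()


if __name__ == "__main__":
    for key, value in parse_lookup_error(ERROR).items():
        print(f"{key}: {value}")
```

Read this way, the query for the node hostname originates from 10.246.65.55 (the master's address in the pod listing) and times out against the DNS server at 10.165.65.8.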

How to reproduce it?

Here are some details that may help someone replicate the problem.

Details:

Ubuntu 16.04.3
ACS-Engine version: v0.12.4
Docker version: 1.12.6
Kubernetes version: v1.7.9

Details of all pods:

:~# kubectl get pods -o wide --all-namespaces
NAMESPACE     NAME                                            READY     STATUS    RESTARTS   AGE       IP             NODE
default       nginx-31893996-8mxj0                            1/1       Running   0          3m        10.244.1.9     k8s-acusnlpk8s-29067030-0
default       nginx-31893996-sm74r                            1/1       Running   0          3m        10.244.1.8     k8s-acusnlpk8s-29067030-0
kube-system   heapster-2574232661-jk932                       2/2       Running   0          12m       10.244.1.7     k8s-acusnlpk8s-29067030-0
kube-system   kube-addon-manager-k8s-master-29067030-0        1/1       Running   0          14m       10.246.65.55   k8s-master-29067030-0
kube-system   kube-apiserver-k8s-master-29067030-0            1/1       Running   0          14m       10.246.65.55   k8s-master-29067030-0
kube-system   kube-controller-manager-k8s-master-29067030-0   1/1       Running   0          15m       10.246.65.55   k8s-master-29067030-0
kube-system   kube-dns-v20-2000462293-szqjp                   3/3       Running   0          13m       10.244.1.2     k8s-acusnlpk8s-29067030-0
kube-system   kube-dns-v20-2000462293-vxb3b                   3/3       Running   0          13m       10.244.1.3     k8s-acusnlpk8s-29067030-0
kube-system   kube-proxy-3ttq3                                1/1       Running   0          13m       10.246.65.53   k8s-acusnlpk8s-29067030-0
kube-system   kube-proxy-4tn54                                1/1       Running   0          13m       10.246.65.55   k8s-master-29067030-0
kube-system   kube-scheduler-k8s-master-29067030-0            1/1       Running   0          14m       10.246.65.55   k8s-master-29067030-0
kube-system   kubernetes-dashboard-732940207-pgcpt            1/1       Running   0          13m       10.244.1.5     k8s-acusnlpk8s-29067030-0
kube-system   tiller-deploy-2745651589-s0sv1                  1/1       Running   0          14m       10.244.1.4     k8s-acusnlpk8s-29067030-0

ARM template used for deployment:

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "kubernetesConfig": {
        "enableRbac": true,
        "networkPolicy": "none",
        "clusterSubnet": "10.244.0.0/16",
        "maxPods": 30
      }
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "",
      "vmSize": "Standard_A3",
      "vnetSubnetId": "/subscriptions/[REDACTED]/resourceGroups/[REDACTED]/providers/Microsoft.Network/virtualNetworks/[REDACTED]/subnets/[REDACTED]",
      "firstConsecutiveStaticIP": "10.246.65.55",
      "vnetCidr": "10.246.65.48/28",
      "ipAddressCount": 5
    },
    "agentPoolProfiles": [
      {
        "name": "agentpool1",
        "count": 1,
        "vmSize": "Standard_A3",
        "vnetSubnetId": "/subscriptions/[REDACTED]/resourceGroups/[REDACTED]/providers/Microsoft.Network/virtualNetworks/[REDACTED]/subnets/[REDACTED]",
        "availabilityProfile": "AvailabilitySet",
        "name": "acusnlpk8s"
      }
    ],
    "linuxProfile": {
      "adminUsername": "azureuser",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "[REDACTED]"
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "[REDACTED]",
      "secret": "[REDACTED]"
    }
  }
}
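
One detail worth checking in the template above: the agent pool object sets "name" twice ("agentpool1" and "acusnlpk8s"). A minimal sketch for flagging repeated keys before deploying is below. `find_duplicate_keys` is a hypothetical helper and `SAMPLE` is a trimmed copy of the agent pool entry. Python's `json` module keeps the last value for a repeated key; whether acs-engine's own parser behaves the same way is an assumption, although the node names (k8s-acusnlpk8s-...) suggest the second value took effect.

```python
import json


def find_duplicate_keys(text: str) -> list:
    """Return every key that appears more than once within a single JSON object."""
    duplicates = []

    def hook(pairs):
        # pairs holds the (key, value) tuples of one object, in document order.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)  # the last value wins, as in the default decoder

    json.loads(text, object_pairs_hook=hook)
    return duplicates


# Trimmed copy of the agent pool entry from this report.
SAMPLE = """
{
  "agentPoolProfiles": [
    {
      "name": "agentpool1",
      "count": 1,
      "name": "acusnlpk8s"
    }
  ]
}
"""

if __name__ == "__main__":
    print("duplicate keys:", find_duplicate_keys(SAMPLE))
    print("effective name:", json.loads(SAMPLE)["agentPoolProfiles"][0]["name"])
```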

System resolver settings :

:~# cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.160.35.137
nameserver 10.160.35.136
nameserver 10.165.65.8
search reddog.microsoft.com
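
To relate these resolver settings to the kubectl error, here is a minimal sketch that lists the configured nameservers and locates the one that timed out. The file contents and the failing address are copied from this report; the `nameservers` helper is hypothetical.

```python
# resolv.conf contents copied from this report (comment lines omitted).
RESOLV_CONF = """\
nameserver 10.160.35.137
nameserver 10.160.35.136
nameserver 10.165.65.8
search reddog.microsoft.com
"""

# DNS server named in the "i/o timeout" error.
FAILING_RESOLVER = "10.165.65.8"


def nameservers(text: str) -> list:
    """Return the nameserver addresses in the order they are listed."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


if __name__ == "__main__":
    servers = nameservers(RESOLV_CONF)
    position = servers.index(FAILING_RESOLVER) + 1
    print(f"{FAILING_RESOLVER} is nameserver {position} of {len(servers)}")
```

The server in the error message is the third and last entry in the list, and the only search domain is reddog.microsoft.com.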

VipinPS commented 6 years ago

Duplicate of #1603

BAMSHK commented 4 years ago

Was it finally resolved? How was it resolved?