jazzsir opened this issue 7 years ago
Not sure if this helps, but we were also running into kube-dns
pod crash loops after upgrading to Kubernetes 1.6, and we were able to work around it by using Calico. The last ~10 or so commits in this fork show everything we did: https://github.com/rook/coreos-kubernetes/commits/master
@jbw976 I followed the instructions at https://github.com/rook/coreos-kubernetes/blob/master/Documentation/getting-started.md and used Calico. But when I started the kubelet, the COMMAND attribute (in `docker ps`) of every container (proxy, apiserver, controller-manager, and scheduler) was "/pause".
@jbw976 I finally deployed a master node using Calico. But the "Set Up the CNI config (optional)" link is dead in https://github.com/rook/coreos-kubernetes/blob/master/Documentation/deploy-workers.md. Do you know where the "Set Up the CNI config (optional)" guide is? I would like to finish the installation without any auto-configuration tools.
Same problem here. I would prefer to have flannel working before trying Calico.
I have been attempting to solve this for a while now.
When I run `kubectl describe` on kube-dns, I get:
```
Williams-MacBook-Pro:KubeControl demonfuse$ kubectl describe service kube-dns --namespace=kube-system
Name:              kube-dns
Namespace:         kube-system
Labels:            k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=KubeDNS
Annotations:       <none>
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP:                10.3.0.10
Port:              dns 53/UDP
Endpoints:
Port:              dns-tcp 53/TCP
Endpoints:
Session Affinity:  None
Events:            <none>
```
I noticed that in the output above there are no Endpoints, whereas when I describe the kubernetes service I do get endpoints:
```
Williams-MacBook-Pro:KubeControl demonfuse$ kubectl describe svc kubernetes
Name:              kubernetes
Namespace:         default
Labels:            component=apiserver
                   provider=kubernetes
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP:                10.3.0.1
Port:              https 443/TCP
Endpoints:         xx.xx.xx.xx:443
Session Affinity:  ClientIP
Events:            <none>
```
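A hedged aside on the empty Endpoints for kube-dns above: an empty Endpoints list usually means no Ready pods match the Service selector, so it is worth comparing the selector to the pod labels and readiness. A minimal sketch (namespace and label taken from this thread; the script falls back to a notice when `kubectl` or cluster access is unavailable):

```shell
# Compare the kube-dns Service selector against the pods it should select.
# An empty Endpoints list typically means no matching pod is Ready.
if command -v kubectl >/dev/null 2>&1; then
  selector=$(kubectl get svc kube-dns -n kube-system \
    -o jsonpath='{.spec.selector}' 2>/dev/null || echo "(lookup failed)")
  echo "kube-dns selector: $selector"
  # List the pods that selector should match, with their readiness.
  kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide 2>/dev/null
else
  selector="(no kubectl on this machine)"
  echo "kubectl not found; run this where you have cluster access"
fi
```

If the pods exist but are not Ready (as with a crash loop), the empty Endpoints list is a symptom, not the cause.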
In the logs for kubedns, dnsmasq, and healthz, I noticed kubedns is having trouble connecting to 10.3.0.1, and healthz reports `nslookup: can't resolve`.
```
Williams-MacBook-Pro:KubeControl demonfuse$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c kubedns
I0531 19:15:18.538655 1 server.go:94] Using https://10.3.0.1:443 for kubernetes master, kubernetes API: <nil>
I0531 19:15:18.539824 1 server.go:99] v1.5.0-alpha.0.1651+7dcae5edd84f06-dirty
I0531 19:15:18.539894 1 server.go:101] FLAG: --alsologtostderr="false"
I0531 19:15:18.539926 1 server.go:101] FLAG: --dns-port="10053"
I0531 19:15:18.540001 1 server.go:101] FLAG: --domain="cluster.local."
I0531 19:15:18.540029 1 server.go:101] FLAG: --federations=""
I0531 19:15:18.540051 1 server.go:101] FLAG: --healthz-port="8081"
I0531 19:15:18.540088 1 server.go:101] FLAG: --kube-master-url=""
I0531 19:15:18.540110 1 server.go:101] FLAG: --kubecfg-file=""
I0531 19:15:18.540128 1 server.go:101] FLAG: --log-backtrace-at=":0"
I0531 19:15:18.540165 1 server.go:101] FLAG: --log-dir=""
I0531 19:15:18.540189 1 server.go:101] FLAG: --log-flush-frequency="5s"
I0531 19:15:18.540210 1 server.go:101] FLAG: --logtostderr="true"
I0531 19:15:18.540244 1 server.go:101] FLAG: --stderrthreshold="2"
I0531 19:15:18.540265 1 server.go:101] FLAG: --v="0"
I0531 19:15:18.540296 1 server.go:101] FLAG: --version="false"
I0531 19:15:18.540338 1 server.go:101] FLAG: --vmodule=""
I0531 19:15:18.540415 1 server.go:138] Starting SkyDNS server. Listening on port:10053
I0531 19:15:18.540533 1 server.go:145] skydns: metrics enabled on : /metrics:
I0531 19:15:18.540598 1 dns.go:166] Waiting for service: default/kubernetes
I0531 19:15:18.541278 1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0531 19:15:18.541340 1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0531 19:15:48.542106 1 dns.go:172] Ignoring error while waiting for service default/kubernetes: Get https://10.3.0.1:443/api/v1/namespaces/default/services/kubernetes: dial tcp 10.3.0.1:443: i/o timeout. Sleeping 1s before retrying.
E0531 19:15:48.544209 1 reflector.go:214] pkg/dns/dns.go:154: Failed to list *api.Endpoints: Get https://10.3.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.3.0.1:443: i/o timeout
E0531 19:15:48.544580 1 reflector.go:214] pkg/dns/dns.go:155: Failed to list *api.Service: Get https://10.3.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.3.0.1:443: i/o timeout
I0531 19:16:19.543942 1 dns.go:172] Ignoring error while waiting for service default/kubernetes: Get https://10.3.0.1:443/api/v1/namespaces/default/services/kubernetes: dial tcp 10.3.0.1:443: i/o timeout. Sleeping 1s before retrying.
E0531 19:16:19.546421 1 reflector.go:214] pkg/dns/dns.go:155: Failed to list *api.Service: Get https://10.3.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.3.0.1:443: i/o timeout
E0531 19:16:19.546569 1 reflector.go:214] pkg/dns/dns.go:154: Failed to list *api.Endpoints: Get https://10.3.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.3.0.1:443: i/o timeout
```
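Every failure in the kubedns log above ends the same way: `dial tcp 10.3.0.1:443: i/o timeout`. A tiny hedged helper to pull the dial target out of such log lines, to confirm that each error is hitting the same service VIP (the sample line is copied from the logs in this thread; `extract_dial_target` is just an illustrative name):

```shell
# Extract the "dial tcp <ip:port>" target from a reflector/dns log line.
extract_dial_target() {
  sed -n 's/.*dial tcp \([0-9.]*:[0-9]*\):.*/\1/p'
}

# Sample line copied verbatim from the kubedns log in this thread.
line='E0531 19:15:48.544209 1 reflector.go:214] pkg/dns/dns.go:154: Failed to list *api.Endpoints: Get https://10.3.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.3.0.1:443: i/o timeout'

echo "$line" | extract_dial_target
# prints: 10.3.0.1:443
```

If every line resolves to the apiserver VIP like this, the question becomes why that VIP is unreachable from the pod, which points at kube-proxy or the pod network rather than kube-dns.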
```
Williams-MacBook-Pro:KubeControl demonfuse$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c dnsmasq
dnsmasq[1]: started, version 2.76 cachesize 1000
dnsmasq[1]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
dnsmasq[1]: using nameserver 127.0.0.1#10053
dnsmasq[1]: read /etc/hosts - 7 addresses
```
```
Williams-MacBook-Pro:KubeControl demonfuse$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c healthz
2017/05/31 19:09:17 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:09:15.817652465 +0000 UTC, error exit status 1
2017/05/31 19:09:17 Healthz probe on /healthz-kubedns error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:09:15.816970338 +0000 UTC, error exit status 1
2017/05/31 19:09:27 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:09:25.813204033 +0000 UTC, error exit status 1
2017/05/31 19:09:27 Healthz probe on /healthz-kubedns error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:09:25.812644469 +0000 UTC, error exit status 1
2017/05/31 19:09:37 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:09:35.815158454 +0000 UTC, error exit status 1
2017/05/31 19:09:37 Healthz probe on /healthz-kubedns error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:09:35.814596885 +0000 UTC, error exit status 1
2017/05/31 19:09:47 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:09:45.811774257 +0000 UTC, error exit status 1
2017/05/31 19:09:47 Healthz probe on /healthz-kubedns error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:09:45.812333 +0000 UTC, error exit status 1
2017/05/31 19:09:57 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:09:55.814050664 +0000 UTC, error exit status 1
2017/05/31 19:09:57 Healthz probe on /healthz-kubedns error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:09:55.814489628 +0000 UTC, error exit status 1
2017/05/31 19:12:07 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:12:05.811958358 +0000 UTC, error exit status 1
2017/05/31 19:12:07 Healthz probe on /healthz-kubedns error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:12:05.812245808 +0000 UTC, error exit status 1
2017/05/31 19:14:17 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:14:15.808253036 +0000 UTC, error exit status 1
2017/05/31 19:14:17 Healthz probe on /healthz-kubedns error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:14:15.809691998 +0000 UTC, error exit status 1
2017/05/31 19:16:27 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:16:25.813103916 +0000 UTC, error exit status 1
2017/05/31 19:16:27 Healthz probe on /healthz-kubedns error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:16:25.813576285 +0000 UTC, error exit status 1
```
Following the kube-dns troubleshooting guide, I used busybox for a simple nslookup and got:
```
Williams-MacBook-Pro:KubeControl demonfuse$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.3.0.10
Address 1: 10.3.0.10
nslookup: can't resolve 'kubernetes.default'
```
Any ideas? I followed the instructions on the Kubernetes guide to the letter and I'm using flannel without calico. How would I approach resolving kubernetes.default?
Also, the dashboard (Add-ons page on official guide) seems to be having the same problem.
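One hedged way to narrow this down: test raw TCP reachability of the apiserver service VIP (10.3.0.1, per the describe output earlier in this thread) from inside the busybox pod. If this times out, kube-proxy or the pod network is not routing service traffic, and kube-dns itself is not the culprit. A sketch (assumes the busybox image ships an `nc` applet with `-z` support, and falls back gracefully without `kubectl`):

```shell
# Probe the apiserver service VIP from inside the busybox pod.
# A timeout here means service VIPs are not being routed at all.
if command -v kubectl >/dev/null 2>&1; then
  if kubectl exec busybox -- nc -z -w 2 10.3.0.1 443 2>/dev/null; then
    vip_state=reachable
  else
    vip_state=unreachable
  fi
else
  vip_state="unknown (no kubectl on this machine)"
fi
echo "apiserver VIP 10.3.0.1:443 is $vip_state"
```

The same check against 10.3.0.10:53 would distinguish "DNS pod broken" from "no service VIP works", which matches the i/o timeouts in the kubedns logs above.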
Still no luck.
@Ascendance I think there are a lot of missing parts in the instructions. I recommend you use a Vagrantfile and the scripts in https://coreos.com/kubernetes/docs/latest/kubernetes-on-vagrant.html
I am facing the same issue. I am not using Calico, just flanneld. Did you find a solution? @jazzsir @Ascendance
We just hit the same issue with kube-dns and the dashboard crash looping as well, but we are using Weave. No luck resolving it yet.
@hsteckylf Check the kube-proxy logs. I found some issues in the logs, fixed them, and the problem is gone now.
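A hedged sketch of the suggestion above. The `k8s-app=kube-proxy` label is an assumption about how kube-proxy is deployed; on setups where it runs as a static pod under the kubelet (as in coreos-kubernetes), check the node's journal instead (e.g. `journalctl -u kubelet` on the node):

```shell
# Dump recent error lines from each kube-proxy pod's logs.
if command -v kubectl >/dev/null 2>&1; then
  proxy_pods=$(kubectl get pods -n kube-system -l k8s-app=kube-proxy -o name 2>/dev/null)
  for p in $proxy_pods; do
    echo "--- $p ---"
    # grep is a coarse filter; read the full log if anything looks off.
    kubectl logs -n kube-system "$p" 2>/dev/null | grep -iE 'error|fail' | tail -n 20
  done
else
  proxy_pods=""
  echo "kubectl not found; run this where you have cluster access"
fi
```

kube-proxy is what programs the service VIPs (10.3.0.1, 10.3.0.10) into iptables, so errors here line up with the dial timeouts earlier in the thread.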
Thanks! In our case, it ended up being the same issue as https://github.com/weaveworks/weave/issues/1875, with all of the Weave IPAM IP space allocated to unreachable (old) pods. After looping through `weave rmpeer` and recovering those IPs, all of the connections and pods were restored.
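A hedged sketch of the recovery described above: reclaim IPAM space held by unreachable (dead) peers. The `awk` pattern assumes the usual `weave status ipam` output, where dead peers are flagged as unreachable; eyeball the list before removing anything, since `rmpeer` on a live peer is destructive:

```shell
# Find peers Weave IPAM considers unreachable and reclaim their IP space.
if command -v weave >/dev/null 2>&1; then
  dead_peers=$(weave status ipam 2>/dev/null | awk '/unreachable/ {print $1}')
  for peer in $dead_peers; do
    echo "reclaiming IPAM space from $peer"
    weave rmpeer "$peer"
  done
else
  dead_peers=""
  echo "weave CLI not found; run this on a node in the Weave network"
fi
```

Once the exhausted IPAM space is freed, new pods can get addresses again, which is why the crash-looping kube-dns and dashboard pods recovered.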
Just change port 6443 to 443: edit /etc/kubernetes/manifests/kube-apiserver.yaml on the master, change the liveness probe, and restart the kubelet:

```yaml
livenessProbe:
  failureThreshold: 8
  httpGet:
    host: 127.0.0.1
    path: /healthz
    port: 443  # was 6443
    scheme: HTTPS
```
@mfaizanse Could you please share more about how you resolved the issue?
I have the same issue; I think these logs give a hint:

```
$ kubectl logs kube-dns-86f4d74b45-gb4t7 -n kube-system -c kubedns
reflector.go:201] k8s.io/dns/pkg/dns/dns.go:150: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
reflector.go:201] k8s.io/dns/pkg/dns/dns.go:147: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
Waiting for services and endpoints to be initialized from apiserver...
dns.go:167] Timeout waiting for initialization

$ kubectl get endpoints kubernetes hostnames kube-dns
NAME         ENDPOINTS                                      AGE
kubernetes   192.168.56.101:6443                            3h
hostnames    10.10.1.3:9376,10.10.2.3:9376,10.10.2.4:9376   46m
Error from server (NotFound): endpoints "kube-dns" not found
```
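One hedged observation on the NotFound above: kube-dns lives in the kube-system namespace, and `kubectl get endpoints ... kube-dns` without `-n kube-system` queries the default namespace, so NotFound is expected there even on a healthy cluster. A quick check in the right namespace (falls back to a notice without cluster access):

```shell
# Look up the kube-dns Endpoints where they actually live: kube-system.
if command -v kubectl >/dev/null 2>&1; then
  dns_ep=$(kubectl get endpoints kube-dns -n kube-system \
    -o jsonpath='{.subsets}' 2>/dev/null || echo "(lookup failed)")
  echo "kube-dns endpoint subsets: ${dns_ep:-<empty>}"
else
  dns_ep="(no kubectl on this machine)"
  echo "kubectl not found; run this where you have cluster access"
fi
```

If the object exists but the subsets are empty, that matches the "no Endpoints" symptom earlier in this thread rather than a missing Service.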
I have searched and read more articles, but I still do not have clear steps to troubleshoot and resolve this problem.
I've deployed the DNS and Dashboard add-ons according to Step 5: Deploy Add-ons, but they get into a crash loop. My installation procedure is the same as Manual Installation, except that I added "--storage-backend=etcd2" and "--storage-media-type=application/json" to kube-apiserver.yaml because the apiserver pod was periodically restarting.
Details below:
- kube-dns logs
- kubernetes-dashboard logs
- etc.