jazzsir opened this issue 7 years ago
Not sure if this helps, but we were also running into kube-dns
pod crash loops after upgrading to Kubernetes 1.6, and we were able to work around it by using Calico. The last ~10 or so commits in this fork show everything we did: https://github.com/rook/coreos-kubernetes/commits/master
@jbw976 I followed the instructions at https://github.com/rook/coreos-kubernetes/blob/master/Documentation/getting-started.md and used Calico. But when I started the kubelet, the COMMAND attribute (in `docker ps`) of every container (proxy, apiserver, controller-manager, and scheduler) was "/pause".
@jbw976 I finally deployed a master node using Calico. But the "Set Up the CNI config (optional)" link is dead in https://github.com/rook/coreos-kubernetes/blob/master/Documentation/deploy-workers.md. Do you know where the "Set Up the CNI config (optional)" guide is? I would like to finish the installation without any auto-configuration tools.
Same problem here. I would prefer to have flannel working before trying Calico.
I have been attempting to solve this for a while now.
When I run `kubectl describe` on kube-dns, I get:
```
Williams-MacBook-Pro:KubeControl demonfuse$ kubectl describe service kube-dns --namespace=kube-system
Name:              kube-dns
Namespace:         kube-system
Labels:            k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=KubeDNS
Annotations:       <none>
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP:                10.3.0.10
Port:              dns 53/UDP
Endpoints:
Port:              dns-tcp 53/TCP
Endpoints:
Session Affinity:  None
Events:            <none>
```
I noticed that in the output above there are no Endpoints, whereas when I describe the kubernetes service I do get endpoints:
```
Williams-MacBook-Pro:KubeControl demonfuse$ kubectl describe svc kubernetes
Name:              kubernetes
Namespace:         default
Labels:            component=apiserver
                   provider=kubernetes
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP:                10.3.0.1
Port:              https 443/TCP
Endpoints:         xx.xx.xx.xx:443
Session Affinity:  ClientIP
Events:            <none>
```
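A hedged aside on the empty Endpoints for kube-dns above: an empty Endpoints list usually means no Ready pods match the Service selector, so it is worth comparing the selector to the pod labels and readiness. A minimal sketch (namespace and label taken from this thread; the script falls back to a notice when `kubectl` or cluster access is unavailable):

```shell
# Compare the kube-dns Service selector against the pods it should select.
# An empty Endpoints list typically means no matching pod is Ready.
if command -v kubectl >/dev/null 2>&1; then
  selector=$(kubectl get svc kube-dns -n kube-system \
    -o jsonpath='{.spec.selector}' 2>/dev/null || echo "(lookup failed)")
  echo "kube-dns selector: $selector"
  # List the pods that selector should match, with their readiness.
  kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide 2>/dev/null
else
  selector="(no kubectl on this machine)"
  echo "kubectl not found; run this where you have cluster access"
fi
```

If the pods exist but are not Ready (as with a crash loop), the empty Endpoints list is a symptom, not the cause.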
In the logs for kubedns, dnsmasq, and healthz, I noticed kubedns is having trouble connecting to 10.3.0.1, and healthz reports `nslookup: can't resolve`.
```
Williams-MacBook-Pro:KubeControl demonfuse$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c kubedns
I0531 19:15:18.538655 1 server.go:94] Using https://10.3.0.1:443 for kubernetes master, kubernetes API: <nil>
I0531 19:15:18.539824 1 server.go:99] v1.5.0-alpha.0.1651+7dcae5edd84f06-dirty
I0531 19:15:18.539894 1 server.go:101] FLAG: --alsologtostderr="false"
I0531 19:15:18.539926 1 server.go:101] FLAG: --dns-port="10053"
I0531 19:15:18.540001 1 server.go:101] FLAG: --domain="cluster.local."
I0531 19:15:18.540029 1 server.go:101] FLAG: --federations=""
I0531 19:15:18.540051 1 server.go:101] FLAG: --healthz-port="8081"
I0531 19:15:18.540088 1 server.go:101] FLAG: --kube-master-url=""
I0531 19:15:18.540110 1 server.go:101] FLAG: --kubecfg-file=""
I0531 19:15:18.540128 1 server.go:101] FLAG: --log-backtrace-at=":0"
I0531 19:15:18.540165 1 server.go:101] FLAG: --log-dir=""
I0531 19:15:18.540189 1 server.go:101] FLAG: --log-flush-frequency="5s"
I0531 19:15:18.540210 1 server.go:101] FLAG: --logtostderr="true"
I0531 19:15:18.540244 1 server.go:101] FLAG: --stderrthreshold="2"
I0531 19:15:18.540265 1 server.go:101] FLAG: --v="0"
I0531 19:15:18.540296 1 server.go:101] FLAG: --version="false"
I0531 19:15:18.540338 1 server.go:101] FLAG: --vmodule=""
I0531 19:15:18.540415 1 server.go:138] Starting SkyDNS server. Listening on port:10053
I0531 19:15:18.540533 1 server.go:145] skydns: metrics enabled on : /metrics:
I0531 19:15:18.540598 1 dns.go:166] Waiting for service: default/kubernetes
I0531 19:15:18.541278 1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0531 19:15:18.541340 1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0531 19:15:48.542106 1 dns.go:172] Ignoring error while waiting for service default/kubernetes: Get https://10.3.0.1:443/api/v1/namespaces/default/services/kubernetes: dial tcp 10.3.0.1:443: i/o timeout. Sleeping 1s before retrying.
E0531 19:15:48.544209 1 reflector.go:214] pkg/dns/dns.go:154: Failed to list *api.Endpoints: Get https://10.3.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.3.0.1:443: i/o timeout
E0531 19:15:48.544580 1 reflector.go:214] pkg/dns/dns.go:155: Failed to list *api.Service: Get https://10.3.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.3.0.1:443: i/o timeout
I0531 19:16:19.543942 1 dns.go:172] Ignoring error while waiting for service default/kubernetes: Get https://10.3.0.1:443/api/v1/namespaces/default/services/kubernetes: dial tcp 10.3.0.1:443: i/o timeout. Sleeping 1s before retrying.
E0531 19:16:19.546421 1 reflector.go:214] pkg/dns/dns.go:155: Failed to list *api.Service: Get https://10.3.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.3.0.1:443: i/o timeout
E0531 19:16:19.546569 1 reflector.go:214] pkg/dns/dns.go:154: Failed to list *api.Endpoints: Get https://10.3.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.3.0.1:443: i/o timeout
```
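Every failure in the kubedns log above ends the same way: `dial tcp 10.3.0.1:443: i/o timeout`. A tiny hedged helper to pull the dial target out of such log lines, to confirm that each error is hitting the same service VIP (the sample line is copied from the logs in this thread; `extract_dial_target` is just an illustrative name):

```shell
# Extract the "dial tcp <ip:port>" target from a reflector/dns log line.
extract_dial_target() {
  sed -n 's/.*dial tcp \([0-9.]*:[0-9]*\):.*/\1/p'
}

# Sample line copied verbatim from the kubedns log in this thread.
line='E0531 19:15:48.544209 1 reflector.go:214] pkg/dns/dns.go:154: Failed to list *api.Endpoints: Get https://10.3.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.3.0.1:443: i/o timeout'

echo "$line" | extract_dial_target
# prints: 10.3.0.1:443
```

If every line resolves to the apiserver VIP like this, the question becomes why that VIP is unreachable from the pod, which points at kube-proxy or the pod network rather than kube-dns.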
```
Williams-MacBook-Pro:KubeControl demonfuse$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c dnsmasq
dnsmasq[1]: started, version 2.76 cachesize 1000
dnsmasq[1]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
dnsmasq[1]: using nameserver 127.0.0.1#10053
dnsmasq[1]: read /etc/hosts - 7 addresses
```
```
Williams-MacBook-Pro:KubeControl demonfuse$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c healthz
2017/05/31 19:09:17 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:09:15.817652465 +0000 UTC, error exit status 1
2017/05/31 19:09:17 Healthz probe on /healthz-kubedns error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:09:15.816970338 +0000 UTC, error exit status 1
2017/05/31 19:09:27 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:09:25.813204033 +0000 UTC, error exit status 1
2017/05/31 19:09:27 Healthz probe on /healthz-kubedns error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:09:25.812644469 +0000 UTC, error exit status 1
2017/05/31 19:09:37 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:09:35.815158454 +0000 UTC, error exit status 1
2017/05/31 19:09:37 Healthz probe on /healthz-kubedns error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:09:35.814596885 +0000 UTC, error exit status 1
2017/05/31 19:09:47 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:09:45.811774257 +0000 UTC, error exit status 1
2017/05/31 19:09:47 Healthz probe on /healthz-kubedns error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:09:45.812333 +0000 UTC, error exit status 1
2017/05/31 19:09:57 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:09:55.814050664 +0000 UTC, error exit status 1
2017/05/31 19:09:57 Healthz probe on /healthz-kubedns error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:09:55.814489628 +0000 UTC, error exit status 1
2017/05/31 19:12:07 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:12:05.811958358 +0000 UTC, error exit status 1
2017/05/31 19:12:07 Healthz probe on /healthz-kubedns error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:12:05.812245808 +0000 UTC, error exit status 1
2017/05/31 19:14:17 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:14:15.808253036 +0000 UTC, error exit status 1
2017/05/31 19:14:17 Healthz probe on /healthz-kubedns error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:14:15.809691998 +0000 UTC, error exit status 1
2017/05/31 19:16:27 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:16:25.813103916 +0000 UTC, error exit status 1
2017/05/31 19:16:27 Healthz probe on /healthz-kubedns error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-05-31 19:16:25.813576285 +0000 UTC, error exit status 1
```
Following the kube-dns troubleshooting guide, I used busybox for a simple nslookup and got:
```
Williams-MacBook-Pro:KubeControl demonfuse$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.3.0.10
Address 1: 10.3.0.10
nslookup: can't resolve 'kubernetes.default'
```
Any ideas? I followed the instructions on the Kubernetes guide to the letter and I'm using flannel without calico. How would I approach resolving kubernetes.default?
Also, the dashboard (Add-ons page on official guide) seems to be having the same problem.
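One hedged way to narrow this down: test raw TCP reachability of the apiserver service VIP (10.3.0.1, per the describe output earlier in this thread) from inside the busybox pod. If this times out, kube-proxy or the pod network is not routing service traffic, and kube-dns itself is not the culprit. A sketch (assumes the busybox image ships an `nc` applet with `-z` support, and falls back gracefully without `kubectl`):

```shell
# Probe the apiserver service VIP from inside the busybox pod.
# A timeout here means service VIPs are not being routed at all.
if command -v kubectl >/dev/null 2>&1; then
  if kubectl exec busybox -- nc -z -w 2 10.3.0.1 443 2>/dev/null; then
    vip_state=reachable
  else
    vip_state=unreachable
  fi
else
  vip_state="unknown (no kubectl on this machine)"
fi
echo "apiserver VIP 10.3.0.1:443 is $vip_state"
```

The same check against 10.3.0.10:53 would distinguish "DNS pod broken" from "no service VIP works", which matches the i/o timeouts in the kubedns logs above.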
Still no luck.
@Ascendance I think there are a lot of missing parts in the instructions. I recommend you use a Vagrantfile and the scripts in https://coreos.com/kubernetes/docs/latest/kubernetes-on-vagrant.html
I am facing the same issue. I am not using Calico, just flanneld. Did you find a solution? @jazzsir @Ascendance
We just hit the same issue with kube-dns and the dashboard crash looping as well, but we are using Weave. No luck resolving it yet.
@hsteckylf Check the kube-proxy logs. I found some issues in the logs, fixed them, and the problem is gone now.
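A hedged sketch of the suggestion above. The `k8s-app=kube-proxy` label is an assumption about how kube-proxy is deployed; on setups where it runs as a static pod under the kubelet (as in coreos-kubernetes), check the node's journal instead (e.g. `journalctl -u kubelet` on the node):

```shell
# Dump recent error lines from each kube-proxy pod's logs.
if command -v kubectl >/dev/null 2>&1; then
  proxy_pods=$(kubectl get pods -n kube-system -l k8s-app=kube-proxy -o name 2>/dev/null)
  for p in $proxy_pods; do
    echo "--- $p ---"
    # grep is a coarse filter; read the full log if anything looks off.
    kubectl logs -n kube-system "$p" 2>/dev/null | grep -iE 'error|fail' | tail -n 20
  done
else
  proxy_pods=""
  echo "kubectl not found; run this where you have cluster access"
fi
```

kube-proxy is what programs the service VIPs (10.3.0.1, 10.3.0.10) into iptables, so errors here line up with the dial timeouts earlier in the thread.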
Thanks! In our case, it ended up being the same issue as https://github.com/weaveworks/weave/issues/1875, with all of the Weave IPAM IP space allocated to unreachable (old) pods. After looping through `weave rmpeer` and recovering those IPs, all of the connections and pods were restored.
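A hedged sketch of the recovery described above: reclaim IPAM space held by unreachable (dead) peers. The `awk` pattern assumes the usual `weave status ipam` output, where dead peers are flagged as unreachable; eyeball the list before removing anything, since `rmpeer` on a live peer is destructive:

```shell
# Find peers Weave IPAM considers unreachable and reclaim their IP space.
if command -v weave >/dev/null 2>&1; then
  dead_peers=$(weave status ipam 2>/dev/null | awk '/unreachable/ {print $1}')
  for peer in $dead_peers; do
    echo "reclaiming IPAM space from $peer"
    weave rmpeer "$peer"
  done
else
  dead_peers=""
  echo "weave CLI not found; run this on a node in the Weave network"
fi
```

Once the exhausted IPAM space is freed, new pods can get addresses again, which is why the crash-looping kube-dns and dashboard pods recovered.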
Just change port 6443 to 443: edit /etc/kubernetes/manifests/kube-apiserver.yaml on the master, change the liveness probe, and restart the kubelet:

```yaml
livenessProbe:
  failureThreshold: 8
  httpGet:
    host: 127.0.0.1
    path: /healthz
    port: 443  # was 6443
    scheme: HTTPS
```
@mfaizanse Could you please share more about how you resolved the issue?
I have the same issue; I think these logs give a hint:

```
$ kubectl logs kube-dns-86f4d74b45-gb4t7 -n kube-system -c kubedns
reflector.go:201] k8s.io/dns/pkg/dns/dns.go:150: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
reflector.go:201] k8s.io/dns/pkg/dns/dns.go:147: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
Waiting for services and endpoints to be initialized from apiserver...
dns.go:167] Timeout waiting for initialization

$ kubectl get endpoints kubernetes hostnames kube-dns
NAME         ENDPOINTS                                      AGE
kubernetes   192.168.56.101:6443                            3h
hostnames    10.10.1.3:9376,10.10.2.3:9376,10.10.2.4:9376   46m
Error from server (NotFound): endpoints "kube-dns" not found
```
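One hedged observation on the NotFound above: kube-dns lives in the kube-system namespace, and `kubectl get endpoints ... kube-dns` without `-n kube-system` queries the default namespace, so NotFound is expected there even on a healthy cluster. A quick check in the right namespace (falls back to a notice without cluster access):

```shell
# Look up the kube-dns Endpoints where they actually live: kube-system.
if command -v kubectl >/dev/null 2>&1; then
  dns_ep=$(kubectl get endpoints kube-dns -n kube-system \
    -o jsonpath='{.subsets}' 2>/dev/null || echo "(lookup failed)")
  echo "kube-dns endpoint subsets: ${dns_ep:-<empty>}"
else
  dns_ep="(no kubectl on this machine)"
  echo "kubectl not found; run this where you have cluster access"
fi
```

If the object exists but the subsets are empty, that matches the "no Endpoints" symptom earlier in this thread rather than a missing Service.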
I have searched and read more articles, but I still do not have clear steps to troubleshoot and resolve this problem.
I've deployed the DNS and Dashboard add-ons according to Step 5: Deploy Add-ons, but they get into a crash loop. My installation procedure is the same as Manual Installation, except that I added "--storage-backend=etcd2" and "--storage-media-type=application/json" to kube-apiserver.yaml because the apiserver pod was periodically restarting.
Details below:
- kube-dns logs
- kubernetes-dashboard logs
- etc.