coreos / coreos-kubernetes

CoreOS Container Linux+Kubernetes documentation & Vagrant installers
https://coreos.com/kubernetes/docs/latest/
Apache License 2.0

apiserver pod restarts every 10-15 sec #871

Open yurchenkosv opened 7 years ago

yurchenkosv commented 7 years ago

Following this guide, I configured certificates, etcd, flannel, and the kubelet, then added the YAML files to deploy the needed services. After starting the pods and the kubelet service, the apiserver pod restarts every 10-15 seconds. I decided to check the pod's health status, and here it is:

curl http://127.0.0.1:8080/healthz
[+]ping ok
[-]poststarthook/bootstrap-controller failed: reason withheld
[+]poststarthook/extensions/third-party-resources ok
[-]poststarthook/rbac/bootstrap-roles failed: reason withheld
[+]poststarthook/ca-registration ok
healthz check failed

What could this be connected to?
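As a minimal triage sketch, the failing hooks can be isolated from a saved copy of the `/healthz` response shown above (the `[-]` prefix marks a failed check, `[+]` a passing one); the file path here is just for illustration:

```shell
# Save the /healthz response above to a file (in a live cluster this would be
# the output of: curl http://127.0.0.1:8080/healthz)
cat <<'EOF' > /tmp/healthz.txt
[+]ping ok
[-]poststarthook/bootstrap-controller failed: reason withheld
[+]poststarthook/extensions/third-party-resources ok
[-]poststarthook/rbac/bootstrap-roles failed: reason withheld
[+]poststarthook/ca-registration ok
EOF

# Show only the failed checks
grep '^\[-\]' /tmp/healthz.txt
```

The apiserver also registers each check under its own path, so querying a single check (e.g. `curl http://127.0.0.1:8080/healthz/poststarthook/bootstrap-controller`) may return more detail than the aggregated "reason withheld" line.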

mreichardt95 commented 7 years ago

+1 I have the same problem. Here are the docker logs for the api-server container:

core@rim-kube-00 /var/log/containers $ docker logs 1c0375091eab
E0421 18:00:40.794195       1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.LimitRange: Get https://localhost:443/api/v1/limitranges?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0421 18:00:40.794483       1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.Secret: Get https://localhost:443/api/v1/secrets?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0421 18:00:40.794574       1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.Namespace: Get https://localhost:443/api/v1/namespaces?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0421 18:00:40.795157       1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.ServiceAccount: Get https://localhost:443/api/v1/serviceaccounts?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0421 18:00:40.795771       1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *storage.StorageClass: Get https://localhost:443/apis/storage.k8s.io/v1beta1/storageclasses?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0421 18:00:40.796346       1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.ResourceQuota: Get https://localhost:443/api/v1/resourcequotas?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
[restful] 2017/04/21 18:00:40 log.go:30: [restful/swagger] listing is available at https://172.16.100.1:443/swaggerapi/
[restful] 2017/04/21 18:00:40 log.go:30: [restful/swagger] https://172.16.100.1:443/swaggerui/ is mapped to folder /swagger-ui/
I0421 18:00:40.908618       1 serve.go:79] Serving securely on 0.0.0.0:443
I0421 18:00:40.908737       1 serve.go:94] Serving insecurely on 127.0.0.1:8080
E0421 18:00:43.549152       1 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport: write tcp 172.16.100.1:53826->172.16.100.1:2379: write: broken pipe
I0421 18:00:43.549495       1 trace.go:61] Trace "Create /api/v1/namespaces" (started 2017-04-21 18:00:40.949240619 +0000 UTC):
[19.91µs] [19.91µs] About to convert to expected version
[88.421µs] [68.511µs] Conversion done
[98.739µs] [10.318µs] About to store object in database
"Create /api/v1/namespaces" [2.600124542s] [2.600025803s] END
E0421 18:00:43.568014       1 client_ca_hook.go:58] rpc error: code = 13 desc = transport: write tcp 172.16.100.1:53826->172.16.100.1:2379: write: broken pipe
I0421 18:00:51.047205       1 trace.go:61] Trace "Create /api/v1/namespaces/kube-system/pods" (started 2017-04-21 18:00:41.046728665 +0000 UTC):
[47.498µs] [47.498µs] About to convert to expected version
[162.802µs] [115.304µs] Conversion done
"Create /api/v1/namespaces/kube-system/pods" [10.000450892s] [10.00028809s] END
I0421 18:00:51.782038       1 trace.go:61] Trace "Create /api/v1/namespaces/kube-system/pods" (started 2017-04-21 18:00:41.781540741 +0000 UTC):
[29.386µs] [29.386µs] About to convert to expected version
[237.267µs] [207.881µs] Conversion done
"Create /api/v1/namespaces/kube-system/pods" [10.000471298s] [10.000234031s] END
E0421 18:00:56.656148       1 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport: write tcp 172.16.100.1:55898->172.16.100.1:2379: write: connection reset by peer
I0421 18:00:56.656526       1 trace.go:61] Trace "Create /api/v1/nodes" (started 2017-04-21 18:00:47.109748316 +0000 UTC):
[60.578µs] [60.578µs] About to convert to expected version
[127.397µs] [66.819µs] Conversion done
[132.808µs] [5.411µs] About to store object in database
"Create /api/v1/nodes" [9.546721837s] [9.546589029s] END
E0421 18:00:56.656824       1 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport is closing
E0421 18:00:56.656959       1 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport is closing
I0421 18:00:56.657685       1 trace.go:61] Trace "Create /api/v1/nodes" (started 2017-04-21 18:00:47.108386618 +0000 UTC):
[25.143µs] [25.143µs] About to convert to expected version
[97.799µs] [72.656µs] Conversion done
[107.476µs] [9.677µs] About to store object in database
"Create /api/v1/nodes" [9.549278532s] [9.549171056s] END
I0421 18:00:56.658814       1 trace.go:61] Trace "Create /api/v1/nodes" (started 2017-04-21 18:00:47.110480333 +0000 UTC):
[40.581µs] [40.581µs] About to convert to expected version
[129.452µs] [88.871µs] Conversion done
[135.733µs] [6.281µs] About to store object in database
"Create /api/v1/nodes" [9.548312485s] [9.548176752s] END
I0421 18:01:02.106516       1 trace.go:61] Trace "Create /api/v1/namespaces/kube-system/pods" (started 2017-04-21 18:00:52.10616893 +0000 UTC):
[28.44µs] [28.44µs] About to convert to expected version
[99.684µs] [71.244µs] Conversion done
"Create /api/v1/namespaces/kube-system/pods" [10.000320483s] [10.000220799s] END
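The `broken pipe` / `connection reset by peer` errors above all point at the same backend. A quick sketch to confirm where the failing writes are going (the log lines are pasted from the output above into a here-doc for illustration):

```shell
# Two of the rpc error lines from the docker logs above
cat <<'EOF' > /tmp/apiserver.log
E0421 18:00:43.549152 1 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport: write tcp 172.16.100.1:53826->172.16.100.1:2379: write: broken pipe
E0421 18:00:56.656148 1 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport: write tcp 172.16.100.1:55898->172.16.100.1:2379: write: connection reset by peer
EOF

# Extract the remote endpoints on the etcd client port (2379)
grep -o '[0-9.]*:2379' /tmp/apiserver.log | sort -u
```

Every failing write targets port 2379, i.e. the apiserver-to-etcd connection is what keeps dropping, not the apiserver's own serving ports.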
yurchenkosv commented 7 years ago

Here are my apiserver logs:

[restful] 2017/04/23 12:59:36 log.go:30: [restful/swagger] listing is available at https://192.168.2.171:443/swaggerapi/
[restful] 2017/04/23 12:59:36 log.go:30: [restful/swagger] https://192.168.2.171:443/swaggerui/ is mapped to folder /swagger-ui/
E0423 12:59:36.825682       1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *storage.StorageClass: Get https://localhost:443/apis/storage.k8s.io/v1beta1/storageclasses?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0423 12:59:36.836876       1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.ResourceQuota: Get https://localhost:443/api/v1/resourcequotas?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0423 12:59:36.836959       1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.Secret: Get https://localhost:443/api/v1/secrets?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0423 12:59:36.837004       1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.Namespace: Get https://localhost:443/api/v1/namespaces?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0423 12:59:36.837043       1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.LimitRange: Get https://localhost:443/api/v1/limitranges?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0423 12:59:36.837078       1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.ServiceAccount: Get https://localhost:443/api/v1/serviceaccounts?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
I0423 12:59:36.852122       1 serve.go:79] Serving securely on 0.0.0.0:443
I0423 12:59:36.852235       1 serve.go:94] Serving insecurely on 127.0.0.1:8080
E0423 12:59:37.785811       1 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport is closing
I0423 12:59:37.786117       1 trace.go:61] Trace "Create /api/v1/namespaces" (started 2017-04-23 12:59:36.875563324 +0000 UTC):
[12.464µs] [12.464µs] About to convert to expected version
[58.562µs] [46.098µs] Conversion done
[65.873µs] [7.311µs] About to store object in database
"Create /api/v1/namespaces" [910.530482ms] [910.464609ms] END
E0423 12:59:37.788226       1 client_ca_hook.go:58] rpc error: code = 13 desc = transport is closing

TerraTech commented 7 years ago

I just hit this today trying to upgrade from 1.5.7 ==> 1.6.2

Are you using the following combination: hyperkube:v1.6.x + etcd2?

From: https://groups.google.com/forum/#!topic/kubernetes-announce/UoN3XroTDn0

Internal Storage Layer
upgrade to etcd3 prior to upgrading to 1.6 OR explicitly specify --storage-type=etcd2 --storage-media-type=application/json when starting the apiserver

I haven't had a chance to test this yet, but I think that may be the problem.

Also, I'm not sure when etcd3 will be mainlined into the stable Container Linux offering, but I do see that etcd2 will be removed in February 2018. https://coreos.com/blog/toward-etcd-v3-in-container-linux.html

TerraTech commented 7 years ago

I believe the recommendation in the announcement above is wrong, as '--storage-type' doesn't exist; it was presumably supposed to say '--storage-backend' instead.

https://github.com/kubernetes/kubernetes/blob/v1.6.2/staging/src/k8s.io/apiserver/pkg/server/options/etcd.go#L84
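Putting that correction together with the announcement, the apiserver invocation for a v1.6.x cluster still on etcd2 would look something like this (a sketch, not a full manifest; the --etcd-servers value is an assumed example and the remaining flags are left as-is from your existing config):

```shell
kube-apiserver \
  --etcd-servers=http://127.0.0.1:2379 \
  --storage-backend=etcd2 \
  --storage-media-type=application/json \
  # ...rest of your existing apiserver flags unchanged
```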

TerraTech commented 7 years ago

I can now confirm this has fixed my kube-apiserver restart loop, and my master node is now upgraded to v1.6.2.