coreos / etcd-operator

etcd operator creates/configures/manages etcd clusters atop Kubernetes
https://coreos.com/blog/introducing-the-etcd-operator.html
Apache License 2.0
1.75k stars, 741 forks

Init container hangs indefinitely #2077

Open jicowan opened 5 years ago

jicowan commented 5 years ago

After applying the manifest for the example cluster (a 3-node etcd cluster), the init container hangs indefinitely. The last operator log message, "skip reconciliation: running ([]), pending ([example-etcd-cluster-vnjpsbdfmn])" cluster-name=example-etcd-cluster cluster-namespace=default pkg=cluster, keeps repeating. When I look at the logs for the example-etcd-cluster-vnjpsbdfmn pod, it says: Error from server (BadRequest): container "etcd" in pod "example-etcd-cluster-vnjpsbdfmn" is waiting to start: PodInitializing. I see no other logs that indicate what the issue might be. The operator log is below.

time="2019-04-14T23:21:54Z" level=info msg="creating cluster with Spec:" cluster-name=example-etcd-cluster cluster-namespace=default pkg=cluster
time="2019-04-14T23:21:54Z" level=info msg="{" cluster-name=example-etcd-cluster cluster-namespace=default pkg=cluster
time="2019-04-14T23:21:54Z" level=info msg="    \"size\": 3," cluster-name=example-etcd-cluster cluster-namespace=default pkg=cluster
time="2019-04-14T23:21:54Z" level=info msg="    \"repository\": \"quay.io/coreos/etcd\"," cluster-name=example-etcd-cluster cluster-namespace=default pkg=cluster
time="2019-04-14T23:21:54Z" level=info msg="    \"version\": \"3.2.13\"" cluster-name=example-etcd-cluster cluster-namespace=default pkg=cluster
time="2019-04-14T23:21:54Z" level=info msg="}" cluster-name=example-etcd-cluster cluster-namespace=default pkg=cluster
time="2019-04-14T23:21:54Z" level=info msg="cluster created with seed member (example-etcd-cluster-vnjpsbdfmn)" cluster-name=example-etcd-cluster cluster-namespace=default pkg=cluster
time="2019-04-14T23:21:54Z" level=info msg="start running..." cluster-name=example-etcd-cluster cluster-namespace=default pkg=cluster
time="2019-04-14T23:22:02Z" level=info msg="skip reconciliation: running ([]), pending ([example-etcd-cluster-vnjpsbdfmn])" cluster-name=example-etcd-cluster cluster-namespace=default pkg=cluster

brunowego commented 5 years ago

@jicowan same here. Did you find a solution? Thanks.

jicowan commented 5 years ago

@brunowego Not yet.

NickCarton commented 5 years ago

I've got the same issue.

brunowego commented 5 years ago

After changing the network plugin from flannel to calico, this no longer happens. Try switching the network plugin.

nvtkaszpir commented 5 years ago

Please check the events in the cluster with kubectl, especially those from the etcd pods; they should say why the pod is still in the initializing state. Usually it's related to insufficient resources (CPU/memory requests per pod set too high) or incorrectly configured storage (for example, the pod is in zone A while the PV was created in zone B, in which case you should create a new StorageClass with volumeBindingMode: WaitForFirstConsumer).
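
For the storage case, a minimal sketch of such a StorageClass might look like the following. The name and the provisioner are assumptions for illustration only (nothing in this thread says which cloud or storage backend is in use), so substitute the values that match your cluster. The only field taken from the suggestion above is volumeBindingMode.

```yaml
# Sketch only: metadata.name and provisioner are hypothetical placeholders,
# not values reported by anyone in this thread.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: etcd-wait-for-consumer        # hypothetical name
provisioner: kubernetes.io/aws-ebs    # assumption: replace with your cluster's provisioner
# Delay volume binding and provisioning until a pod using the claim is scheduled,
# so the PV is created in the same zone as the pod.
volumeBindingMode: WaitForFirstConsumer
```

With the default binding mode (Immediate), the volume can be provisioned before the scheduler has picked a node, which is how the pod and the PV can end up in different zones.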