coreos / etcd-operator

etcd operator creates/configures/manages etcd clusters atop Kubernetes
https://coreos.com/blog/introducing-the-etcd-operator.html
Apache License 2.0

Every node but the first fails to connect when using Istio #2101

Open RJPercival opened 5 years ago

RJPercival commented 5 years ago

When using Istio, the etcd node pods have an istio-proxy sidecar that proxies all of their network traffic. However, this sidecar takes a moment to start. During this time, the etcd container can't make network connections and so can't connect to the other nodes. etcd then logs an error and terminates. Because restartPolicy is set to Never, the pod never recovers from this. Could you consider changing restartPolicy so that the nodes can recover from this situation? Upon restarting, the istio-proxy will already be ready and the problem won't recur.

Related issue: https://github.com/kubernetes/kubernetes/issues/65502

supernomad commented 5 years ago

I have this same problem, and it means I am unable to use the etcd-operator. I am assuming the issue with restarting is that there is no PVC and the operator uses individual pods instead of a StatefulSet? I was surprised by this, but it would explain why restart is disallowed. Are there any plans to rectify this issue? I believe there is work underway to use PVCs at the very least. If that lands, would the restart issue become moot?

xmlking commented 5 years ago

We are also planning to enable Istio, and we are worried that this is going to break.

xmlking commented 5 years ago

We switched to an operator-less etcd cluster using https://github.com/etcd-io/etcd/blob/master/hack/kubernetes-deploy/etcd.yml