gardener / etcd-druid

An etcd operator to configure, provision, reconcile and monitor etcd clusters.
https://gardener.github.io/etcd-druid/
Apache License 2.0

[Feature] Liveness Probe on multi-node etcd #280

Open ishan16696 opened 2 years ago

ishan16696 commented 2 years ago

Feature (What you would like to be added): Configure the correct livenessProbe in multi-node etcd.

Motivation (Why is this needed?): Right now I have found 2 ways to configure the livenessProbe:

  1. /health endpoint: the /health endpoint returns false if one of the following conditions is met (source):

    • there is no etcd leader or a leader election is currently going on.
    • the latency of a QGET request exceeds 1 second.

    Disadvantage of Method 1 (/health endpoint):

    • If there is no quorum, the etcd follower gets restarted by the kubelet, but restarting etcd is not the right behavior here.
    • If the etcd cluster is overloaded (which is already bad), a restart will generate even more load.
  2. ETCDCTL_API=3 etcdctl endpoint health --endpoints=${ENDPOINTS} --command-timeout=Xs: the etcdctl endpoint health command performs a GET on the "health" key (source).

    • It fails if there is no etcd leader or a leader election is going on, as the GET request will fail without an etcd leader.

    Disadvantage of Method 2 (etcdctl endpoint health):

    • If there is no quorum, the etcd follower gets restarted by the kubelet, but restarting etcd is not the right behavior here.

This liveness issue has already been reported upstream on etcd here and has been fixed by this PR, but the fix will only ship with etcd v3.6 and we don't know when that will be released.

What options do we have:

  1. Go with method 2 (etcdctl endpoint health) and, whenever etcd releases v3.6, switch to that version and configure the correct liveness probe; a sketch of such a probe follows below.
  2. The other way is to fork etcd v3.4.13, make the necessary changes, and use a custom image.
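
For illustration, a minimal sketch of what option 1 could look like as a Kubernetes probe; the TLS file paths and the timing values are assumptions, not the values etcd-druid actually configures:

```yaml
# Sketch only: cert/key/CA paths and the probe timings are placeholders.
livenessProbe:
  exec:
    command:
      - /bin/sh
      - -ec
      - ETCDCTL_API=3 etcdctl endpoint health
        --endpoints=https://etcd-main-local:2379
        --cert=/var/etcd/ssl/client/tls.crt
        --key=/var/etcd/ssl/client/tls.key
        --cacert=/var/etcd/ssl/ca/ca.crt
        --command-timeout=5s
  initialDelaySeconds: 15
  periodSeconds: 5
  failureThreshold: 5
```

This would still restart followers on quorum loss (the disadvantage listed above); it only buys us time until the etcd v3.6 health endpoints are available.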
ishan16696 commented 2 years ago

cc @abdasgupta @aaronfern @timuthy

timuthy commented 2 years ago

Reading this, I wonder about the option of keeping the current liveness probe but setting --consistency="s". This mimics the implementation of the linked PR https://github.com/etcd-io/etcd/pull/13399. WDYT @ishan16696?

ishan16696 commented 2 years ago

Reading this, I wonder about the option of keeping the current liveness probe but setting --consistency="s"

Yes, this makes sense to me; I don't see any major issue with it. We can configure the livenessProbe like this, making the consistency serializable instead of linearizable:

      - /bin/sh
      - -ec
      - ETCDCTL_API=3
      - etcdctl
      - --cert=...
      - --key=...
      - --cacert=...
      - --endpoints=https://etcd-main-local:2379
      - get
      - foo
      - --consistency="s"
ishan16696 commented 2 years ago

Hi @timuthy, unfortunately this way of using --consistency="s" does not work in the following scenario:

  1. Start an etcd cluster with 3 members.
  2. Add some dummy data.
  3. Stop all the members.
  4. Start only one member.
  5. Then try etcdctl get foo --endpoints=http://127.0.0.1:2379 --consistency="s". Unfortunately, --consistency="s" won't help us in this case until quorum is satisfied again (see the sketch after this list).
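
For reference, a rough sketch of steps 2 and 5 as commands (the key name and endpoint are just examples, and the 3-member cluster from step 1 is assumed to be running locally):

```sh
# Step 2: add some dummy data while the 3-member cluster is still healthy.
etcdctl put foo bar --endpoints=http://127.0.0.1:2379

# Steps 3 and 4: stop all three members, then start only one of them again.

# Step 5: in this scenario even the serializable read fails on the single
# restarted member until quorum is re-established, so a liveness probe based
# on it would keep restarting that member.
etcdctl get foo --endpoints=http://127.0.0.1:2379 --consistency="s"
```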
timuthy commented 2 years ago

As discussed, the assumption wasn't that all members are down, but rather that if e.g. 1/3 members are available, this last running pod is not also killed by the liveness probe, because that wouldn't make sense.

ishan16696 commented 2 years ago

this last running pod is not killed by the liveness probe as well because it doesn't make sense.

I'm not saying the last pod is killed due to the liveness probe. I'm saying the last pod can be killed or crash for some other reason; why are you assuming that the last pod can't be killed?

ishan16696 commented 2 years ago

The point I'm making is: suppose all pods/members of the etcd cluster crash or go down for some reason. They will not be able to come up due to this livenessProbe, because etcdctl get foo --consistency="s" will fail even for 1/3 members until quorum is achieved.

timuthy commented 2 years ago

they will not be able to come up due to this livenessProbe

I assume that they will come up fast enough that the livenessProbes don't fail beyond the expected threshold (initialDelaySeconds + periodSeconds * failureThreshold), and thus the containers aren't restarted before they're up and running.
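
For illustration, with hypothetical probe settings (not the values etcd-druid uses) that window works out as follows:

```yaml
# Hypothetical values, only to show the calculation.
livenessProbe:
  initialDelaySeconds: 15
  periodSeconds: 5
  failureThreshold: 5
# Worst case before the kubelet restarts the container:
# 15 + 5 * 5 = 40 seconds for the member to become healthy again.
```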

ishan16696 commented 2 years ago

(initialDelaySeconds + periodSeconds * failureThreshold)

Yeah, exactly, but then we depend on this configuration and have to set the values accordingly.

unmarshall commented 2 years ago

IMO using --consistency=s is non-deterministic, as the time it takes to start the etcd process cannot be guaranteed. The correct thing to do would be to create a /livez or /healthz endpoint which is served out of the backup-restore container. If this endpoint has come up and responds with 200 OK, then the etcd pod should be considered LIVE, which prevents it from getting restarted. For the readiness probe you can probably use etcdctl with --consistency=s, which ensures that traffic can be redirected to this etcd pod. For redirecting traffic to a single etcd instance it is not important to check whether the cluster is healthy (i.e. whether quorum is lost). If quorum is lost, the request will anyway get rejected by the etcd member if it needs to be.
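
For illustration, a rough sketch of what that split could look like; the /livez path, the sidecar port 8080, the key name foo and the omitted TLS flags are assumptions, not the actual etcd-druid configuration:

```yaml
# Liveness: only checks that the backup-restore sidecar responds with 200 OK
# (reachable via the shared pod network); no quorum check, so losing quorum
# never causes the kubelet to restart the remaining members.
livenessProbe:
  httpGet:
    path: /livez
    port: 8080
# Readiness: serializable read against the local member; this only gates
# whether traffic is routed to the pod.
readinessProbe:
  exec:
    command:
      - /bin/sh
      - -ec
      - ETCDCTL_API=3 etcdctl get foo
        --endpoints=https://etcd-main-local:2379
        --consistency="s"
```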

timuthy commented 2 years ago

IMO using --consistency=s is non-deterministic, as the time it takes to start the etcd process cannot be guaranteed. The correct thing to do would be to create a /livez or /healthz endpoint which is served out of the backup-restore container.

Can you sketch what an implementation of /livez would look like to make it deterministic?

ishan16696 commented 2 years ago

Reopening this issue, as the livenessProbe can be improved to resolve the kind of scenario described in https://github.com/gardener/etcd-druid/issues/280#issuecomment-1143639829.

timuthy commented 2 years ago

I'd be interested in knowing what is planned here. Do you already have something in mind?

ishan16696 commented 2 years ago

I reopened it for future reference: whenever we upgrade our etcd version (maybe to 3.6), we can also update the livenessProbe for etcd.

timuthy commented 2 years ago

TBH, I don't get the motivation for keeping an issue open if there is no issue today (as claimed by https://github.com/gardener/etcd-druid/issues/280#issuecomment-1160311146). If you want to keep this as a future reference, you can still take the information from this issue even if it's closed. At the moment, this issue represents an open item in https://github.com/gardener/etcd-druid/issues/107 and gives the impression that work needs to be done on our end.

Either we see a problem with the readiness/liveness probes as they are configured today and discuss/implement a solution, or we open a ☂️ issue for upgrading to etcd 3.6 and list all the requirements/changes once we plan to do so.

ishan16696 commented 2 years ago

OK, then feel free to close.

ishan16696 commented 2 years ago

Found the upstream issue, which is also described in this comment: https://github.com/gardener/etcd-druid/issues/280#issuecomment-1143639829. Mentioning it here for future reference for whenever we want to move to etcd v3.6.

ishan16696 commented 2 years ago

We had introduced the livenessProbe for multi-node etcd with this PR: https://github.com/gardener/etcd-druid/pull/396. But we found that the livenessProbe was causing more harm than good, so we decided to remove it from the etcd cluster with this PR: https://github.com/gardener/etcd-druid/pull/423.

So I'm opening this issue just to track the liveness probe for the etcd cluster; whenever we upgrade our etcd version, we can reintroduce the livenessProbe. Please refer to this upstream PR: https://github.com/etcd-io/etcd/pull/13525

/on-hold

ishan16696 commented 1 year ago

This issue needs to be picked up after the etcd version upgrade: https://github.com/gardener/etcd-druid/issues/445.

shreyas-s-rao commented 1 year ago

/assign @ishan16696