aenix-io / etcd-operator

New generation community-driven etcd-operator!
https://etcd.aenix.io
Apache License 2.0

Guess etcd replicas number function #239

Open kvaps opened 2 months ago

kvaps commented 2 months ago

At the latest meeting (2024-06-18 minutes) we decided that we need a function that guesses the required number of etcd replicas.

It can be used for recovering a non-existing STS object and also for scaling from 0.

Design ref: https://github.com/aenix-io/etcd-operator/pull/181

Proposal:

lllamnyp commented 2 months ago

I would definitely like to drop these steps altogether.

Check cluster-state configmap

  • if configmap exists and initial-cluster-members defined

    • if there are any hostnames defined in initial-cluster-members

    • take the hostname of pod with highest number and +1

      • save value into guessed variable

This seems redundant, as we already have this info from checking the Endpoints object:

read pods that fall under the StatefulSet label selector

  • if there are any pods

    • take the pod name with highest number and +1

    • if value is greater than value in guessed, save value into guessed variable
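A minimal sketch of the "take the pod name with highest number and +1" step, assuming pods follow the usual StatefulSet naming `<name>-<ordinal>`. The helper names `ordinalOf` and `guessFromNames` are hypothetical and are not taken from the operator's code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ordinalOf extracts the trailing ordinal from a StatefulSet-style pod name
// such as "etcd-2". It reports false when the name has no numeric suffix.
// (Hypothetical helper, for illustration only.)
func ordinalOf(name string) (int, bool) {
	i := strings.LastIndex(name, "-")
	if i < 0 || i == len(name)-1 {
		return 0, false
	}
	n, err := strconv.Atoi(name[i+1:])
	if err != nil || n < 0 {
		return 0, false
	}
	return n, true
}

// guessFromNames returns the highest ordinal plus one, or 0 when no name
// carries an ordinal (for example, when there are no pods at all).
func guessFromNames(names []string) int {
	guessed := 0
	for _, name := range names {
		if n, ok := ordinalOf(name); ok && n+1 > guessed {
			guessed = n + 1
		}
	}
	return guessed
}

func main() {
	// Pod "etcd-1" is missing, but the highest ordinal is 2, so the guess is 3.
	fmt.Println(guessFromNames([]string{"etcd-0", "etcd-2"})) // prints 3
}
```

Using the highest ordinal rather than the pod count means a gap in the middle (a missing pod) does not shrink the guess.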


I don't like this step at all:

if value is greater than value in guessed, save value into guessed variable

IMO, if we found a value from a reliable source, such as member list, we should never fall back to a less reliable source, such as "number of endpoints". Only if the more reliable source is unavailable (e.g. we cannot get member list due to lack of quorum), should we try guessing the right number of replicas from Endpoints or PVCs.
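The ordering argued for here can be sketched as a list of sources tried from most to least reliable, where the first available answer wins and is never overridden by a later one. The `source` type and `guessReplicas` function are hypothetical names for illustration; the probes in `main` are stubs, not real etcd or Kubernetes calls.

```go
package main

import (
	"errors"
	"fmt"
)

// source is a hypothetical probe: it returns a replica count, or an error
// when the source cannot be consulted (e.g. member list without quorum).
type source struct {
	name  string
	probe func() (int, error)
}

var errUnavailable = errors.New("source unavailable")

// guessReplicas walks the sources in order of decreasing reliability and
// returns the first answer it gets. A less reliable source is consulted
// only when every more reliable one is unavailable.
func guessReplicas(sources []source) (int, string, error) {
	for _, s := range sources {
		n, err := s.probe()
		if err != nil {
			continue // fall back to the next, less reliable source
		}
		return n, s.name, nil
	}
	return 0, "", errUnavailable
}

func main() {
	sources := []source{
		// Stub: pretend the member list cannot be read (no quorum).
		{"member list", func() (int, error) { return 0, errUnavailable }},
		{"endpoints", func() (int, error) { return 3, nil }},
		{"pvcs", func() (int, error) { return 5, nil }},
	}
	n, from, err := guessReplicas(sources)
	fmt.Println(n, from, err) // prints: 3 endpoints <nil>
}
```

Note that the larger value reported by the PVC stub is ignored: the point of this design is that reliability, not magnitude, decides which value is returned.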

kvaps commented 2 months ago

@lllamnyp

I would definitely like to drop these steps:

Check cluster-state configmap

It is created at initialization and keeps existing all the time. It should always contain correct information unless someone removes it, so why not use it?

read pods that fall under the StatefulSet label selector

This seems redundant, as we already have this info from checking the Endpoints object

Do all our pods always end up in the service endpoints? If so, it can be omitted. Also, is there any chance that the service and endpoints will not exist when this check runs?

If we consider the member list a reliable source, then you're right, let's return it directly.

v2:

Kirill-Garbar commented 2 months ago

The etcd-headless service will always have endpoints: it doesn't rely on readiness probes, so all created pods with IP addresses will be in the headless service. This service is ensured at the very beginning, so it must exist.

I personally do not like checking the cluster-state configmap, because in the past we agreed that it is a kind of cache and that it would be better to get this info from the etcd PVCs. So the number of PVCs is, in my opinion, a more reliable source than the cluster-state configmap. The configmap can be checked, but only as a last resort.
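A rough sketch of deriving the guess from PVCs, assuming they were created from a StatefulSet volume claim template and therefore are named `<claimTemplate>-<stsName>-<ordinal>`. The function name `guessFromPVCs` and the template name `data` used in the example are assumptions, not values from the operator.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// guessFromPVCs returns the highest ordinal plus one among PVCs named
// "<claimTemplate>-<stsName>-<ordinal>". PVCs that do not match the prefix
// or lack a numeric suffix are ignored. (Hypothetical helper.)
func guessFromPVCs(pvcNames []string, claimTemplate, stsName string) int {
	prefix := claimTemplate + "-" + stsName + "-"
	guessed := 0
	for _, name := range pvcNames {
		if !strings.HasPrefix(name, prefix) {
			continue
		}
		n, err := strconv.Atoi(strings.TrimPrefix(name, prefix))
		if err != nil || n < 0 {
			continue
		}
		if n+1 > guessed {
			guessed = n + 1
		}
	}
	return guessed
}

func main() {
	pvcs := []string{"data-etcd-0", "data-etcd-1", "data-other-7"}
	// "data-other-7" belongs to a different StatefulSet and is skipped.
	fmt.Println(guessFromPVCs(pvcs, "data", "etcd")) // prints 2
}
```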

kvaps commented 2 months ago

Okay, it seems the cluster-state configmap check makes no sense, so I removed it:

v3:

lllamnyp commented 2 months ago

Okay, it seems the cluster-state configmap check makes no sense, so I removed it:

v3:

  • return guessed

LGTM

lllamnyp commented 1 month ago

This function is tentatively implemented here as

func (o *observables) desiredReplicas() (max int) {}