kvaps opened this issue 7 months ago (Open)
Another idea to consider: a user may manually recreate a replica (by deleting the pod and the PVC). In such cases, we need to verify within the cluster that the old replica no longer exists.
The etcd operator should be able to scale the cluster up and down and react to pod or PVC deletion.
There should be fields `status.replicas` and `status.instanceNames` in order to understand which instances are members, which of them should become members, and which of them should be removed.
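A minimal sketch of how these status fields could look in the Go types; the field names follow the proposal, while the JSON tags and the single `controller` package (used for all sketches below, for brevity) are assumptions:

```go
package controller

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// EtcdClusterStatus holds the observed state described above.
type EtcdClusterStatus struct {
	// Replicas is the number of ready members the operator last observed.
	Replicas int32 `json:"replicas,omitempty"`
	// InstanceNames lists the pods that are currently healthy cluster members.
	InstanceNames []string `json:"instanceNames,omitempty"`
	// Conditions holds conditions such as Rescaling.
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```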
We should introduce a new status condition `Rescaling` that will be `False` if everything is fine and `True` if the cluster is currently rescaling or being repaired, for example when a pod (in the emptyDir case) or a PVC is deleted.
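For illustration, the condition could be managed with the standard apimachinery helper; `setRescaling` is a hypothetical name, and the reason strings are the ones proposed in this issue:

```go
package controller

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setRescaling flips the Rescaling condition on the EtcdClusterStatus
// sketched above, e.g. with reason "ScalingClusterUp" or "ReplicasMatchSpec".
func setRescaling(status *EtcdClusterStatus, active bool, reason string) {
	cond := metav1.Condition{
		Type:   "Rescaling",
		Status: metav1.ConditionFalse,
		Reason: reason,
	}
	if active {
		cond.Status = metav1.ConditionTrue
	}
	meta.SetStatusCondition(&status.Conditions, cond)
}
```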
The cluster state ConfigMap should build `ETCD_INITIAL_CLUSTER` only from the list in `status.instanceNames`, as those are the healthy cluster members.
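As a sketch, deriving the variable strictly from `status.instanceNames` could look like this; the headless-service naming scheme and the peer port 2380 are assumptions:

```go
package controller

import (
	"fmt"
	"strings"
)

// initialCluster renders ETCD_INITIAL_CLUSTER from the instance names
// recorded in the status, e.g.
// "cluster-0=https://cluster-0.cluster.default.svc:2380,...".
func initialCluster(instanceNames []string, namespace, headlessSvc string) string {
	peers := make([]string, 0, len(instanceNames))
	for _, name := range instanceNames {
		peers = append(peers, fmt.Sprintf("%s=https://%s.%s.%s.svc:2380",
			name, name, headlessSvc, namespace))
	}
	return strings.Join(peers, ",")
}
```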
The field `status.replicas` should be filled on reconciliation based on the current number of ready replicas if the cluster is not in a rescaling state. It is first filled when the cluster is bootstrapped.
The field `status.instanceNames` should be filled on reconciliation based on the current ready replicas if the cluster is not in a rescaling state.
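A sketch of that refresh step, reusing the status type from above; the ready check uses the standard pod condition, and skipping the update while `Rescaling` is `True` is the key invariant:

```go
package controller

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/meta"
)

// refreshStatus records the ready pods as the authoritative member list,
// but never while a rescale is in flight.
func refreshStatus(status *EtcdClusterStatus, pods []corev1.Pod) {
	if meta.IsStatusConditionTrue(status.Conditions, "Rescaling") {
		return
	}
	names := make([]string, 0, len(pods))
	for i := range pods {
		if podIsReady(&pods[i]) {
			names = append(names, pods[i].Name)
		}
	}
	status.Replicas = int32(len(names))
	status.InstanceNames = names
}

// podIsReady checks the standard Ready condition on a pod.
func podIsReady(p *corev1.Pod) bool {
	for _, c := range p.Status.Conditions {
		if c.Type == corev1.PodReady {
			return c.Status == corev1.ConditionTrue
		}
	}
	return false
}
```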
When `spec.replicas` > `status.replicas`, the operator should scale the cluster up. The process is the following:
1. Update the StatefulSet's `spec.replicas`.
2. Set the `Rescaling` condition to `True` with `Reason: ScalingClusterUp`.
3. Run `etcdctl member add` for each new member (see the sketch after this list).
4. Update `status.replicas` and `status.instanceNames` in accordance with `spec.replicas` and the current pod names.
5. Set `Rescaling` to `False` with `Reason: ReplicasMatchSpec`.
6. Update `ETCD_INITIAL_CLUSTER` according to `status.instanceNames`.
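Step 3 could also go through the etcd client API rather than exec'ing `etcdctl`; a sketch, with endpoint and peer-URL construction left to the caller as assumptions:

```go
package controller

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// addMembers announces each new member to the existing quorum before its
// pod joins. etcd accepts only one member add at a time, so we loop and
// fail fast; the reconciler stays in Rescaling until every call succeeds.
func addMembers(ctx context.Context, endpoints, newPeerURLs []string) error {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return err
	}
	defer cli.Close()

	for _, peerURL := range newPeerURLs {
		if _, err := cli.MemberAdd(ctx, []string{peerURL}); err != nil {
			return fmt.Errorf("member add %s: %w", peerURL, err)
		}
	}
	return nil
}
```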
In case of errors, the EtcdCluster will be stuck in the `Rescaling` stage without damaging the cluster.
If the user cancels (by reverting the EtcdCluster's `spec.replicas` to the old value), the StatefulSet's `spec.replicas` should be reverted as well and the `Rescaling` condition should be set to `False`.
If the user sets `spec.replicas` < `status.replicas` to both cancel the scale-up and perform a scale-down, we should update the StatefulSet's `spec.replicas` to the CR's `status.replicas`, set `Rescaling` to `False`, and schedule a new reconciliation.
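A sketch of that combined cancel-and-scale-down path, reusing the hypothetical `setRescaling` helper from above; the rescheduling itself is left to the reconciler's requeue:

```go
package controller

import appsv1 "k8s.io/api/apps/v1"

// cancelRescale snaps the StatefulSet back to the last known-good member
// count and clears the condition, so the next reconciliation can start a
// clean scale-down from status.replicas toward the new spec.replicas.
func cancelRescale(status *EtcdClusterStatus, sts *appsv1.StatefulSet) {
	replicas := status.Replicas
	sts.Spec.Replicas = &replicas
	setRescaling(status, false, "ReplicasMatchSpec")
}
```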
When `spec.replicas` < `status.replicas`, the operator should scale the cluster down. The process is the following:
1. Determine the member to remove: `idx=status.replicas - 1` -> `crdName-$(idx)`.
2. Set the `Rescaling` condition to `True` with `Reason: ScalingClusterDown`.
3. Update the StatefulSet's `spec.replicas` to `spec.replicas - 1`.
4. Connect to the `Service` as root and run a command like `etcdctl member remove crdName-$(idx)` (see the sketch after this list). Running this command while the pod is still alive should be safe, as the pod should already have been sent the SIGTERM signal by the kubelet.
5. Set `Rescaling` to `False` with `Reason: ReplicasMatchSpec`.
6. If still `spec.replicas` < `status.replicas`, reschedule the reconcile to run this algorithm from the beginning.
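Step 4 could likewise use the client API instead of running `etcdctl` inside a pod; `etcdctl member remove` takes a member ID rather than a name, so a lookup is needed first (endpoint handling is again an assumption):

```go
package controller

import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// removeMember resolves the member ID for a pod name such as crdName-2
// and removes it from the quorum.
func removeMember(ctx context.Context, endpoints []string, name string) error {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return err
	}
	defer cli.Close()

	resp, err := cli.MemberList(ctx)
	if err != nil {
		return err
	}
	for _, m := range resp.Members {
		if m.Name == name {
			_, err = cli.MemberRemove(ctx, m.ID)
			return err
		}
	}
	// Treat an already-removed member as success so this step is idempotent.
	return nil
}
```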
We need to design a mechanism for scaling a cluster up and down. When a user modifies `spec.replicas`, the cluster should scale to the required number of replicas accordingly. Currently, we are utilizing a StatefulSet, but we understand that we might have to move away from it in favor of a custom pod controller.

Scaling up should work out of the box, but scaling down might be more complex due to several considerations. We're open to suggestions on how to address these challenges and implement an efficient and reliable scaling mechanism.