shreyas-s-rao opened this issue 1 year ago
API-wise this approach seems procedural, which we probably want to avoid.
Do we need to expose a function-style operation, or isn't it possible to stick to Kubernetes' well-known desired/actual state paradigm, i.e. API changes trigger changes that druid performs in order to transform the actual state into the desired one?
> isn't it possible to stick to Kubernetes' well-known desired/actual state paradigm, i.e. API changes trigger changes that druid performs in order to transform the actual state into the desired one?
That was my first thought as well. The issue here is that any change to the `volumeClaimTemplate` of the statefulset is forbidden. If we are to allow changing the volume-related spec in the Etcd resource (`storageCapacity`, `storageClass`), we'll need to define a strict behavior for it, like "if a change is made to either of these fields, druid must always recreate the statefulset, accompanied by a deletion of the volume". But that's risky, because it means deletion of etcd data.
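For illustration, a sketch of the statefulset fragment in question (the field values here are made up; the point is that Kubernetes rejects in-place updates to `volumeClaimTemplates`, since only fields like `replicas`, `template`, and `updateStrategy` are mutable on an existing statefulset):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd-main
spec:
  volumeClaimTemplates:
  - metadata:
      name: etcd-main
    spec:
      storageClassName: standard   # changing this in-place is rejected
      resources:
        requests:
          storage: 25Gi            # changing this in-place is rejected
```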
Of course we can safeguard this operation with something similar to `druid.gardener.cloud/pvc-deletion-confirmation`, and also make druid compulsorily trigger a full snapshot of etcd before scaling down the statefulset. But the fact remains that the whole operation is more or less "procedural" in a strict sense.
Also, the Etcd resource does not currently store any state for the volumes it uses (unless we plan to do that using the `EtcdMembers` field), so there's no strict desired/actual state maintained for the etcd volumes, at least at the moment.
Also, other use-cases of this functionality of being able to programmatically delete a PVC on-demand include:

- the `handlePermanentQuorumLoss` flow, which reuses the `deletePVC` flow

/cc @abdasgupta
Pro:
Con:
So, whether or not we automate the volume replacement (scripts have risks themselves, so a well-tested and safeguarded druid implementation may still be the better option, even if we are all fearful of volume-deletion code), I would still like to raise the question of whether we can offer a way to do that without downtime, as initially hoped/planned (HA became the prerequisite for volume replacement for critical clusters, and scaling down the statefulset would defy this goal).
Yes, the statefulset would show "the wrong volume template", but it would adopt the recreated volumes, so that these changes can be implemented without any downtime, right?
This would lead to inconsistency in the Kubernetes well-known desired/actual state paradigm mentioned earlier by @timuthy. The zero-downtime update can still be done, using the steps provided by @unmarshall in https://github.com/gardener/gardener-extension-provider-aws/issues/646#issuecomment-1330188838. So essentially, we can make an Etcd spec change for `storageCapacity` / `storageClass`, accompanied by something like a confirmation annotation to make sure that volumes aren't deleted without explicit confirmation. @vlerenc WDYT?
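As a sketch, such a guarded spec change could look like the following (the confirmation-annotation key and value here are assumptions for illustration, not an implemented API, and the storage class name is made up):

```yaml
apiVersion: druid.gardener.cloud/v1alpha1
kind: Etcd
metadata:
  name: etcd-main
  annotations:
    # hypothetical confirmation gate: without it, druid would refuse to
    # act on the volume-related spec change
    druid.gardener.cloud/pvc-deletion-confirmation: "true"
spec:
  storageClass: encrypted-ssd    # changed value
  storageCapacity: 50Gi          # changed value
```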
Yes, exactly. I was commenting on the sentence "Scale down the statefulset (if it isn't already scaled down)" and if you have the capacity to automate https://github.com/gardener/gardener-extension-provider-aws/issues/646#issuecomment-1330188838 that would be most welcome.
/assign
/assign @seshachalam-yv
Blocked until #588 is implemented
**Feature (What you would like to be added):**
I would like to leverage the data restoration functionality of the etcd-backup-restore sidecar (in both single- and multi-node etcd clusters) to allow druid to delete PVC(s) on-demand and trigger recreation of the PVC(s) later.

**Motivation (Why is this needed?):**
There are use-cases where the user wants to switch the storage class of the PVC to a better one, to change the volume size of the etcd disk, or to switch to encrypted disks. Today, any change to the volume configuration in the `Etcd` CR leads to reconciliation errors, as the existing statefulset forbids any updates to the `volumeClaimTemplate`, as can be seen from https://github.com/gardener/gardener-extension-provider-aws/issues/646#issuecomment-1330188838. I would like druid to catch such errors, and possibly even handle them gracefully in a certain way, if I specify it via annotation(s) on the `Etcd` CR.

**Approach/Hint to the implement solution (optional):**
Introduce annotations in the `Etcd` CR, something like `druid.gardener.cloud/operation: "recreate-pvc/etcd-main-0"` and `druid.gardener.cloud/operation: recreate-sts` (if necessary), that druid sees and acts upon, including handling the `recreate-sts` annotation. Also enhance the `immutableFieldUpdate` function introduced in https://github.com/gardener/etcd-druid/pull/408 and see how this can be leveraged for the new `druid.gardener.cloud/operation: recreate-sts` annotation, if necessary.

/cc @unmarshall @timuthy @vlerenc
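For clarity, a sketch of the proposed operation annotation on the `Etcd` resource (annotation key and values exactly as proposed above; this is not an implemented API, and the member name is just an example):

```yaml
apiVersion: druid.gardener.cloud/v1alpha1
kind: Etcd
metadata:
  name: etcd-main
  annotations:
    # ask druid to delete and recreate the PVC of member etcd-main-0
    druid.gardener.cloud/operation: "recreate-pvc/etcd-main-0"
```

Since both proposed values share the same annotation key, presumably only one such operation could be requested at a time.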