Feature (What you would like to be added):
Today etcd-druid only reacts to changes made to the etcd CRD. It does not know which change(s) to the etcd spec were last applied successfully. As part of this enhancement, we start capturing the successfully applied configuration in the status, so that the controllers can compare the last-known-good (LKG) state with the current state of the etcd resource and take appropriate action if needed.
Motivation (Why is this needed?):
The motivating use case is upgrading an etcd cluster from a single-node cluster to a multi-node cluster. In a single-node cluster the peer URL is currently not TLS enabled, since there is no peer. When the etcd resource is changed so that the cluster goes from single-node to multi-node, secure peer communication is required, and with it the TLS configuration for peer-to-peer communication. To enable scale-up of the etcd cluster, the existing member needs to update its peer URL to a TLS-enabled one, so that when additional members start and try to join the cluster (one learner at a time) they can establish peer communication over HTTPS. Changing the peer URL of the existing member requires a mandatory restart of the etcd process (see here). In the current setup this results in a total of 2 restarts of the etcd pod before the peer URL of the existing member (single-node etcd cluster) reflects a TLS-enabled URL. To avoid 2 restarts, the idea is to delete the StatefulSet and create it again (which results in a single restart).
To conditionally delete the StatefulSet (STS), etcd-druid needs to know what has changed in the spec. controller-runtime does not provide visibility into what has changed, whereas this was possible when using client-go. Therefore we need to capture the LKG configuration as part of the status of the etcd resource.
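To make the comparison concrete, here is a minimal Go sketch of how a controller might decide whether to recreate the StatefulSet by comparing the LKG configuration against the current spec. All type, field, and function names (`EtcdSpec`, `LastAppliedConfig`, `PeerTLSEnabled`, `needsStatefulSetRecreate`) are hypothetical and heavily simplified; they are not the actual etcd-druid API, and the real condition may need to consider more fields.

```go
package main

import "fmt"

// EtcdSpec is a hypothetical, trimmed-down stand-in for the etcd resource spec.
type EtcdSpec struct {
	Replicas       int32
	PeerTLSEnabled bool
}

// LastAppliedConfig is a hypothetical status field that records the
// last-known-good (LKG) values after a successful reconciliation.
type LastAppliedConfig struct {
	Replicas       int32
	PeerTLSEnabled bool
}

// needsStatefulSetRecreate reports whether the change from the LKG state to
// the current spec is a single-node to multi-node scale-up that also turns on
// peer TLS. This is the case described above, where deleting and recreating
// the StatefulSet is meant to avoid a second restart.
func needsStatefulSetRecreate(lkg *LastAppliedConfig, spec EtcdSpec) bool {
	if lkg == nil {
		// Nothing has been applied successfully yet, so there is nothing to compare.
		return false
	}
	scaledUpFromSingle := lkg.Replicas == 1 && spec.Replicas > 1
	peerTLSTurnedOn := !lkg.PeerTLSEnabled && spec.PeerTLSEnabled
	return scaledUpFromSingle && peerTLSTurnedOn
}

func main() {
	lkg := &LastAppliedConfig{Replicas: 1, PeerTLSEnabled: false}
	desired := EtcdSpec{Replicas: 3, PeerTLSEnabled: true}
	fmt.Println("recreate StatefulSet:", needsStatefulSetRecreate(lkg, desired))
}
```

In this sketch the controller would update the LKG values in the status only after a reconciliation succeeds, so a failed or partial rollout never overwrites the state it compares against.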