Open vlerenc opened 1 year ago
I looked into the issue. In my case, the custodian controller is updating the ETCD resource fine:
We have a condition in the custodian controller that skips status reconciliation entirely if an error message is present in the ETCD CR: https://github.com/gardener/etcd-druid/blob/09e62b2de053e45fe2faddc352180b3defd6164a/controllers/etcd_custodian_controller.go#L91 Probably @vlerenc hit some error first and then tried to scale down the ETCD cluster, so the custodian controller did not update the status to reflect the scaled-down state. As a mitigation for this bug, I can change the custodian controller so that it updates the status before checking for an error in the ETCD CR. Is that okay, @vlerenc?
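To make the proposed ordering change concrete, here is a minimal sketch in Go. It is not the actual etcd-druid code: `EtcdStatus`, `Observed`, `reconcileSkipOnError`, and `reconcileUpdateFirst` are hypothetical, heavily simplified stand-ins that only illustrate the difference between returning early on a recorded error and refreshing the status first.

```go
package main

import "fmt"

// EtcdStatus is a hypothetical, simplified stand-in for the status of the ETCD CR.
type EtcdStatus struct {
	LastError     string // non-empty if an earlier reconciliation recorded an error
	ReadyReplicas int    // number of ready members last written to the status
}

// Observed is a hypothetical snapshot of what the controller currently sees.
type Observed struct {
	ReadyReplicas int
}

// refresh copies the observed state into the status.
func refresh(status EtcdStatus, obs Observed) EtcdStatus {
	status.ReadyReplicas = obs.ReadyReplicas
	return status
}

// reconcileSkipOnError models the behavior described above: if an error is
// recorded, the status is returned untouched and therefore stays stale.
func reconcileSkipOnError(status EtcdStatus, obs Observed) EtcdStatus {
	if status.LastError != "" {
		return status
	}
	return refresh(status, obs)
}

// reconcileUpdateFirst models the proposed mitigation: the status is refreshed
// first, and the error check only gates whatever work would follow.
func reconcileUpdateFirst(status EtcdStatus, obs Observed) EtcdStatus {
	updated := refresh(status, obs)
	if updated.LastError != "" {
		return updated // later steps are skipped, but the status is current
	}
	return updated
}

func main() {
	stale := EtcdStatus{LastError: "earlier failure", ReadyReplicas: 0}
	obs := Observed{ReadyReplicas: 2}

	fmt.Println(reconcileSkipOnError(stale, obs).ReadyReplicas) // prints 0
	fmt.Println(reconcileUpdateFirst(stale, obs).ReadyReplicas) // prints 2
}
```

In this sketch the recorded error is left in place in both variants; only the point at which the status is refreshed differs.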
I can't reproduce the issue on my system. I am closing this ticket. Please reopen if the issue is faced again. /close
Describe the bug: Just a small thing. When only 2 out of 3 pods are available (I forcefully scaled down the statefulset continuously to 2 replicas in an endless loop), the etcd resource status is not properly updated and at least the quorum information is stale (whatever it was before - in my case, because my replica count was 0 before I set it to 2, it remained stuck at false).
Expected behavior: The status should be reflected correctly. In this case the ETCD cluster had quorum and was operational, so quorum should have been reported as true.
How To Reproduce (as minimally and precisely as possible): See above.
Logs:
Anything else we need to know?: The issue was discussed with @aaronfern out-of-band in https://sap-ti.slack.com/archives/C0177NLL8V9/p1670587196752109.
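For the scenario in the bug report (2 of 3 members available), the expected quorum value can be checked with a small Go sketch. The helper `hasQuorum` is a hypothetical name and not part of etcd-druid; it only encodes the usual etcd majority rule that a cluster of size n needs floor(n/2)+1 available members.

```go
package main

import "fmt"

// hasQuorum reports whether the ready members form a strict majority of the
// configured cluster size. Hypothetical helper for illustration only.
func hasQuorum(ready, clusterSize int) bool {
	if clusterSize <= 0 {
		return false
	}
	return ready >= clusterSize/2+1
}

func main() {
	fmt.Println(hasQuorum(2, 3)) // prints true: 2 of 3 is a majority
	fmt.Println(hasQuorum(1, 3)) // prints false
	fmt.Println(hasQuorum(0, 3)) // prints false
}
```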