nishantapatil3 closed this issue 1 year ago
Hey @nishantapatil3, were you able to solve the issue? I'm on the v0.2.10 and the issue seems to still persist
I wasn't able to solve this issue; it might require restructuring how the certificate is injected into the webhook before the manager starts.
Describe the bug
Bug on v0.2.7, seen after my commit https://github.com/cisco-open/cluster-registry-controller/pull/33. A scenario that I want to discuss further (I found this today).
Legend:
v0.2.2 = old cluster registry (OCR)
v0.2.7 = new cluster registry (NCR)
When an NCR is deployed while the OCR is currently the leader, the NCR will not become ready unless a CA bundle is generated and injected into the ValidatingWebhook, which leads to the NCR never being ready (never winning leader election).
By default, leaderElection is true in the cluster-registry chart - https://github.com/cisco-open/cluster-registry-controller/blob/master/deploy/charts/cluster-registry/values.yaml#L54
This can be solved by either A) forcing leader election onto the NCR (I don't know how to do that), or B) disabling leader election, since the webhook check is validated periodically here - https://github.com/cisco-open/cluster-registry-controller/blob/master/pkg/cert/renewer.go#L117
For example: while upgrading from OCR to NCR there is a short window where two cluster-registries are deployed; one is the leader and the other is waiting to become the leader.
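The circular wait described above can be sketched as a small simulation. This is not the project's actual code; the function, its states, and the return strings are illustrative assumptions that only model the dependency chain (readiness → CA bundle → leadership → old-pod termination → readiness):

```python
# Hypothetical sketch of the upgrade deadlock; names and states are
# assumptions, not the real controller's logic.

def simulate_rollout(old_pod_holds_lease: bool) -> str:
    """Walk the dependency chain for the new (v0.2.7) pod during an upgrade."""
    # The new pod's /readyz only succeeds once the webhook CA bundle is injected.
    ca_bundle_injected = False
    # The CA bundle is generated and injected by the leader-election winner.
    ncr_is_leader = not old_pod_holds_lease
    if ncr_is_leader:
        ca_bundle_injected = True   # leader generates and injects the bundle
    ncr_ready = ca_bundle_injected  # readiness gated on the webhook
    # The old pod is only terminated once the new pod reports ready,
    # and it only releases the lease when it is terminated.
    old_pod_terminated = ncr_ready
    if old_pod_holds_lease and not old_pod_terminated:
        return "deadlock"           # nobody can make progress
    return "rollout-completes"

# OCR holds the lease during the upgrade window -> circular wait:
print(simulate_rollout(old_pod_holds_lease=True))   # deadlock
# With no lease to wait for (e.g. leader election disabled), the chain breaks:
print(simulate_rollout(old_pod_holds_lease=False))  # rollout-completes
```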
v0.2.2 cluster-registry: uses /metrics as the readiness probe, so the pod comes up without webhook validation and is marked ready, thereby terminating the old cluster-registry pod.
v0.2.7 cluster-registry: uses /readyz as the readiness probe, so the pod is not marked ready until the webhook is ready; the webhook waits for leader election to generate the CA bundle before the new cluster-registry pod can become ready and thereby kill the old pod.
Steps to reproduce the issue
Deploy the Helm chart with a replica count of 2 and leader election enabled, and check whether both pods become ready.
Expected behavior
The readiness probe should be set to ready (with the webhook CA bundle in place) before leader election.
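The difference between the two probe choices can be sketched like this. The endpoint names come from this issue; the helper function and its signature are hypothetical, and it only models the gating behavior described above:

```python
# Hypothetical model of the two readiness-probe behaviors; only the endpoint
# names (/metrics, /readyz) come from the issue, the rest is an assumption.

def is_ready(version: str, webhook_ca_injected: bool) -> bool:
    if version == "v0.2.2":
        # /metrics responds as soon as the process is up,
        # independent of webhook state.
        return True
    if version == "v0.2.7":
        # /readyz is gated on the webhook, which needs the injected CA bundle.
        return webhook_ca_injected
    raise ValueError(f"unknown version: {version}")

# Before leader election has produced a CA bundle:
print(is_ready("v0.2.2", webhook_ca_injected=False))  # True  -> old pod replaced
print(is_ready("v0.2.7", webhook_ca_injected=False))  # False -> rollout stalls
```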
Screenshots
Continues to be in this state until cluster-registry-controller-controller-b8b499b68-llg4r is killed, so that leader election is transferred to cluster-registry-controller-controller-b8b499b68-nkvcp.
Additional context
This also happens when there are two v0.2.7 cluster registries; one pod waits for the other to hand over the lease. Sorry, I missed checking this before committing to cluster-registry.
Quick Solution
If leaderElection: false is set, then the above issue is not seen.
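For reference, the workaround would look roughly like this in the chart values. This is a sketch: the key name follows the values.yaml line linked above, but the exact structure is an assumption, so check the chart before applying it.

```yaml
# deploy/charts/cluster-registry/values.yaml (workaround sketch)
# Assumption: the chart exposes leader election as a top-level boolean,
# per values.yaml#L54 referenced above.
leaderElection: false
```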