FoundationDB / fdb-kubernetes-operator

A Kubernetes operator for FoundationDB
Apache License 2.0

Pending/Unreachable Pod can delay coordinator selection #845

Closed by johscheuer 2 years ago

johscheuer commented 3 years ago

In the current implementation of the subreconcilers, a Pod that is in a Pending state or is unreachable can delay the selection of new coordinators. The reason is that UpdatePodConfig triggers a requeue unless all Pods are reachable and have the correct ConfigMap. Depending on how long the Pod stays in that state, this can be critical for the cluster: a coordinator that is lost during that time will not be replaced by a new one, which could lead to losing the quorum.
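To make the failure mode concrete, here is a minimal, self-contained Go sketch of a sequential subreconciler loop. It is not the operator's actual code: the types `pod`, `cluster`, `result`, and `step`, the `delayed` flag, and the `honorDelayed` switch are all hypothetical names chosen for illustration. Only the step name UpdatePodConfig comes from the description above. The sketch assumes that subreconcilers run in order and that a requeue from one step stops the steps after it.

```go
package main

import "fmt"

// pod is a simplified stand-in for a Kubernetes Pod (hypothetical type).
type pod struct {
	name           string
	reachable      bool
	configUpToDate bool
}

// cluster is a simplified stand-in for the cluster state (hypothetical type).
type cluster struct {
	pods                []pod
	coordinatorsHealthy bool
	coordinatorsChanged bool
}

// result tells the loop whether a step asks for a requeue.
// delayed is an assumption: it marks a requeue that need not block later steps.
type result struct {
	requeue bool
	delayed bool
}

// step pairs a subreconciler name with its function (hypothetical type).
type step struct {
	name string
	run  func(c *cluster) result
}

// updatePodConfig requeues as long as any Pod is unreachable or has a stale ConfigMap.
func updatePodConfig(c *cluster) result {
	for _, p := range c.pods {
		if !p.reachable || !p.configUpToDate {
			return result{requeue: true, delayed: true}
		}
	}
	return result{}
}

// changeCoordinators replaces coordinators when the current set is unhealthy.
func changeCoordinators(c *cluster) result {
	if !c.coordinatorsHealthy {
		c.coordinatorsChanged = true
		c.coordinatorsHealthy = true
	}
	return result{}
}

// reconcile runs the steps in order and returns the names of the steps that ran.
// With honorDelayed == false, any requeue stops the loop (the problem described above).
// With honorDelayed == true, a delayed requeue lets the remaining steps run.
func reconcile(c *cluster, steps []step, honorDelayed bool) []string {
	var executed []string
	for _, s := range steps {
		executed = append(executed, s.name)
		r := s.run(c)
		if r.requeue && !(honorDelayed && r.delayed) {
			break
		}
	}
	return executed
}

// defaultSteps returns the two steps relevant to this issue, in order.
func defaultSteps() []step {
	return []step{
		{name: "UpdatePodConfig", run: updatePodConfig},
		{name: "ChangeCoordinators", run: changeCoordinators},
	}
}

// newDegradedCluster builds a cluster with one Pending Pod and a lost coordinator.
func newDegradedCluster() *cluster {
	return &cluster{
		pods: []pod{
			{name: "storage-1", reachable: true, configUpToDate: true},
			{name: "storage-2", reachable: false, configUpToDate: false}, // Pending
		},
		coordinatorsHealthy: false,
	}
}

func main() {
	blocking := newDegradedCluster()
	fmt.Println(reconcile(blocking, defaultSteps(), false), blocking.coordinatorsChanged)

	nonBlocking := newDegradedCluster()
	fmt.Println(reconcile(nonBlocking, defaultSteps(), true), nonBlocking.coordinatorsChanged)
}
```

In the first run the loop stops after UpdatePodConfig, so the lost coordinator is never replaced while the Pod stays Pending. The second run shows one possible way to avoid this, where the requeue is recorded but later steps still execute; whether the operator resolves the issue this way is not stated here.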

We have a related issue with a different focus: https://github.com/FoundationDB/fdb-kubernetes-operator/issues/732

johscheuer commented 2 years ago

I think this issue should be resolved, but we have to validate that.

johscheuer commented 2 years ago

This was tested and the issue is resolved.