FoundationDB / fdb-kubernetes-operator

A Kubernetes operator for FoundationDB

Operator stuck on "cannot find enough pods to recruit coordinators" in triple redundancy mode since 0.37.0 #839

Closed funkypenguin closed 2 years ago

funkypenguin commented 3 years ago

Hey guys!

On v0.38.0 (and 0.37.0) I've not been able to get my cluster to reconcile when I use triple redundancy mode.

Here's my CR:

spec:
  databaseConfiguration:
    redundancy_mode: triple
    storage: 4
  processCounts:
    cluster_controller: -1
    log: -1
    stateless: -1

The 4 pods are created within about 30s, but the operator continually logs:

{"level":"info","ts":1625871790.9355466,"logger":"controller","msg":"Reconciliation terminated early","namespace":"prod","cluster":"retort-fdb","subReconciler":"controllers.GenerateInitialClusterFile","message":"cannot find enough pods to recruit coordinators","requeueAfter":15}

As soon as I change redundancy_mode to double, the operator reconciles the cluster.

My reading of https://apple.github.io/foundationdb/configuration.html#configuration-choosing-redundancy-mode indicates that 4 machines should be a supported configuration for triple redundancy mode. To be sure, I also tried storage: 5, but the same behaviour resulted.

I've previously had triple redundancy mode working with this cluster configuration on 0.36.0.

Thanks! D

johscheuer commented 3 years ago

The operator currently tries to select 5 coordinators (see: https://github.com/FoundationDB/fdb-kubernetes-operator/blob/master/api/v1beta1/foundationdbcluster_types.go#L1365-L1371). That might differ slightly from the FoundationDB documentation, which only states the ideal number of coordination servers, so we might want to document that in a better place (if there is a gap). This means you need at least 5 stateful Pods (log, storage). What fault domain did you configure? The default is host based (https://github.com/FoundationDB/fdb-kubernetes-operator/blob/master/docs/manual/fault_domains.md#option-1-single-kubernetes-replication), which means that when multiple Pods are running on the same host, only one of them can be elected as a coordinator.
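
To make that concrete, here is a sketch (not from this thread) of a CR that should satisfy the 5-coordinator requirement, assuming the default host-based fault domain and that the 5 storage Pods land on 5 different hosts. The version value and metadata are placeholders, and faultDomain is spelled out for clarity even though host-based is the documented default:

apiVersion: apps.foundationdb.org/v1beta1
kind: FoundationDBCluster
metadata:
  name: retort-fdb
spec:
  version: 6.2.30                 # placeholder; use your FDB version
  faultDomain:
    key: kubernetes.io/hostname   # documented default: at most one coordinator per host
  databaseConfiguration:
    redundancy_mode: triple
    storage: 5                    # at least 5 stateful Pods so 5 coordinators can be recruited
  processCounts:
    cluster_controller: -1
    log: -1
    stateless: -1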

johscheuer commented 3 years ago

I think we should update the "error" message to state how many Pods we are expecting and how many we found.

funkypenguin commented 3 years ago

Thanks @johscheuer - my takeaway here is that if I want triple redundancy_mode (I do), I need 5 stateful pods spread across 5 separate hosts. I've got this working now, and I've noted that if I take a host down (e.g., for upgrades/maintenance), my existing FDB clusters continue to operate (presumably degraded), but I'm unable to deploy any new clusters.

Cheers! D

johscheuer commented 3 years ago

By "unable to deploy any new clusters" you mean creating a different FoundationDB cluster in the same namespace? One broken cluster shouldn't block the other one, since the controller queue should ensure that both clusters are reconciled (and they are independent). It can take a longer time for the second cluster to reconcile since the first cluster will consume resources from the operator (the operator tries to elect a new coordinator until the quorum is reached again). I would recommend to run at least 6 stateful Pods for a cluster with triple redundancy. I'll add the documentation label and we should add this information to our user manual.

funkypenguin commented 3 years ago

What I mean is that it appears that if I have:

  1. a Kubernetes cluster with 5 nodes, with
  2. a reconciled FDB cluster in triple redundancy mode with
  3. 5 stateful pods, each on a separate host,

.. and I lose one of my nodes (leaving 4 remaining), my reconciled FDB cluster can still be used.

However, until a 5th node is available again, I won't be able to deploy a new FDB cluster, since the operator won't start a triple-redundancy cluster on only 4 nodes.

D

johscheuer commented 3 years ago

> What I mean is that it appears that if I have:
>
>   1. a Kubernetes cluster with 5 nodes, with
>   2. a reconciled FDB cluster in triple redundancy mode with
>   3. 5 stateful pods, each on a separate host,
>
> .. and I lose one of my nodes (leaving 4 remaining), my reconciled FDB cluster can still be used.
>
> However, until a 5th node is available again, I won't be able to deploy a new FDB cluster, since the operator won't start a triple-redundancy cluster on only 4 nodes.
>
> D

Yes, that makes sense since we are waiting for 5 stateful Pods. I'll add some additional information for the coordinators in our fault domain documentation.

johscheuer commented 2 years ago

Reminder to document this setting.