FoundationDB / fdb-kubernetes-operator

A kubernetes operator for FoundationDB
Apache License 2.0

Proposal: Support bin packing of fault domains #926

Open johscheuer opened 3 years ago

johscheuer commented 3 years ago

In our current implementation, the operator tries to distribute the process groups across as many fault domains as possible. If the number of fault domains is greater than the number of process groups we create (and we have enough resources), every process group ends up in a different fault domain. This might not be desirable in environments with a non-trivial number of process groups. One challenge is that most operational tasks are bound to fault domains: a human operator or the Kubernetes operator can only limit the risk of data loss by, for example, operating on a single fault domain at a time.

A possible way to solve this is to allow a user to define a number of distinct logical fault domains. The operator would add an additional label to all Pods, e.g. foundationdb.org/distribution-key, which could be used to write a Pod Affinity and a Pod Anti Affinity. The value of the label would be the process group number modulo the number of desired fault domains. We could also extend this to bin pack all stateful processes.
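To make the label computation concrete, here is a minimal Go sketch of the modulo rule described above. It is only an illustration under stated assumptions, not the operator's actual code: the function names `distributionKey` and `distributionLabels` and the parameter `desiredFaultDomains` are hypothetical, and only the label name foundationdb.org/distribution-key comes from the proposal text.

```go
package main

import (
	"fmt"
	"strconv"
)

// DistributionKeyLabel is the label name suggested in the proposal.
const DistributionKeyLabel = "foundationdb.org/distribution-key"

// distributionKey returns the label value for a process group:
// the process group number modulo the number of desired logical
// fault domains. (Hypothetical helper, not part of the operator.)
func distributionKey(processGroupNumber, desiredFaultDomains int) (string, error) {
	if desiredFaultDomains <= 0 {
		return "", fmt.Errorf("desiredFaultDomains must be positive, got %d", desiredFaultDomains)
	}
	if processGroupNumber < 0 {
		return "", fmt.Errorf("processGroupNumber must not be negative, got %d", processGroupNumber)
	}
	return strconv.Itoa(processGroupNumber % desiredFaultDomains), nil
}

// distributionLabels builds the extra label that would be added to a Pod.
// (Hypothetical helper, not part of the operator.)
func distributionLabels(processGroupNumber, desiredFaultDomains int) (map[string]string, error) {
	value, err := distributionKey(processGroupNumber, desiredFaultDomains)
	if err != nil {
		return nil, err
	}
	return map[string]string{DistributionKeyLabel: value}, nil
}
```

With three desired fault domains, process groups 0, 3 and 6 would share the label value "0", so a Pod Affinity on that label could place them together while a Pod Anti Affinity keeps them apart from Pods carrying the values "1" and "2".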

I'll try to find some time this week to write a more formal proposal/design doc with additional examples and the limitations of this approach. From my current point of view, this should be fairly simple to support in the operator and could have a huge benefit on the operational side for bigger clusters.

brownleej commented 3 years ago

Yes, I think it would be great to get a proposal for this. I'm concerned that this kind of approach would lead to new scenarios where we fail to find capacity, but the right combination of soft constraints and hard constraints could limit that risk in practice.