FoundationDB / fdb-kubernetes-operator

A Kubernetes operator for FoundationDB
Apache License 2.0

Stopping instances without replacing them #289

Closed. brownleej closed this issue 2 years ago.

brownleej commented 4 years ago

There are cases where a user may want to stop instances and not bring them up until further action is taken. For instance, a process on a bad host could misbehave in ways that cannot be fixed by simply excluding it. Stopping instances can also be useful as a way to simulate failures when testing data center failover. I think we should support this through the operator.

The general mechanism is to update the monitor conf so that it no longer includes a start command for the process, and then kill the process to get it out of the cluster. We may want to consider this alongside #277, which might provide a faster path to killing the process.
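
To make the idea concrete, here is a minimal sketch of rendering a monitor conf that leaves out the section for any stopped process. It is not the operator's actual implementation: the function name `render_monitor_conf`, the simplified section layout, and the binary path are all assumptions made for illustration.

```python
def render_monitor_conf(process_ids, stopped):
    """Render a simplified, illustrative monitor conf.

    Processes listed in `stopped` get no section at all, so the monitor
    has no start command for them. Layout and path are hypothetical.
    """
    lines = ["[general]"]
    for pid in process_ids:
        if pid in stopped:
            # Skip the section entirely: nothing to start for this process.
            continue
        lines.append(f"[fdbserver.{pid}]")
        lines.append("command = /usr/bin/fdbserver")  # hypothetical path
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Process 2 is stopped, so only process 1 gets a start command.
    print(render_monitor_conf([1, 2], stopped={2}))
```

With a conf like this, a killed process stays down, because the monitor has nothing to restart until the section is added back.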

johscheuer commented 3 years ago

I think it also makes sense to have a way to shut down a whole cluster. We could implement that either as a special instance value such as ALL or as a dedicated flag.

johscheuer commented 2 years ago

We currently have two ways to do this (see https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/docs/cluster_spec.md#buggifyconfig): emptyMonitorConf to remove all processes in a cluster, and crashLoop to put specific Pods into a crash loop state. Is there anything else we need?
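
As a rough illustration of how these two settings might be applied, the sketch below builds a JSON merge patch for the cluster resource. The field names emptyMonitorConf and crashLoop come from the comment above. Their placement under `spec.buggify` is an assumption based on the linked BuggifyConfig docs, and the helper `buggify_patch` and the ID `storage-1` are hypothetical. Check the cluster spec docs for your operator version before relying on it.

```python
import json


def buggify_patch(crash_loop=(), empty_monitor_conf=False):
    """Build a merge patch that sets the buggify options (assumed layout)."""
    buggify = {}
    if crash_loop:
        # IDs of the instances to put into a crash loop (hypothetical values).
        buggify["crashLoop"] = list(crash_loop)
    if empty_monitor_conf:
        # Remove all processes by rendering an empty monitor conf.
        buggify["emptyMonitorConf"] = True
    return {"spec": {"buggify": buggify}}


if __name__ == "__main__":
    # Put a single instance into a crash loop.
    print(json.dumps(buggify_patch(crash_loop=["storage-1"]), indent=2))
    # Stop every process in the cluster.
    print(json.dumps(buggify_patch(empty_monitor_conf=True), indent=2))
```

The printed JSON could then be passed to something like `kubectl patch` with `--type merge` against the cluster resource.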