Add support for multiple cluster controllers

etschannen commented 5 years ago

The cluster controller is a singleton role which can be overwhelmed if too many processes are connected to the database. This limits the total number of processes that can join a database, and therefore limits how large a database can scale.

In addition to scaling concerns, supporting multiple cluster controllers will enable a datacenter that is partitioned from the rest of the system to still provide stale reads from that location.

Finally, having multiple cluster controllers will reduce or eliminate the cost of starting up a new cluster controller, either when a cluster controller has died or when we are switching the primary datacenter of a cluster. This should reduce master recovery times in these scenarios.

Proposed design:

Each region has its own local coordinators and local cluster controller. A region can have more than one cluster controller, in which case processes in that region choose arbitrarily which cluster controller to register with.
Workers register only with their local cluster controller.
Clients only connect to their local cluster controller.
The local cluster controllers use the global coordinators to elect one of them to be responsible for electing the master.
Failure monitoring data is shared between cluster controllers.
The list of registered workers is shared between cluster controllers.
After a master is fully recovered, it will attempt to write the coordinated state to all of the local coordinators in each datacenter.
The master registers with all cluster controllers; if a local cluster controller dies, the master must resend its registration to the newly elected local cluster controller.
Tlog rejoins are processed by the local cluster controllers.
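The registration and election rules above can be illustrated with a toy, single-process Python model. This is a sketch under stated assumptions, not FoundationDB code: ClusterController, register_worker, shared_worker_list, elect_leader, register_master and replace_controller are hypothetical names; the election through the global coordinators is replaced by a trivial "smallest live name wins" rule; moving a dead controller's workers to its replacement stands in for those workers re-registering; and failure monitoring and tlog rejoins are not modeled.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ClusterController:
    """Hypothetical in-memory stand-in for one local cluster controller."""
    name: str
    region: str
    alive: bool = True
    workers: Set[str] = field(default_factory=set)  # workers registered directly here
    master: Optional[str] = None                    # master that registered here, if any


def register_worker(ccs: List[ClusterController], worker: str, region: str,
                    rng: random.Random) -> ClusterController:
    # A worker registers only with a live controller in its own region and
    # picks one arbitrarily when the region has more than one.
    local = [cc for cc in ccs if cc.alive and cc.region == region]
    chosen = rng.choice(local)
    chosen.workers.add(worker)
    return chosen


def shared_worker_list(ccs: List[ClusterController]) -> Set[str]:
    # The registered-worker list is shared, so every controller can see the
    # union of the workers held by the live controllers.
    return set().union(*(cc.workers for cc in ccs if cc.alive))


def elect_leader(ccs: List[ClusterController]) -> ClusterController:
    # Placeholder for the election run through the global coordinators:
    # here the live controller with the smallest name simply wins.
    return min((cc for cc in ccs if cc.alive), key=lambda cc: cc.name)


def register_master(ccs: List[ClusterController], master: str) -> None:
    # The master registers with every live cluster controller.
    for cc in ccs:
        if cc.alive:
            cc.master = master


def replace_controller(ccs: List[ClusterController], dead: ClusterController,
                       replacement: ClusterController, master: str) -> None:
    # When a local controller dies, the master resends its registration to the
    # newly elected local controller. Moving the dead controller's workers over
    # is a simplification standing in for those workers re-registering.
    dead.alive = False
    replacement.workers |= dead.workers
    replacement.master = master
    ccs.append(replacement)


if __name__ == "__main__":
    rng = random.Random(0)
    ccs = [ClusterController("cc-east-1", "east"),
           ClusterController("cc-east-2", "east"),
           ClusterController("cc-west-1", "west")]
    for w in ["w1", "w2", "w3"]:
        register_worker(ccs, w, "east", rng)
    register_worker(ccs, "w4", "west", rng)
    register_master(ccs, "master-1")
    print(elect_leader(ccs).name)            # cc-east-1
    print(sorted(shared_worker_list(ccs)))   # ['w1', 'w2', 'w3', 'w4']
```

In this sketch only the leader matters for electing the master, while every live controller keeps a copy of the master's registration, which is why a replacement controller has to receive that registration again.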

DemiMarie commented 5 years ago

I take it that clients will need to opt in to reading potentially stale data, and that the current scalability limit is very large?

etschannen commented 5 years ago

You are correct: clients would need to pass a transaction option to read stale data. Currently, the cluster controller can handle 1000+ processes without any issues. This limit would be even higher, but there is another issue: every client keeps a connection open to the cluster controller so that it can get updates when the master proxies change.
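A rough back-of-envelope model of that connection load, as a hedged sketch: it assumes each worker process and each client holds exactly one long-lived connection to a cluster controller, and that under the proposal those connections are spread evenly across the controllers in the same region. The function names (single_controller_load, per_controller_load) and the region, process and client counts in the example are hypothetical and illustrative only, not measurements.

```python
import math
from typing import Dict


def single_controller_load(processes: Dict[str, int], clients: Dict[str, int]) -> int:
    # Today: every process and every client connects to the single cluster controller.
    return sum(processes.values()) + sum(clients.values())


def per_controller_load(processes: Dict[str, int], clients: Dict[str, int],
                        controllers: Dict[str, int]) -> Dict[str, int]:
    # Proposed: processes and clients connect only to controllers in their own
    # region, assumed here to be spread evenly over that region's controllers.
    return {
        region: math.ceil((processes[region] + clients.get(region, 0)) / controllers[region])
        for region in processes
    }


if __name__ == "__main__":
    processes = {"east": 600, "west": 600}   # illustrative counts only
    clients = {"east": 3000, "west": 1000}
    controllers = {"east": 2, "west": 1}
    print(single_controller_load(processes, clients))             # 5200
    print(per_controller_load(processes, clients, controllers))   # {'east': 1800, 'west': 1600}
```

Under these assumptions, client connections can dominate the load on a single controller, and splitting them by region (and across several controllers within a busy region) bounds what any one controller must hold.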

apple / foundationdb

Add support for multiple cluster controllers #1013