canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Only extend cluster database when reaching three members #6230

Closed: stgraber closed this issue 4 years ago

stgraber commented 5 years ago

As it currently stands, when bringing up clustering, the first three members all immediately become database members. This isn't ideal, as unfortunately many people will only bring up a two-server cluster, ending up in the worst situation, where losing either server leads to a broken cluster.

I believe it would instead be preferable to not have the second member act as a database member until a third member joins, at which point both the second and third members should be promoted to database members at the same time.

As part of this we should also update our clustering documentation to explain more strongly why two-member clusters should be avoided, and to recommend more definitively that our users run clusters of at least three members.

stgraber commented 5 years ago

@freeekanayaka this can all be done in LXD itself, correct?

We only need to modify the joining logic so we don't bring up the database unless there are at least three members in the cluster, at which point we should promote whatever members are needed to have a database backed by three of them.

freeekanayaka commented 5 years ago

Yes, in principle that would work, and we already have high-level promotion logic that should be possible to reuse.

jackstenglein commented 5 years ago

@stgraber

Hello, another student from UT Austin and I would like to take on this issue for our virtualization class.

stgraber commented 5 years ago

Thanks!

For this one, there is no API extension or much in the way of a user-visible change. What we're looking for is: turning on clustering (initial member) brings that member up with the database role; the first joining member (second cluster member) does not get that role and relies entirely on the initial member for the database; then, when a third member joins, both the second and third members get the database role.

This is because the Raft database model we use requires consensus, and consensus with just two members is problematic: the loss of either causes a stuck database. Since in that scenario the loss of either member would mean the loss of the database, changing to a single database member until a third is added actually improves the odds of recovery from 0% to 50% (the initial member can deal with the loss of the second).
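To make the quorum arithmetic concrete, here is a minimal sketch (plain Go, not LXD code; the quorum helper is purely illustrative) of why one database member beats two:

```go
package main

import "fmt"

// quorum returns the number of voters Raft needs to make progress:
// a strict majority of the voting members.
func quorum(voters int) int {
	return voters/2 + 1
}

func main() {
	// 2 database members: quorum is 2, so losing either one stalls
	// the database. Every single-member loss is fatal (0% recovery).
	fmt.Println(quorum(2)) // 2

	// 1 database member: quorum is 1, so the cluster survives losing
	// the non-database member, i.e. 1 of the 2 possible single losses (50%).
	fmt.Println(quorum(1)) // 1

	// 3 database members: quorum is 2, so any single member can be
	// lost without losing the database.
	fmt.Println(quorum(3)) // 2
}
```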

The changes here are likely to all happen in lxd/cluster and lxd/api_cluster.go, which is where the logic managing joining and leaving cluster members resides. We effectively need to change the logic so that a joining member doesn't get promoted to a database member unless we now have a total of 3 members, in which case the second member also needs to be told to become a database member.
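As a rough sketch of the shape of that change (hypothetical type and helper names; the real join logic lives in lxd/cluster and lxd/api_cluster.go):

```go
package cluster

// Member is a simplified stand-in for a cluster member record; the
// real types live in LXD's cluster packages.
type Member struct {
	Name     string
	Database bool
}

// databaseMembersAfterJoin is a hypothetical helper showing the
// intended policy: below three members, only the initial member holds
// the database role; at three members, the first three all do. It
// assumes at least the initial member is present.
func databaseMembersAfterJoin(members []Member) []string {
	if len(members) < 3 {
		// The second member joins without the role, relying
		// entirely on the initial member for the database.
		return []string{members[0].Name}
	}

	// A third member joined: promote so the database is backed by three.
	names := make([]string, 0, 3)
	for _, m := range members[:3] {
		names = append(names, m.Name)
	}
	return names
}
```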

Testing this should be straightforward enough by running a test cluster in containers or VMs, joining more and more members and checking lxc cluster list each time to see which members are running the database.

As for expected commits, I'm mainly expecting two for this case:

@freeekanayaka and myself should be able to help you with any questions you have. The cluster logic can take a little while to wrap your head around; a good first step is likely to set up a simple 3-to-5-member cluster, get a feel for how things work and see the difference between a member that's running the database and one that isn't (the main difference being the content of database/global).

freeekanayaka commented 5 years ago

On top of what @stgraber said, I'll add that you might need to modify the tests in test/suites/clustering.sh in case anything there assumes that a 2-member cluster has 2 database members (I don't think any test assumes that, but just a heads up).

jackstenglein commented 4 years ago

Hi @stgraber, I think we are still a bit confused about the cluster logic and wanted to run our approach by you. We are able to prevent the second node that joins a cluster from immediately assuming the database role, and we can get the third node that joins to assume the database role.

However, we are having difficulty promoting the second node to the database role when the third node joins the cluster. Our current approach is to call the /internal/cluster/rebalance API near the end of the clusterPutJoin function in lxd/api_cluster.go. We thought that the rebalance function would see that the second node could be promoted and then do so. The promotion request reaches the second node but fails when that node tries to add the db.ClusterRoleDatabase role at the end of the Promote function in lxd/cluster/membership.go.
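Roughly, the call we are making looks like the sketch below (plain net/http with hypothetical wiring; the real code goes through LXD's internal client and cluster TLS setup, and the POST method here is an assumption):

```go
package cluster

import (
	"fmt"
	"net/http"
)

// triggerRebalance is a hypothetical sketch of the joining node
// kicking the leader's /internal/cluster/rebalance endpoint. The
// bare http.Client and leaderURL parameter are stand-ins for LXD's
// internal client and cluster state.
func triggerRebalance(leaderURL string) error {
	req, err := http.NewRequest(http.MethodPost, leaderURL+"/internal/cluster/rebalance", nil)
	if err != nil {
		return err
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("rebalance request failed: %s", resp.Status)
	}
	return nil
}
```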

The role addition fails with the following error: "UNIQUE constraint failed: nodes_roles.node_id, nodes_roles.role". We have also seen it fail with the error "FOREIGN KEY constraint failed".

Could you provide some guidance on whether our approach is valid and/or what could be causing that error? Thanks!

stgraber commented 4 years ago

Your plan sounds good given what we have in place right now. I don't love the idea of the joining node being the one kicking off a rebalance, but with the way things are structured right now, it's our best bet.

So I think it'd be good to know more about the failure. The "FOREIGN KEY constraint failed" error suggests to me that the node_id is wrong, as in, it doesn't exist in the nodes table.
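If it helps narrow things down, a debugging check along these lines would distinguish the two failure modes (hypothetical helper against a plain *sql.DB; the table and column names are taken from the error messages you quoted):

```go
package cluster

import (
	"database/sql"
	"fmt"
)

// checkRoleInsert is a hypothetical debugging helper distinguishing
// the two constraint failures seen here. Table/column names (nodes,
// nodes_roles.node_id, nodes_roles.role) come from the error messages.
func checkRoleInsert(db *sql.DB, nodeID int64, role int) error {
	// FOREIGN KEY constraint: does the referenced node row exist?
	var exists bool
	err := db.QueryRow("SELECT EXISTS(SELECT 1 FROM nodes WHERE id = ?)", nodeID).Scan(&exists)
	if err != nil {
		return err
	}
	if !exists {
		return fmt.Errorf("node %d missing from nodes table (FOREIGN KEY failure)", nodeID)
	}

	// UNIQUE constraint: was this (node_id, role) pair already inserted?
	var count int
	err = db.QueryRow("SELECT COUNT(*) FROM nodes_roles WHERE node_id = ? AND role = ?", nodeID, role).Scan(&count)
	if err != nil {
		return err
	}
	if count > 0 {
		return fmt.Errorf("role %d already recorded for node %d (UNIQUE failure)", role, nodeID)
	}
	return nil
}
```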

Could you push what you have as a draft pull request so I can take a look and do a quick review of what you have so far?

ralubis commented 4 years ago

Hello @stgraber, I created a draft pull request, as seen above. As Jack stated, the error can be reproduced by adding a third member to a cluster. When the third member is added, the log file shows the promote request reaching the second node and failing.

Additionally, when I run lxc cluster list on the second node after adding the third node, it hangs, while the command executes successfully on nodes 1 and 3.

Thank you!

stgraber commented 4 years ago

lxc cluster list hanging during a join is normal: while joining, the cluster enters exclusive mode, preventing any other database access.