canonical / microcloud

Automated private cloud based on LXD, Ceph and OVN
https://microcloud.is
GNU Affero General Public License v3.0

Add recovery option for quorum loss. #291

Open masnax opened 2 months ago

masnax commented 2 months ago

LXD provides the lxd cluster edit command, which can be used to repair a cluster that has lost all but one of its nodes. We need a similar mechanism to modify the local cluster configuration when we have lost quorum.
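For context, raft (which dqlite builds on) only makes progress while a strict majority of its voting members is reachable, and "losing quorum" means dropping below that. A minimal Go sketch of the arithmetic follows. `hasQuorum` is a hypothetical helper that counts voters only, since stand-by and spare members don't vote.

```go
package main

import "fmt"

// hasQuorum reports whether a raft cluster with `voters` voting members can
// still commit entries when only `online` of them are reachable. Raft needs a
// strict majority of voters, i.e. voters/2 + 1 (integer division).
// Invalid inputs are treated as "no quorum".
func hasQuorum(voters, online int) bool {
	if voters <= 0 || online < 0 || online > voters {
		return false
	}
	return online >= voters/2+1
}

func main() {
	// A 3-voter cluster survives one lost voter but not two.
	fmt.Println(hasQuorum(3, 2), hasQuorum(3, 1))
	// A 5-voter cluster with only 2 voters left has lost quorum.
	fmt.Println(hasQuorum(5, 2))
}
```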

masnax commented 2 months ago

@MggMuggins As a first step, we should investigate whether go-dqlite supports something similar to lxd cluster edit and lxd cluster recover-from-quorum-loss.

MggMuggins commented 2 months ago

lxd cluster recover-from-quorum-loss currently calls the deprecated Node.Recover to reset the dqlite raft log so that the current node is the only member of the cluster. Microcluster should use ReconfigureMembership instead.
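Roughly, the single-node reset described above could look like the sketch below. It is a minimal illustration under assumptions: `Member`, `Role` and the `Reconfigurer` interface are hypothetical local types standing in for go-dqlite's node info and its ReconfigureMembership call, so the example compiles without cgo or libdqlite. The exact go-dqlite signature is not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// Role mirrors the dqlite member roles (voter, stand-by, spare).
// Hypothetical local type so the sketch compiles without go-dqlite.
type Role int

const (
	Voter Role = iota
	StandBy
	Spare
)

// Member is a hypothetical stand-in for a dqlite node entry.
type Member struct {
	ID      uint64
	Address string
	Role    Role
}

// Reconfigurer abstracts the call that rewrites the raft configuration
// stored in dir. In microcluster this would wrap go-dqlite's
// ReconfigureMembership; the interface itself is hypothetical.
type Reconfigurer interface {
	Reconfigure(dir string, members []Member) error
}

// recoverAsSingleNode resets the configuration in dir so that only self
// remains, as a voter, so the one-node cluster can elect itself leader.
func recoverAsSingleNode(r Reconfigurer, dir string, self Member) error {
	if dir == "" {
		return errors.New("empty data directory")
	}
	self.Role = Voter
	return r.Reconfigure(dir, []Member{self})
}

// fakeReconfigurer records the call instead of touching a real database.
type fakeReconfigurer struct {
	dir     string
	members []Member
}

func (f *fakeReconfigurer) Reconfigure(dir string, members []Member) error {
	f.dir, f.members = dir, members
	return nil
}

func main() {
	f := &fakeReconfigurer{}
	// Hypothetical path and address, for illustration only.
	self := Member{ID: 2, Address: "10.0.0.2:9000", Role: StandBy}
	if err := recoverAsSingleNode(f, "/var/lib/example/database", self); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(f.dir, len(f.members), f.members[0].Role == Voter)
}
```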

Node.Recover and ReconfigureMembership invoke dqlite_node_recover_ext. The comment block there says the function should be called exactly once, after which the data directory of every remaining dqlite member must be replaced entirely with the data directory from the member where dqlite_node_recover_ext was run.

Unless I'm missing some dqlite/raft behavior or context, this isn't being done for recover-from-quorum-loss. That isn't an issue for 3-node clusters, but anything larger would run into trouble (I have anecdotal evidence but haven't actually tried it). The docs for lxd cluster edit indicate that the same YAML should be applied to all cluster members, not just one. Since edit doesn't allow removing nodes, my guess is that performing the edit on all nodes isn't problematic as long as every member has the same log before being shut down, but I'm just guessing here.
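If the same edit is applied on every member, one cheap safety check is to reject an edited member list that drops a node before it is written anywhere. A sketch, assuming a hypothetical `Member` type and `validateEdit` helper; it enforces only the no-removal rule mentioned above (plus rejecting duplicate IDs) and does nothing to confirm that the members' logs actually match.

```go
package main

import "fmt"

// Member is a hypothetical stand-in for one entry in the edited configuration.
type Member struct {
	ID      uint64
	Address string
	Role    string
}

// validateEdit rejects an edited member list that removes a current node ID
// or lists the same ID twice. Other fields (address, role) may change freely.
func validateEdit(current, edited []Member) error {
	after := make(map[uint64]bool, len(edited))
	for _, m := range edited {
		if after[m.ID] {
			return fmt.Errorf("duplicate node ID %d", m.ID)
		}
		after[m.ID] = true
	}
	// Walk current in order so the reported ID is deterministic.
	for _, m := range current {
		if !after[m.ID] {
			return fmt.Errorf("node ID %d was removed", m.ID)
		}
	}
	return nil
}

func main() {
	current := []Member{
		{ID: 1, Address: "10.0.0.1:9000", Role: "voter"},
		{ID: 2, Address: "10.0.0.2:9000", Role: "voter"},
	}
	// Changing an address keeps the same IDs, so it is accepted.
	moved := []Member{
		{ID: 1, Address: "10.0.1.1:9000", Role: "voter"},
		{ID: 2, Address: "10.0.0.2:9000", Role: "voter"},
	}
	fmt.Println(validateEdit(current, moved))
	// Dropping node 2 is rejected.
	fmt.Println(validateEdit(current, current[:1]))
}
```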

In terms of what microcluster should do, my feeling is that we should expose dqlite's ability to reset the cluster while retaining a subset of its members, something like:

func (m *MicroCluster) RecoverFromQuorumLoss(keepMembers []string) error

which would read the node IDs and other member details from cluster.yaml.
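As a rough sketch of the member-selection half of such a function: `Member` and `selectMembers` are hypothetical, entries are assumed to have already been parsed out of cluster.yaml, and `keepMembers` is assumed to hold member addresses (it could just as well hold names). The dqlite reconfiguration call itself is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// Member is a hypothetical stand-in for one entry in dqlite's cluster.yaml.
type Member struct {
	ID      uint64
	Address string
	Role    string
}

// selectMembers returns the entries of cluster whose addresses appear in
// keepMembers, preserving cluster order. It fails if any requested member is
// unknown, so a typo cannot silently shrink the cluster further.
func selectMembers(cluster []Member, keepMembers []string) ([]Member, error) {
	if len(keepMembers) == 0 {
		return nil, errors.New("at least one member must be kept")
	}
	want := make(map[string]bool, len(keepMembers))
	for _, addr := range keepMembers {
		want[addr] = true
	}
	kept := make([]Member, 0, len(keepMembers))
	for _, m := range cluster {
		if want[m.Address] {
			kept = append(kept, m)
			delete(want, m.Address)
		}
	}
	// Anything still in want was never matched; report the first in input order.
	for _, addr := range keepMembers {
		if want[addr] {
			return nil, fmt.Errorf("unknown cluster member %q", addr)
		}
	}
	return kept, nil
}

func main() {
	cluster := []Member{
		{ID: 1, Address: "10.0.0.1:9000", Role: "voter"},
		{ID: 2, Address: "10.0.0.2:9000", Role: "voter"},
		{ID: 3, Address: "10.0.0.3:9000", Role: "voter"},
		{ID: 4, Address: "10.0.0.4:9000", Role: "stand-by"},
		{ID: 5, Address: "10.0.0.5:9000", Role: "spare"},
	}
	kept, err := selectMembers(cluster, []string{"10.0.0.1:9000", "10.0.0.4:9000"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range kept {
		fmt.Println(m.ID, m.Address)
	}
}
```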

I'm thinking that copying the database dir isn't something we can do in microcluster/microcloud, since the DB can't be running during the reset process; I'm happy to be corrected here. I'll look into it more tomorrow.