canonical / go-dqlite

Go bindings for libdqlite
https://dqlite.io
Apache License 2.0

Change cluster join addresses at runtime #282

Closed by koh-osug 9 months ago

koh-osug commented 9 months ago

In my network, nodes come and go. To set the node's own address and the peers to join, I use:

options := []app.Option{app.WithAddress(db), app.WithCluster(*join)}

Now I am looking for a way to change the cluster dynamically. My approach would be to restart all nodes with the updated join information. Is there a function that can change the cluster at runtime?

freeekanayaka commented 9 months ago

> In my network, nodes come and go. To set the node's own address and the peers to join, I use:
>
> options := []app.Option{app.WithAddress(db), app.WithCluster(*join)}

This is only needed the very first time you start a node, in order to have it join the cluster. After the node has joined, you don't need to pass those options. You can track whether the node has already joined by saving that information on disk (in a file or something like that), e.g.:

var options []app.Option
if hasNotJoined {
    // First start: pass the node's own address and the peers to join.
    options = []app.Option{app.WithAddress(db), app.WithCluster(*join)}
} else {
    options = ...
}

> Now I am looking for a way to change the cluster dynamically. My approach would be to restart all nodes with the updated join information. Is there a function that can change the cluster at runtime?

If I understand correctly what you mean, this is already done by the app code:

https://github.com/canonical/go-dqlite/blob/4edab5e2dd71e25576b5095b2f3ebdea2664e1a0/app/app.go#L565

where a goroutine periodically refreshes the local cache of cluster addresses, so the node's knowledge of the cluster remains up-to-date.

koh-osug commented 9 months ago

So, in summary:

  1. When I start the first node (initially there is only one node), the list of join peers is empty.
  2. When I start a second node, I pass the join address of the first node.
  3. When I start node X, I supply the join addresses of the previous X-1 nodes.
  4. I never have to touch node 1, node 2, or any other earlier node again: they learn these addresses implicitly when other nodes join them, and they are smart enough to use this information.
  5. To restart a node I do not call options = []app.Option{app.WithAddress(db), app.WithCluster(*join)}; instead, this information is read from a file. That is, I just call options = []app.Option{app.WithAddress(db)} and some built-in logic in the library loads this file implicitly. But I have to know somehow, for example by using a marker file, that this is a restart.

Is this correct?

cole-miller commented 9 months ago

@koh-osug That's about the size of it, except that for point 5, it's fine to unconditionally provide app.WithCluster(*join) -- the app.New function will use those addresses only if it doesn't see the cluster.yaml file at startup. On the other hand, if you don't provide a list of existing nodes to app.New, and there's no cluster.yaml file, that's an error, unless the node in question is the bootstrap node.

koh-osug commented 9 months ago

Thanks a lot.