Hoverbear / old-raft-rs

[Incomplete] A Raft implementation in Rust
https://hoverbear.github.io/raft-rs/raft/
MIT License

peer addresses in `Consensus` #92

Open tbg opened 9 years ago

tbg commented 9 years ago

Did some light hacking around membership changes today and noticed that Consensus grabs the whole peers HashMap, including the peer addresses, at startup.

One issue with this is that restarting a node with a new port breaks everything: the new port is never known, so the Consensus can never again send leader hints containing the correct address.

In the same vein, the Server can't reconnect to such a node (only the other node can connect to it).

I suggest adding the port to the server preamble, using that information to keep the peers map in Server current, and updating the Consensus's copy accordingly on inbound requests. Future use might make it worthwhile to send the full peer address (since the source IP may not necessarily be the one you need to talk to for an outgoing connection), but for now it's probably easiest to add a host:port string.
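Roughly what I have in mind, as a sketch; the `Preamble` and `Server` shapes below are invented for illustration and are not the crate's actual types:

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

// `ServerId` and `Preamble` are hypothetical stand-ins, not raft-rs types.
type ServerId = u64;

/// First message on an outbound connection, carrying the address the
/// dialing node wants peers to use for connections back to it
/// (host:port, not the ephemeral source address of this connection).
struct Preamble {
    id: ServerId,
    advertised_addr: SocketAddr,
}

struct Server {
    peers: HashMap<ServerId, SocketAddr>,
}

impl Server {
    /// On an inbound connection, refresh the stored address so a node
    /// that restarted on a new port becomes reachable again.
    fn handle_preamble(&mut self, preamble: Preamble) {
        self.peers.insert(preamble.id, preamble.advertised_addr);
        // A real implementation would also propagate the update to
        // Consensus (or keep Consensus unaware of addresses entirely).
    }
}
```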

It's also worth considering removing knowledge of the SocketAddrs completely from Consensus (it's leaking the implementation of the transport). The only downside that I can see is that after that change, a leader hint only contains the leader's ID, with no way to directly connect to it. I don't think that's really necessary though, so that change might also be a net positive.
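A minimal sketch of what that could look like, assuming illustrative names (`Transport`, `LeaderHint`) rather than the crate's API:

```rust
// `Transport` and `LeaderHint` are illustrative names, not raft-rs's API.
type ServerId = u64;

/// Consensus sends to peers only by ID; the transport owns the
/// ID -> address mapping and the reconnection policy.
trait Transport {
    fn send(&mut self, to: ServerId, message: &[u8]);
}

/// With addresses hidden, a leader hint carries only the leader's ID;
/// resolving that to something dialable is the transport's job.
struct LeaderHint {
    leader: ServerId,
}
```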

The suggestions are fairly small, so if any part of them seems worthwhile, I'm happy to take a stab at it.

danburkert commented 9 years ago

> Did some light hacking around membership changes today and noticed that Consensus grabs the whole peers HashMap, including the peer addresses, at startup.

Awesome, feedback is always appreciated!

> I suggest adding the port to the server preamble, using that information to keep the peers map in Server current, and updating the Consensus's copy accordingly on inbound requests.

I agree, we should update the port and IP address on reconnection. If you want to implement it, that would be awesome.

> It's also worth considering removing knowledge of the SocketAddrs completely from Consensus (it's leaking the implementation of the transport). The only downside that I can see is that after that change, a leader hint only contains the leader's ID, with no way to directly connect to it. I don't think that's really necessary though, so that change might also be a net positive.

I agree, but that downside makes it a non-starter, since clients do not, and should not, need to know server IDs. My hope is that we can implement the system such that server IDs are completely internal between peers. On startup, servers will consult their local persistent log for their server ID. If the persistent log is empty (the server is starting up for the first time), the server will attempt to connect to its consensus group, and a new ID will be generated that has not been previously used. This will require some pretty radical changes in how servers are brought up and how new consensus groups are created: groups will have to be created with only a single member with a fixed ID, and further members will have to be added through the configuration change mechanism.
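A minimal sketch of that bootstrap flow, with hypothetical helpers (`Log::server_id`, `join_group`) standing in for the real startup path:

```rust
// `Log`, `bootstrap`, and `join_group` are invented helpers for illustration.
type ServerId = u64;

struct Log {
    server_id: Option<ServerId>,
}

fn bootstrap(log: &mut Log) -> ServerId {
    match log.server_id {
        // Restart: the persistent log already records who we are.
        Some(id) => id,
        // First boot: join the existing group, which mints a fresh,
        // never-before-used ID via the configuration change mechanism,
        // then persist it locally.
        None => {
            let id = join_group();
            log.server_id = Some(id);
            id
        }
    }
}

fn join_group() -> ServerId {
    // Placeholder: would contact the consensus group over the network.
    unimplemented!()
}
```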

tbg commented 9 years ago

> I agree, we should update the port and IP address on reconnection. If you want to implement it, that would be awesome.

Ok, happy to do that.

> I agree, but that downside makes it a non-starter, since clients do not, and should not, need to know server IDs.

Ah, yeah. I'm still getting used to the way clients work here. I'm used to reproposals, where you can talk to any node in the cluster (and it reproposes at the leader on your behalf). This has the upside of being less work for the client and more fault-tolerant (but of course less performant in some scenarios due to the extra round trip). One last question: it doesn't look like a client proposal actually blocks and waits for the commit (unless you're single-node). Rather, it sends out an append request and returns to the client right away. Am I reading that right?

Hoverbear commented 9 years ago

@tschottdorf We do wait! Here you can see it waiting. Note that the response from the server doesn't come until the entry is committed!

Here is where `proposal_request` is.

Here we process that if all is well; notice how only the peer messages are made.

Here is where the response is sent!
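Roughly, the flow those links describe, as a self-contained sketch with invented types (`PendingProposals`, `ClientToken`): the proposal is parked keyed by its log index, only peer messages go out, and the reply fires once the commit index passes the entry.

```rust
use std::collections::HashMap;

// `PendingProposals` and `ClientToken` are invented for illustration.
type LogIndex = u64;
type ClientToken = u64;

struct PendingProposals {
    waiting: HashMap<LogIndex, ClientToken>,
    commit_index: LogIndex,
}

impl PendingProposals {
    /// On a client proposal: park the client keyed by the entry's log
    /// index and send only AppendEntries to peers; no reply yet.
    fn propose(&mut self, index: LogIndex, client: ClientToken) {
        self.waiting.insert(index, client);
    }

    /// When the commit index advances past a parked entry, the client
    /// finally gets its response.
    fn advance_commit_index(&mut self, new_commit: LogIndex) {
        for index in (self.commit_index + 1)..=new_commit {
            if let Some(client) = self.waiting.remove(&index) {
                println!("reply to client {} for committed entry {}", client, index);
            }
        }
        self.commit_index = new_commit;
    }
}
```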

Feel free to drop by #raft on Mozilla's IRC. :)

Hoverbear commented 9 years ago

These reconnection changes might also be useful for #77 (Membership changes). I might even be willing to argue that changing the address/port demands a membership change.

tbg commented 9 years ago

> @tschottdorf We do wait! Here you can see it waiting. Note that the response from the server doesn't come until the entry is committed!

Ah, thanks, the part in `advance_commit_index` is what I had missed. It looks more familiar now. Are you already handling losing leadership during a proposal? Without reproposals (which seems to be the strategy right now), the pending command would have to be aborted so the client isn't kept waiting forever, though there's no guarantee that it won't be committed anyway by the new leader. With reproposals, you wind up executing it multiple times instead.

tbg commented 9 years ago

> I might even be willing to argue that changing the address/port demands a membership change.

What's the argument for that? I would hope to treat it the same as temporary downtime. After all, the state of the system is the same. With proper interfaces, Raft wouldn't even be aware that anything had changed; the only reason it notices is the leaking transport abstraction.

Hoverbear commented 9 years ago

@tschottdorf Hm, a good argument! :) Hadn't considered that.

Hoverbear commented 9 years ago

> Are you already handling losing leadership during a proposal? Without reproposals (which seems to be the strategy right now), the pending command would have to be aborted so the client isn't kept waiting forever, though there's no guarantee that it won't be committed anyway by the new leader. With reproposals, you wind up executing it multiple times instead.

I think this should be a unit test...
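Something along these lines, as a self-contained toy; the `Node` type below is a stand-in, not the real Consensus:

```rust
/// Replies a parked client can receive.
#[derive(Debug, PartialEq)]
enum Reply {
    Aborted,
}

/// Toy stand-in for the real Consensus: just tracks in-flight proposals.
struct Node {
    pending: Vec<u64>,          // log indexes awaiting commit
    replies: Vec<(u64, Reply)>, // responses sent to clients
}

impl Node {
    fn propose(&mut self, index: u64) {
        self.pending.push(index);
    }

    /// Losing leadership: abort everything still in flight so no client
    /// waits forever (the new leader may still commit the entry anyway).
    fn step_down(&mut self) {
        for index in self.pending.drain(..) {
            self.replies.push((index, Reply::Aborted));
        }
    }
}

#[test]
fn step_down_aborts_pending_proposals() {
    let mut node = Node { pending: vec![], replies: vec![] };
    node.propose(1);
    node.step_down();
    assert_eq!(node.replies, vec![(1, Reply::Aborted)]);
}
```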

tbg commented 9 years ago

> I might even be willing to argue that changing the address/port demands a membership change.

There might still be an argument for that, at least if addresses aren't going to be abstracted out. With my in-review changes (assuming they do what I intend), an inbound connection updates the receiver. But in the long run, there's no reason for every node to hold an open connection to every other node at all times (everyone only talks to the leader in normal operation), so if a few nodes change addresses, the information may not spread. The Raft implementation I'm accustomed to (Cockroach/MultiRaft) has a gossip network through which IDs can be mapped to addresses, so that issue wasn't present there.
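Roughly the gossip idea, as a toy sketch; the `Advertisement`/`AddressBook` types are invented here, and Cockroach's actual gossip protocol is far more involved:

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

// `Advertisement` and `AddressBook` are invented for illustration.
type ServerId = u64;

/// A gossiped (id, address) pair with an epoch so stale entries lose.
#[derive(Clone, Copy)]
struct Advertisement {
    id: ServerId,
    addr: SocketAddr,
    epoch: u64, // bumped whenever the node restarts with a new address
}

#[derive(Default)]
struct AddressBook {
    entries: HashMap<ServerId, (SocketAddr, u64)>,
}

impl AddressBook {
    /// Merge a piece of gossip, keeping the highest epoch per server,
    /// so address changes spread even between nodes that never connect
    /// to each other directly.
    fn merge(&mut self, ad: Advertisement) {
        match self.entries.get(&ad.id) {
            Some(&(_, epoch)) if epoch >= ad.epoch => {} // stale: ignore
            _ => {
                self.entries.insert(ad.id, (ad.addr, ad.epoch));
            }
        }
    }

    /// Resolve a server ID to the freshest known address.
    fn resolve(&self, id: ServerId) -> Option<SocketAddr> {
        self.entries.get(&id).map(|&(addr, _)| addr)
    }
}
```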