cowsql / raft

Asynchronous C implementation of the Raft consensus protocol
https://raft.readthedocs.io
Other
45 stars 6 forks source link

Add raft_voter_contact() #187

Open ralight opened 3 months ago

ralight commented 3 months ago

This returns the number of voting nodes that are recently in contact with the leader, to allow determining if the cluster is currently in a degraded / at risk state.

This is a port of a change I submitted to the canonical/raft repo, but tweaked to make it a bit more useful - the count is available not just in the leader part of the union.

I don't know what you'd like to do about struct abi compatibility, please let me know if there's anything to change around that or anything else.

freeekanayaka commented 3 months ago

Hey, thanks, what is your use case for having raft_voter_contact() on followers return the last known value?

ralight commented 3 months ago

We are using the voter contact information in periodic tick and in the state change callback.

In the state change callback it is important for us to be able to distinguish between a leader being disconnected from the rest of the cluster, and a leader that is stepping down to the leadership being manually transferred with raft_transfer.

In the first case, we absolutely want to generate some monitoring event that indicates a problem, but in the second case we do not. I found that using the voting count number was the most straightforward way of determining the difference between these two cases, but it then has to be available after the role has changed from being the leader.

freeekanayaka commented 3 months ago

We are using the voter contact information in periodic tick and in the state change callback.

In the state change callback it is important for us to be able to distinguish between a leader being disconnected from the rest of the cluster, and a leader that is stepping down to the leadership being manually transferred with raft_transfer.

In the first case, we absolutely want to generate some monitoring event that indicates a problem, but in the second case we do not. I found that using the voting count number was the most straightforward way of determining the difference between these two cases, but it then has to be available after the role has changed from being the leader.

I see, thanks for the detailed explanation. It's an interesting use case, however I'm a bit hesitant to expose the contact details in this way. I'd probably prefer a more generic way that includes other information about followers as well, such as how much they are up-to-date. It's something I'd like to do for the v1 version of the API, and it should meet this use case too. This is a long term plan though.

In the meantime, would it be possible for you to perform this kind of bookkeeping from the user side? For example you could keep track that a raft_transfer request is in progress, and use that information to distinguish between the two cases that you describe. Or is there something that would make that not feasible?

ralight commented 3 months ago

I should think we can do it. I'm not entirely what bits are going to be removed under legacy - the obvious solution is to use the raft transfer callback - is that going to remain?

This still isn't a perfect solution because if we lose network connectivity between the transfer call being made and actually sending that call, then we would incorrectly assume everything was ok. That should be a pretty unlikely event though.

freeekanayaka commented 3 months ago

I should think we can do it. I'm not entirely what bits are going to be removed under legacy - the obvious solution is to use the raft transfer callback - is that going to remain?

Not sure to understand your question here. All the bits under legacy are going to be eventually removed, but I don't plan to do it any time soon, and keep this compatibility layer around for a very long time, since there are people depending on it.

The v1 design is not based on callbacks, but rather on a "pull" model, where consumers of the library are in control of the logic flow (they call instead of being called back).

The raft_transfer equivalent in v1 would be roughly:

struct raft raft;
struct raft_event event;
struct raft_update update;

event.type = RAFT_TRANSFER;
event.transfer.server_id = 123;

raft_step(&raft, &event, &update);

This initiates the transfer, which can be then monitored by clients using the raft_transferee() function, which returns the ID of the server that leadership is being transferred to, in combination with the raft_state() and raft_leader() functions. The v1 design is slightly more low-level than the current callback-based APIs, and it's meant to be a bit more flexible and to be easily integrated both in event loops and/or thread-based consumers.

This still isn't a perfect solution because if we lose network connectivity between the transfer call being made and actually sending that call, then we would incorrectly assume everything was ok. That should be a pretty unlikely event though.

I can't quite understand what you mean.

If raft_transfer() returns 0, you can be sure that the transfer request has been accepted and is in progress.

Once the callback files, you are guaranteed that the transfer request has completed (either successfully or not). You can check if the request was successful by calling raft_leader() and checking that it returns the target server.

If connectivity is lost, you'll know that there was a problem because raft_leader() won't return the desired ID (and on top of that raft_state() will return RAFT_FOLLOWER because the leader has stepped down). In that case you'll want your application to generate the monitoring event that you were talking about, because you are sure that a problem has occurred.

Hope I'm not missing something.

ralight commented 3 months ago

Thanks for the description. I don't think you've missed anything, I was thinking about it in a slightly wrong way.

Where does that leave this PR? Are there any modifications that would make it acceptable for you, or would you prefer to work with your long term plan only?

freeekanayaka commented 3 months ago

Thanks for the description. I don't think you've missed anything, I was thinking about it in a slightly wrong way.

Where does that leave this PR? Are there any modifications that would make it acceptable for you, or would you prefer to work with your long term plan only?

Thanks. If the suggested solution is a viable option for you, I'd prefer to park this PR for now, mainly to avoid introducing new public APIs that would likely need to be deprecated later down the road.