spencerkimball closed 9 years ago
@tschottdorf
Tobias, care to consolidate the other lease-related issues here?
Yep, will do.
First steps for the work to be done in the dist_sender:
storage.NodeDescriptor instead of a net.Addr, or do we generally want to keep Gossip information bits human readable (which means an extra key, probably renaming node-* to node-id* and adding node-attributes*)?
NotLeaderErrors (if old==new, just expire).
Yes, let's start gossiping a NodeDescriptor.
The attributes are only sorted for the purposes of gossiping. The ones specified when starting a node can have an arbitrary order, and I think we should add a comment to that command line flag's usage mentioning that attributes should be specified in affinity order. I don't believe you want to pick the replica with the maximum overlap. Better to pick the replica which matches the first node attribute. If none match, choose at random. If multiple match, apply the same algorithm with the second attribute, and so on. The issue with maximum overlap is that not all node attributes necessarily have anything to do with suitability when picking a replica for lowest latency. For example, "gpu" might be a node attribute.
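To make the selection rule above concrete, here is a rough sketch in Go. The Replica type, hasAttr, and PickReplica are hypothetical names made up for illustration and are not taken from the codebase. It assumes the local node's attributes are already in affinity order and that "choose at random" means a uniform pick among whatever candidates remain.

```go
package main

import "math/rand"

// Replica is a hypothetical stand-in for a replica descriptor; only the
// attributes of the node hosting it matter for this sketch.
type Replica struct {
	NodeID int
	Attrs  []string
}

// hasAttr reports whether the replica's node carries the given attribute.
func hasAttr(r Replica, attr string) bool {
	for _, a := range r.Attrs {
		if a == attr {
			return true
		}
	}
	return false
}

// PickReplica narrows the candidates one local attribute at a time, in
// affinity order. As soon as an attribute matches none of the remaining
// candidates, narrowing stops and one of them is chosen at random.
func PickReplica(localAttrs []string, replicas []Replica) (Replica, bool) {
	if len(replicas) == 0 {
		return Replica{}, false
	}
	candidates := replicas
	for _, attr := range localAttrs {
		var matched []Replica
		for _, r := range candidates {
			if hasAttr(r, attr) {
				matched = append(matched, r)
			}
		}
		if len(matched) == 0 {
			break // none match: fall through to the random choice
		}
		candidates = matched
		if len(candidates) == 1 {
			break // a single match needs no further narrowing
		}
	}
	return candidates[rand.Intn(len(candidates))], true
}
```

Note that a replica matching only the first attribute beats one that matches several later attributes, which is the difference from a maximum-overlap rule.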
You just mention reads for the leader cache. All writes will have to go to the leader as well.
We probably will want to persist this cache (and gossip as well). Both items should be added to the TODO.md file.
Latencies would be a good addition. Maybe make a note in the code somewhere appropriate. We'll do that when / if necessary.
Notes from today's call:
How are we dealing with keeping the lease alive? I remember that we thought about doing it at the level of multiraft or even its transport, but with the store sending the initial LeaderLeaseRequest, it seems awkward to scatter the logic around, so in MultiRaft I would keep it to the minimum by simply making sure nodes don't vote while there's a running lease.
@spencerkimball regarding our discussion of what information to send in a Lease: since it has to be inspected by MultiRaft, we need to hang on to the MultiRaft NodeID and GroupID at all times. We should just embed that information instead of a Replica.
Does that sound reasonable?
Yes, this is what I had in mind. No timers or goroutines...just renew the lease if there's read pressure at the range within a generous offset of the lease expiration.
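A minimal sketch of that renewal check, evaluated inline on the read path with no timer or goroutine. The Lease struct, the ShouldRenew name, and the use of wall-clock nanoseconds are assumptions for illustration only. Treating an already expired lease as "request a new one" rather than "renew" is also an assumption.

```go
package main

// Lease is a hypothetical, pared-down lease record. Expiration is a wall
// time in nanoseconds.
type Lease struct {
	RaftNodeID uint64
	Expiration int64
}

// ShouldRenew reports whether a read arriving at time now (nanoseconds)
// should trigger a renewal: the local node holds the lease, the lease has
// not yet expired, and now falls within renewOffset of the expiration.
func ShouldRenew(l Lease, self uint64, now, renewOffset int64) bool {
	if l.RaftNodeID != self {
		return false // someone else holds the lease
	}
	if now >= l.Expiration {
		return false // already expired: assumed to need a fresh request
	}
	return l.Expiration-now <= renewOffset
}
```

With no reads there is no call to ShouldRenew, so an idle range simply lets its lease lapse.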
I'm fine with moving to RaftID (use this not "GroupID" as this data structure is shared outside of multiraft) and RaftNodeID instead of replica.
PTAL at a first stab at doing the work inside of MultiRaft (setting the deadlines for not voting). Search for "horrible" in the diff (func processLease): It's ugly that we have to break the abstraction between MultiRaft and the outside world and unmarshal everything once. Is there a natural way to improve this? We definitely have to synchronously update the deadline, or we might send out votes we shouldn't have sent out. We also don't want to introduce new Raft message types for leases as that would mess with Raft.
I figure we could use special client command IDs... but that only makes it more efficient by not unmarshalling for fun, not any more fun to look at.
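For reference, the deadline bookkeeping itself could look roughly like the sketch below. It does not address the unmarshalling problem. voteGate, ObserveLease, and MayVote are hypothetical names, not what is in the diff, and the sketch assumes deadlines are wall-clock nanoseconds keyed by RaftID. The mutex is there so the deadline is updated synchronously before any vote decision can observe the old value.

```go
package main

import "sync"

// voteGate is a hypothetical helper tracking, per Raft group, the time
// before which the local node must not grant votes.
type voteGate struct {
	mu        sync.Mutex
	deadlines map[uint64]int64 // RaftID -> wall time in nanoseconds
}

func newVoteGate() *voteGate {
	return &voteGate{deadlines: make(map[uint64]int64)}
}

// ObserveLease records a lease expiration seen for a group. The deadline
// only ever moves forward, so a stale lease cannot shorten it.
func (g *voteGate) ObserveLease(raftID uint64, expiration int64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if expiration > g.deadlines[raftID] {
		g.deadlines[raftID] = expiration
	}
}

// MayVote reports whether the node may grant a vote for the group at
// time now, i.e. whether no lease is still running.
func (g *voteGate) MayVote(raftID uint64, now int64) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return now >= g.deadlines[raftID]
}
```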
Closed by #604