Open ReubenBond opened 1 week ago
This is probably a layman question, but would this proposal have meaningful negative implications on throughput if an expired lease has to be checked / confirmed before activating a new grain?
My understanding is that this would not have any negative impact for grains being already active since the lookup process would be mostly unaffected.
Is the tradeoff for this that there's a stronger guarantee that there won't be duplicate activations during the lease period (and ideally no duplicates since the old silo will terminate itself if it can't renew its lease). But there's a longer period where an old, unreachable silo will still be seen to hold the lease so activations won't be placed elsewhere until that lease is given up?
> would this proposal have meaningful negative implications on throughput if an expired lease has to be checked / confirmed before activating a new grain?
No, this does not impact performance. It slightly affects directory hand-off & crash recovery just because we aren't omitting activations hosted on crashed silos, but that is not meaningful.
> My understanding is that this would not have any negative impact for grains being already active since the lookup process would be mostly unaffected.
That is correct. Leases are checked centrally, periodically, not at the per-grain level.
> But there's a longer period where an old, unreachable silo will still be seen to hold the lease so activations won't be placed elsewhere until that lease is given up?
Yes, that's right: this feature necessarily decreases availability of some subset of grains after a crash.
Specifically: grains known to be hosted on a crashed silo (i.e., registered to other partitions), and grains which were potentially hosted on the crashed silo (i.e., grains belonging to the directory ranges owned by the crashed silo which are not known to be hosted elsewhere).
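To make the two categories concrete, here is a minimal Python sketch (not Orleans code) that classifies grains after a crash. Everything in it is a hypothetical model: `known_hosts` stands in for the registrations that surviving directory partitions still know about, `range_owner` maps each grain to the silo that owned its directory range, and the function name is made up for illustration.

```python
def is_temporarily_unavailable(grain, crashed_silo, known_hosts, range_owner):
    """Return True if `grain` cannot be safely re-activated until the
    crashed silo's lease has expired (hypothetical model, see lead-in)."""
    host = known_hosts.get(grain)
    if host is not None:
        # Category 1: a surviving partition knows where the grain lives.
        # It is affected only if that host is the crashed silo.
        return host == crashed_silo
    # Category 2: no surviving record. If the crashed silo owned the
    # directory range, the grain may have been active there.
    return range_owner[grain] == crashed_silo


# Registrations still known after siloA crashes (assumed example data).
known_hosts = {"g1": "siloA", "g2": "siloB"}
# Which silo owned the directory range each grain falls into.
range_owner = {"g1": "siloB", "g2": "siloA", "g3": "siloA", "g4": "siloB"}

blocked = [
    g for g in sorted(range_owner)
    if is_temporarily_unavailable(g, "siloA", known_hosts, range_owner)
]
print(blocked)  # ['g1', 'g3']
```

Here `g1` is known to be hosted on the crashed silo, `g3` falls in a range the crashed silo owned and has no known host, while `g2` and `g4` remain available.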
Fixes #2428 Fixes #5687 Fixes #8242
In #9103, we introduced a strongly consistent directory, leveraging the strong guarantees which Orleans' powerful membership provides, as discussed in #1323. This proposal is for a mechanism to go the last mile and offer strong single-activation guarantees by means of leases. The new grain directory is already strongly consistent, but strong single-activation guarantees rely on evicted silos ceasing operation whenever there is a potential for a grain to be activated elsewhere. Leases are the only practical way to implement this kind of guarantee (see this comment).
The proposal is to add an implicit leasing mechanism based on membership, which silos and the directory will use to self-terminate/deactivate activations and to prevent registrations, respectively. The proposed mechanism is this:
The valid leasing period must be calculated based on the membership refresh interval. Leases are extended whenever a new membership version is received by a silo.
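As a rough illustration of the rule above, the following Python sketch models a silo-side lease that is extended whenever a newer membership version arrives. This is not Orleans code: the class name `MembershipLease`, the `safety_factor` parameter, and the formula "lease period = `safety_factor` × refresh interval" are assumptions made for the example. The proposal only states that the period must be derived from the membership refresh interval.

```python
from dataclasses import dataclass


@dataclass
class MembershipLease:
    """Hypothetical silo-side lease, renewed by membership updates."""

    refresh_interval: float      # seconds between membership refreshes
    safety_factor: float = 2.0   # assumed multiplier, not from the proposal
    last_version: int = -1       # highest membership version seen so far
    expires_at: float = 0.0      # time at which the lease lapses

    @property
    def duration(self) -> float:
        # Assumption: the lease period is a multiple of the refresh interval.
        return self.safety_factor * self.refresh_interval

    def on_membership_update(self, version: int, now: float) -> bool:
        """Extend the lease if `version` is newer than any seen so far.

        Assumption: a stale or repeated version does not extend the lease.
        """
        if version <= self.last_version:
            return False
        self.last_version = version
        self.expires_at = now + self.duration
        return True

    def is_valid(self, now: float) -> bool:
        # Once this returns False, the silo should stop serving activations.
        return now < self.expires_at


lease = MembershipLease(refresh_interval=5.0)
lease.on_membership_update(version=1, now=0.0)
print(lease.is_valid(now=9.0))    # True: within 2 * 5 s of the last update
print(lease.is_valid(now=10.0))   # False: the lease has lapsed
lease.on_membership_update(version=2, now=10.0)
print(lease.is_valid(now=15.0))   # True: a newer version extended the lease
```

Times are passed in explicitly to keep the example deterministic; a real implementation would presumably read a monotonic clock and would need to account for clock drift between silos when choosing the lease period.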