In domains (such as pente) where the endorsement rules require 100% endorsement and the spending rules allow any party to spend any state, we need to coordinate the assembly of transactions across multiple nodes.
To achieve this, we need an algorithm that allows all nodes to agree on which of them is selected as the coordinator at any given point in time. All other nodes then delegate their transactions to that coordinator.
The desired properties of that algorithm are:
deterministic: all nodes can run the algorithm and eventually agree on a single coordinator at any given point in time
fair: the algorithm results in each node being selected as coordinator for a proportional number of times over a long enough time frame
The current proposal being pursued is:
for each pente domain instance (smart contract API), the coordinator is selected by choosing one of the nodes in the configured privacy group of that contract
the selector is a pure function where the inputs are the node names and the current block number, and the output is the name of one of the nodes
the function will return the same output for a range of n consecutive blocks (where n is a configuration parameter of the policy)
for the next range of n blocks, the function will return a different (or possibly the same) output
over a large sample of ranges, the function will have an even distribution of outputs
the function will be implemented by taking a hash of each node name, taking a hash of b/n rounded down to the nearest integer (where b is the block number and n is the range size), and feeding those hashes into a hashring function
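As a rough illustration of the selector described above, the following Go sketch hashes each node name onto a ring, hashes the range index floor(b/n), and picks the first node at or after that position. The names `SelectCoordinator` and `hash64`, the choice of FNV-1a, and the single ring position per node are all assumptions made for this example and are not part of the proposal. A real hashring would likely use virtual nodes or a stronger hash to get closer to an even distribution.

```go
package main

import (
	"encoding/binary"
	"hash/fnv"
	"sort"
)

// hash64 returns the FNV-1a hash of data (an arbitrary choice for this sketch).
func hash64(data []byte) uint64 {
	h := fnv.New64a()
	h.Write(data)
	return h.Sum64()
}

// SelectCoordinator is a hypothetical pure selector: the same inputs always
// give the same output, regardless of the order of nodes.
// nodes is the privacy group, block is the caller's view of the current block
// number and rangeSize is n. It returns "" if nodes is empty or rangeSize is 0.
func SelectCoordinator(nodes []string, block, rangeSize uint64) string {
	if len(nodes) == 0 || rangeSize == 0 {
		return ""
	}

	// Place every node on the ring at the position given by the hash of its name.
	type point struct {
		pos  uint64
		name string
	}
	ring := make([]point, 0, len(nodes))
	for _, name := range nodes {
		ring = append(ring, point{hash64([]byte(name)), name})
	}
	sort.Slice(ring, func(i, j int) bool {
		if ring[i].pos != ring[j].pos {
			return ring[i].pos < ring[j].pos
		}
		return ring[i].name < ring[j].name // tie-break keeps the order deterministic
	})

	// Hash floor(b/n); unsigned integer division already rounds down.
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], block/rangeSize)
	key := hash64(buf[:])

	// Walk clockwise: first node at or after the key, wrapping to the start.
	i := sort.Search(len(ring), func(i int) bool { return ring[i].pos >= key })
	if i == len(ring) {
		i = 0
	}
	return ring[i].name
}
```

With this shape, every block in the same range of n blocks maps to the same range index and therefore to the same node, which is the property the proposal relies on.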
given that different nodes index the blockchain events with varying latency, it is not assumed that all nodes have the same awareness of "current block number" at any one time. This is accommodated as follows:
Each node delegates all transactions to whichever node it determines to be the current coordinator, based on its latest knowledge of "current block"
The delegate node will accept the delegation if its awareness of current block also results in it being chosen by the selector function. Otherwise, the delegate node rejects the delegation and includes its view of current block number in the response
On receiving the delegation rejection, the sender node can determine if it is ahead or behind (in terms of block indexing) the node it had chosen as delegate.
If the sender node is ahead, it continues to retry the delegation until the delegate node finally catches up and accepts the delegation
If the sender node is behind, it waits until its block indexer catches up and then selects the coordinator for the new range
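The delegation handshake above could be sketched as follows. This is only an illustration: `Selector`, `DelegationReply`, `HandleDelegation`, `NextAction` and the action names are hypothetical, and `Selector` stands in for the selector function already bound to a privacy group and range size.

```go
package main

// Selector stands in for the coordinator selector function (hypothetical type):
// it maps a block number to the name of the selected node.
type Selector func(block uint64) string

// DelegationReply is a hypothetical response to a delegation request.
type DelegationReply struct {
	Accepted bool
	Block    uint64 // delegate's view of the current block, set on rejection
}

// HandleDelegation runs on the delegate node. It accepts only if the
// delegate's own view of the current block also selects the delegate.
func HandleDelegation(self string, localBlock uint64, sel Selector) DelegationReply {
	if sel(localBlock) == self {
		return DelegationReply{Accepted: true}
	}
	return DelegationReply{Accepted: false, Block: localBlock}
}

// Action is what the sender does after receiving a reply.
type Action int

const (
	Done            Action = iota // delegation accepted
	Retry                         // sender is ahead: retry until the delegate catches up
	WaitAndReselect               // sender is behind: wait for its indexer, then reselect
)

// NextAction runs on the sender node and compares block views.
// Equal views are not expected to produce a rejection when both nodes use
// the same selector inputs; this sketch treats that case as a reselect.
func NextAction(reply DelegationReply, senderBlock uint64) Action {
	switch {
	case reply.Accepted:
		return Done
	case senderBlock > reply.Block:
		return Retry
	default:
		return WaitAndReselect
	}
}
```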
The coordinator node will continue to coordinate (send endorsement requests and submit endorsed transactions to the base ledger) until its block indexer has reached a block number that causes the coordinator selector to select a different node.
At that point, it waits until all dispatched transactions are confirmed on chain, then delegates all current inflight transactions to the new coordinator.
if the new coordinator is not up to date with block indexing, then it will reject and the delegation will be retried until it catches up.
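A minimal sketch of that handover decision, under the assumption that a coordinator tracks dispatched and inflight transactions separately. The `Coordinator` struct, its fields and the `Handover` method are hypothetical names for illustration only.

```go
package main

// Selector stands in for the coordinator selector function (hypothetical type).
type Selector func(block uint64) string

// Coordinator is a hypothetical view of the state a coordinating node holds.
type Coordinator struct {
	Self       string
	Dispatched map[string]bool // submitted to the base ledger, not yet confirmed
	Inflight   []string        // accepted for coordination, not yet dispatched
}

// Handover is called when the local block indexer reaches block.
// It returns ok=true together with the new coordinator and the transactions
// to delegate only when a different node is selected and every dispatched
// transaction has been confirmed. Otherwise ok is false and nothing moves.
func (c *Coordinator) Handover(block uint64, sel Selector) (next string, txs []string, ok bool) {
	next = sel(block)
	if next == c.Self {
		return "", nil, false // still the selected coordinator
	}
	if len(c.Dispatched) > 0 {
		return next, nil, false // flush first: wait for on-chain confirmation
	}
	txs, c.Inflight = c.Inflight, nil
	return next, txs, true
}
```

If the new coordinator rejects the delegation because its indexer is behind, the caller would retry the delegation, as described above.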
while a node is the current selected coordinator, it sends endorsement requests to every other node for every transaction that it is coordinating
The endorsement request includes the name of the coordinator node
Each endorsing node runs the selector function to determine whether it believes that node is the correct coordinator for the current block number
if not, then it rejects the endorsement and includes its view of the current block number in the rejection message
when the coordinator receives the rejection message, it can determine if it is ahead or behind the requested endorser
if the coordinator is ahead, it retries the endorsement request until the endorser catches up and eventually endorses the transaction
if the coordinator is behind, then it waits until its block indexer reaches the next range boundary and delegates all inflight transactions to the new coordinator
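The endorser-side check and the coordinator's reaction to a rejection could look roughly like the sketch below. All type and function names here (`EndorsementRequest`, `CoordinatorCheck`, `CheckCoordinator`, `NextRangeStart`, `OnRejection`) are hypothetical, and the check covers only the coordinator validation, not the endorsement of the transaction itself.

```go
package main

// Selector stands in for the coordinator selector function (hypothetical type).
type Selector func(block uint64) string

// EndorsementRequest carries the name of the coordinator that sent it.
type EndorsementRequest struct {
	TxID        string
	Coordinator string
}

// CoordinatorCheck is the outcome of the endorser's coordinator validation.
type CoordinatorCheck struct {
	Accepted bool
	Block    uint64 // endorser's view of the current block, set on rejection
}

// CheckCoordinator runs on the endorser: it accepts the request only if its
// own view of the current block selects the same coordinator.
func CheckCoordinator(req EndorsementRequest, localBlock uint64, sel Selector) CoordinatorCheck {
	if sel(localBlock) == req.Coordinator {
		return CoordinatorCheck{Accepted: true}
	}
	return CoordinatorCheck{Accepted: false, Block: localBlock}
}

// NextRangeStart returns the first block of the range that follows the one
// containing block. It assumes rangeSize > 0.
func NextRangeStart(block, rangeSize uint64) uint64 {
	return (block/rangeSize + 1) * rangeSize
}

// OnRejection runs on the coordinator. If the coordinator is ahead it should
// retry the request; otherwise it should wait until its indexer reaches
// handoverAt and then delegate its inflight transactions.
func OnRejection(check CoordinatorCheck, coordBlock, rangeSize uint64) (retry bool, handoverAt uint64) {
	if coordBlock > check.Block {
		return true, 0
	}
	return false, NextRangeStart(coordBlock, rangeSize)
}
```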
The consequences of this approach are
on range boundary, the throughput of the network dips
if the range size is very small (compared to the variance in lag across all nodes' block indexing), then the entire contract could effectively grind to a standstill.
The guidance would be to configure n according to the expected variance in lag, e.g. if it is expected that some nodes might be 10 blocks ahead of other nodes at some points in time, then n should be on the order of 100 or even 1000.
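One rough way to reason about this guidance is to assume that, around each range boundary, nodes can disagree about the coordinator for about as many blocks as the worst-case indexing lag. That assumption, and the function name `DisruptedFraction`, are illustrative only and are not a measured result or part of the proposal.

```go
package main

// DisruptedFraction estimates the share of each range during which nodes may
// disagree on the coordinator, assuming the disagreement lasts about
// maxLagBlocks blocks per range boundary. The result is capped at 1.
func DisruptedFraction(maxLagBlocks, rangeSize uint64) float64 {
	if rangeSize == 0 || maxLagBlocks >= rangeSize {
		return 1
	}
	return float64(maxLagBlocks) / float64(rangeSize)
}
```

Under that assumption, a lag of 10 blocks gives about 10% for n = 100 and about 1% for n = 1000, while any n of 10 or less gives 100%, which corresponds to the standstill case.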
NOTE: the submitter key selection algorithm is related to this (because we chose option "1: Coordinator submit" for "How is the submitter chosen" from #327), but it is not directly coupled, because it is possible for the same coordinator to switch submitter keys on a more frequent basis than the n-sized ranges. Although that switch also comes with a flush and therefore a throughput dip, the impact should be lower.
Why is this needed?
Currently, the pente domain cannot handle more than one concurrently active node per private EVM contract. Otherwise, concurrent active nodes will assemble transactions that spend the same state(s), and each will refuse to endorse the other's.
What would you like to be added?
The proposal is to implement a policy, using the coordinator selection algorithm described above, that complies with combination 2 + 1 from https://github.com/kaleido-io/paladin/issues/327.