Finschia / ostracon

Ostracon is a consensus algorithm forked from Tendermint Core. We have added a VRF to Tendermint BFT, which adds randomness to PoS validator elections and improves security.
Apache License 2.0

Random Election using Stake Amount as Discrete Distribution #22

Closed torao closed 4 years ago

torao commented 4 years ago

The scheme of selecting a Proposer and Validators based on PoS can be considered as random sampling from a group with a discrete probability distribution.

Random Sampling based on Categorical Distribution

For simplicity, here is an example in which only a Proposer is selected from the candidates, each with winning probability p_i = s_i / S, where s_i is the stake of candidate i and S is the total stake.

First, create a pseudo-random number generator using vrf_hash as the seed, and determine the Proposer's threshold. This random number algorithm should be deterministic and portable across programming languages, but it need not be cryptographic.

// deterministic PRNG seeded with the VRF hash
val rand = new Random(vrf_hash)
// threshold in [0, S), where S is the total stake
val threshold: Long = (rand.nextDouble() * S).toLong

Second, to make the result deterministic, we retrieve the candidates sorted in descending stake order.

val candidates: List[Candidate] = query("SELECT * FROM candidates WHERE stake > 0 ORDER BY stake DESC, public_key")

Finally, find the candidate whose cumulative stake range contains threshold; that candidate becomes the Proposer.

import scala.util.control.Breaks.{break, breakable}

var proposer = candidates.last
var cumulativeStakes: Long = 0
breakable {
  for (c <- candidates) {
    if (cumulativeStakes <= threshold && threshold < cumulativeStakes + c.stake) {
      proposer = c
      break()
    }
    cumulativeStakes += c.stake
  }
}

This is a common way of random sampling from a categorical distribution using a uniform random number. It is similar to throwing a dart at a spinning dartboard in which each sector's width is proportional to the probability of the corresponding item.


Selecting a Consensus Group

By applying the above, we can select a consensus group consisting of one Proposer and V Validators. This is equivalent to performing V+1 categorical trials, i.e., random sampling from a multinomial distribution. This notion can be illustrated with a multinomial distribution demo I created in the past: it corresponds to a model that selects a Proposer and Validators when K is the number of candidates and n = V+1.

As an intuitive code example, I extend the categorical sampling to the multinomial case.

// one threshold per seat: index 0 is the Proposer, the rest are Validators
val thresholds = new Array[Long](V + 1)
for (i <- 0 until thresholds.length) {
  thresholds(i) = (rand.nextDouble() * S).toLong
}

var cumulativeStakes: Long = 0
val winner = new Array[Candidate](thresholds.length)
for (c <- candidates) {
  for ((t, i) <- thresholds.zipWithIndex) {
    if (cumulativeStakes <= t && t < cumulativeStakes + c.stake) {
      winner(i) = c
    }
  }
  cumulativeStakes += c.stake
}
val proposer = winner(0)
val validator1 = winner(1)
...

In the above steps, a single candidate may assume multiple roles. If you want to exclude such a case, you can remove the winning candidate from the candidates. In this case, the thresholds must be recalculated because the total S of the population changes.
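For illustration, a minimal sketch of this without-replacement variant (reusing rand, candidates, V, and the Candidate type from the snippets above, and assuming the remaining total stake stays positive) could look like this:

// sketch only: each trial removes its winner, then S and the next threshold
// are recalculated over the remaining candidates
var remaining: List[Candidate] = candidates
var totalStake: Long = remaining.map(_.stake).sum
val winners = scala.collection.mutable.ArrayBuffer[Candidate]()
for (_ <- 0 until (V + 1)) {
  val t: Long = (rand.nextDouble() * totalStake).toLong
  var cumulative: Long = 0
  val w = remaining.find { c =>
    val hit = cumulative <= t && t < cumulative + c.stake
    cumulative += c.stake
    hit
  }.get
  winners += w
  remaining = remaining.filterNot(_ == w)  // remove the winner from the population
  totalStake -= w.stake                    // S changes, so the next threshold uses the new total
}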

Computational Complexity

  1. typical sort algorithm: O(N log N), where N is the number of all candidates.
  2. generating random numbers: O(V + 1).
  3. winner extraction: O(N × (V+1)) in the worst case. In many cases, the loop can terminate early.

The computational complexity is mainly affected by the number of candidates N. There is room for improvement by caching the list of candidates already sorted by stake.

zemyblue commented 4 years ago

This is a good suggestion. 👍 But the computational cost becomes very high if there are many validators.

In the above steps, a single candidate may assume multiple roles. If you want to exclude such a case, you can remove the winning candidate from the candidates. In this case, the thresholds must be recalculated because the total S of the population changes.

As you mention above, the number of selected validators may then be less than V+1. In this case, is the validator selection process repeated?

torao commented 4 years ago

But the computational cost becomes very high if there are many validators.

The amount of calculation depends on V, but it consists only of simple addition, random number generation and conditional branching (rather than hash computation). So I roughly expect the cost to be modest, e.g., less than 100ms for V ≤ 10^5.

In this case, is the validator selection process repeated?

Yes, it is. If the configuration doesn't allow a node to have multiple roles, we 1) remove the selected validator from the candidate list, 2) recalculate S, and 3) find the next winner, and repeat these steps V times. Since the cost of step 2) is added, the overall computational complexity becomes about O(N×V) + O(N×(V+1)) ≈ O(2N×V) in the worst case.

zemyblue commented 4 years ago

OK

And how about using a hash function like SHA-256 instead of a random function?

torao commented 4 years ago

I think it mainly depends on its speed. There are also several considerations when using a cryptographic hash function.

zemyblue commented 4 years ago

My concern is that the random function can vary across platforms, programming languages, and compiler versions (https://en.wikipedia.org/wiki/Random_number_generation). So I suggest using a hash function.
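For example, one way this could look (just an illustrative sketch, not part of the proposal; the function name is made up) is to derive each threshold from SHA-256 over vrf_hash and a trial counter:

import java.security.MessageDigest

// sketch: derive the i-th threshold deterministically from vrf_hash with SHA-256
// instead of a language-provided PRNG
def hashThreshold(vrfHash: Array[Byte], i: Int, totalStake: Long): Long = {
  val md = MessageDigest.getInstance("SHA-256")
  md.update(vrfHash)
  md.update(BigInt(i).toByteArray)
  val digest = md.digest()
  // take the first 8 bytes as a 64-bit value and reduce it into [0, totalStake)
  val x = BigInt(1, digest.take(8)).toLong
  // note: a plain modulo introduces a small bias; acceptable for a sketch
  ((x % totalStake) + totalStake) % totalStake
}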

egonspace commented 4 years ago

Random sampling is better than my proposal because it requires less calculation. I have two comments about this proposal.

Duplicate election

As my proposal suggests, equity should be considered in the case of a duplicate election. If a participant with a large stake is elected twice in a round, they are at a disadvantage in terms of rewards compared to someone who splits their stake as much as possible and runs multiple nodes. * rewards for 10000 staking on 1 node < rewards for 10 staking on 1000 nodes during heights h1~hn

So, in this case, a validator elected multiple times must receive some additional reward, and the additionally elected validator should receive a partial reward for the validation job. (Please refer to https://github.com/line/tendermint/issues/17)

I'll think more about the reward policy and then make a separate issue.

A way to reduce computations

I think we don't need to calculate this at every round.

for(c <- candidates){
  if(cumulativeStakes <= threshold && threshold < cumulativeStakes + c.stake){
    proposer = c
    break
  }
  cumulativeStakes += c.stake
}

The validator order is fixed for a while as long as no staking tx is executed. So if we store the following data, we can easily verify a proposer or validator when receiving a proposal or vote.

order  candidate  staking  position
1      c100       10000    [0, 10000)
2      c068       9000     [10000, 19000)
3      c021       8000     [19000, 27000)
...    ...        ...      ...
78     c010       1        [99999, 100000)

The total staking is 100000. This table is updated whenever a staking tx is executed.

Suppose I am a validator and someone proposes a block. I know the threshold and the proposer's position, so I can verify whether they are the right proposer without doing the whole calculation and sorting.
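A minimal sketch of that verification, assuming the table is kept as a list of rows holding each candidate's cumulative stake range (the names below are illustrative):

// sketch: verify a claimed proposer against the cached stake-position table
case class Row(candidate: String, from: Long, until: Long)

def isRightProposer(table: Seq[Row], threshold: Long, claimed: String): Boolean =
  table.exists(r => r.candidate == claimed && r.from <= threshold && threshold < r.until)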

torao commented 4 years ago

@zemyblue

My concern is that the random function can vary across platforms, programming languages, and compiler versions.

The PRNG mentioned here means an algorithm defined by the LINK 2 Network specification (rather than one provided by a language or an external library such as libc). Typical PRNG algorithms, such as xorshift, LCGs, and MT, produce the same results on all platforms, languages, and compiler versions as long as the algorithm and its parameters are the same.
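For instance, a xorshift64-style generator (shown only as an illustration of such a portable algorithm, not as the actual LINK 2 Network specification) is a few lines of integer arithmetic and produces identical sequences wherever 64-bit integers behave identically:

// sketch of a xorshift64 PRNG: fully determined by the seed and the
// shift constants, so it is reproducible across platforms and languages
final class XorShift64(seed: Long) {
  private var state: Long = if (seed != 0L) seed else 0x9E3779B97F4A7C15L
  def nextLong(): Long = {
    state ^= (state << 13)
    state ^= (state >>> 7)
    state ^= (state << 17)
    state
  }
}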

zemyblue commented 4 years ago

If a participant with a large stake is elected twice in a round, they are at a disadvantage in terms of rewards compared to someone who splits their stake as much as possible and runs multiple nodes.

But it is not disadvantageous, because the voting power is set by the user's staking value; one node is not one voting power. Therefore, rewards are given in proportion to the staking value.

But I'm also considering giving them as many votes as the number of times they are elected. That means if a user with 10 voting power is elected 2 times, the user's voting power is 2. But I am not sure this is good; I need to simulate more.

torao commented 4 years ago

@egonspace

Duplicate election

This proposal doesn't cover the incentive scheme, so that needs to be handled separately. We'll discuss this in #17.

A way to reduce computations

That suggestion is useful. We will be able to reuse the results of the categorical distribution until the next stake transaction is issued. If the stakes don't change often, we may be able to use a structure such as a binary tree, R-Tree, or B-Tree instead of a linear search. In this case, step 3 (winner extraction) becomes O(V log N).
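A sketch of that idea using a plain binary search over precomputed cumulative stakes (functionally similar to the tree-based structures mentioned above, with the same O(V log N) winner extraction; the names are illustrative):

// sketch: precompute cumulative stakes once per stake change, then find each
// winner with a binary search instead of a linear scan
def cumulativeStakes(stakes: IndexedSeq[Long]): IndexedSeq[Long] =
  stakes.scanLeft(0L)(_ + _).tail        // cumulative(i) = stakes(0) + ... + stakes(i)

def findWinner(cumulative: IndexedSeq[Long], threshold: Long): Int = {
  // smallest index whose cumulative stake exceeds the threshold
  var (lo, hi) = (0, cumulative.length - 1)
  while (lo < hi) {
    val mid = (lo + hi) / 2
    if (cumulative(mid) <= threshold) lo = mid + 1 else hi = mid
  }
  lo
}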

egonspace commented 4 years ago

A problem with this algorithm was raised in the last open session.

Random election cannot prevent some candidate from becoming a validator with voting power exceeding its stake, or, in the extreme case, exceeding 1/3 of the total.

torao commented 4 years ago

Here are a few points.

Strictly speaking, one node "accidentally" obtaining 1/3 (f+1) of the voting power is a slightly different issue from distributed Byzantine nodes "accidentally" obtaining 1/3 (f+1) of the validators. However, the former has a negligible probability compared to the latter, which will occur with an appreciable frequency. Therefore, I believe it is reasonable to treat both as the "accidental (expected) BFT violation problem" in a common manner.
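As a rough way to see the difference, the latter probability could be estimated with a small Monte Carlo simulation. This is only an illustrative sketch (not the PoC below), under the simplifying assumption that each of the selected seats independently lands on Byzantine stake with probability equal to the Byzantine stake ratio:

import scala.util.Random

// sketch: estimate how often random selection gives Byzantine candidates
// more than 1/3 of the seats, given their share of the total stake
def estimateViolation(byzantineStakeRatio: Double, seats: Int, trials: Int, rand: Random): Double = {
  var violations = 0
  for (_ <- 0 until trials) {
    val byzantineSeats = (0 until seats).count(_ => rand.nextDouble() < byzantineStakeRatio)
    if (byzantineSeats > seats / 3) violations += 1
  }
  violations.toDouble / trials
}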


torao commented 4 years ago

I'm writing up the results of the PoC on the wiki page. However, it's not possible to upload images to the wiki page, so I attach them to this ticket and use their URLs.
