Finschia / ostracon

Ostracon is a consensus algorithm forked from Tendermint Core. We have added a VRF to Tendermint BFT, which adds randomness to PoS Validator elections and improves security.

[Proposal] Reward allocation design according to Stake ratio #80

Closed torao closed 4 years ago

torao commented 4 years ago

TL;DR

A reward allocation algorithm for Random Sampling without Replacement (non-duplicate elections), using a compensatory model of opportunity loss.

That is, the rewards up to the k-th selection are earned through the actual random elections, and the rewards for the selections after the k-th, in which the elected Candidate can no longer participate, are compensated as probabilistic expectations.

Problem Setting

What we want to do: create a randomly selected subset from a set, where each element of the set has a weight that represents its probability of being selected. Candidates who are selected are rewarded, and we want this reward to be proportional to their weight.

On this page, I'm considering a way to select exactly a given number n of elements (Voters) from a set (Candidates) with a probability corresponding to their weight (Stakes). We also want the reward of each Candidate to be exactly proportional to its weight.

  1. Sampling with replacement: An intuitive way to select exactly n elements randomly from a set of N is to repeat weighted random sampling until the selected subset contains n elements. However, this method isn't guaranteed to finish in bounded time because the same element may be selected repeatedly. This becomes a liveness problem in distributed systems.
  2. Sampling without replacement: Selecting the same element repeatedly can be prevented by excluding each selected element from the set. However, as the number of elements in the set decreases, the winning probability of the remaining elements increases. With a biased winning distribution, elements with higher weights tend to be selected earlier, so elements with relatively lower weights end up winning more often than their actual weights would suggest. This is a fairness problem (see the sketch after this list).
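To make the two schemes concrete, here is a minimal Python sketch (the helper names are illustrative, and `random.random` stands in for Ostracon's actual VRF-based randomness):

```python
import random

def weighted_draw(stakes):
    """Pick one candidate with probability proportional to its stake."""
    r = random.random() * sum(stakes.values())
    for cand, s in stakes.items():
        r -= s
        if r < 0:
            return cand
    return cand  # guard against floating-point edge cases

def with_replacement(stakes, n):
    """Scheme 1: repeat draws over the full set until n distinct winners
    are collected; duplicates are possible, so the running time is unbounded."""
    winners = set()
    while len(winners) < n:
        winners.add(weighted_draw(stakes))
    return winners

def without_replacement(stakes, n):
    """Scheme 2: remove each winner from the set; always finishes in n
    draws, but the shrinking total stake inflates later winning odds."""
    remaining = dict(stakes)
    winners = []
    for _ in range(n):
        w = weighted_draw(remaining)
        winners.append(w)
        del remaining[w]
    return winners
```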

The purpose of this page is to derive a formula that solves the fairness problem in 2.

The problem in 2 is that the total Stake of the Candidate set becomes smaller as the Voter selection progresses, so a Candidate who is unlikely to be elected, i.e., a Candidate with a lower Stake, is elected with a higher probability than its Stake ratio warrants. In fact, as Fig 1 shows for repeated elections that each select 25 Voters from a set of 100 Candidates with inversely biased Stakes, if all 25 Voters earn the same reward per election, then Candidates with higher Stakes earn a lower share of the overall reward distribution.

Fig 1. The ratio of Stake that each Candidate holds and the reward that each one earned in the non-duplicate case. Under the service requirements, the two should be aligned.

Perspectives and Formulas

I assumed that the underlying cause of the reduced reward for Candidates who are more likely to be selected is the loss of opportunity due to their inability to participate in the selections after the one in which they were elected. In other words, adding to a Candidate's reward the expected income for the selections it was excluded from would seem to yield a fair distribution.

More precisely, the increase in the winning probability is due to the decrease in the total Stake of the Candidate set as elected Voters drop out of it. Therefore, the chance of being elected is higher in later selections than in the first.

Let $\alpha$ be the base reward and $s_i$ be Candidate $i$'s Stake holdings. Then $i$'s expected reward for a single selection can be expressed as follows:

$$\bar{\alpha}_i = \alpha \frac{s_i}{S}$$

where $S$ is the total Stake held by the Candidates participating in that selection.
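For example, if three Candidates hold Stakes $s_1 = 5$, $s_2 = 3$, and $s_3 = 2$, then $S = 10$ and the expected rewards for a single selection are $0.5\alpha$, $0.3\alpha$, and $0.2\alpha$ respectively, exactly proportional to Stake.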

Expected reward in duplicate case

(You can skip this section; it is only here for comparison.) If all Candidates participate in all n selections (i.e., if the same Candidate can win multiple times), I write $S = \sum_i s_i$ because $S$ is a fixed value for every selection. Thus, the expected rewards for n selections from a Candidate set can simply be added up and expressed as follows:

$$\bar{\alpha}_i = \sum_{r=1}^{n} \alpha \frac{s_i}{S} = n\,\alpha\,\frac{s_i}{S}$$

Due to the nature of the multinomial distribution, repeating n=1 100 times, repeating n=25 4 times, and doing n=100 once all yield the same expected reward.
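As a quick sanity check of this property, the following sketch (stake values are arbitrary) estimates the expected number of wins per trial, which should approach $n\,s_i/S$ regardless of how the n draws are batched:

```python
import random
from collections import Counter

def expected_wins_duplicate(stakes, n, trials=10_000):
    """Monte Carlo estimate of each candidate's expected wins per trial
    when every selection draws from the full set (duplicates allowed)."""
    total = sum(stakes.values())
    wins = Counter()
    for _ in range(trials):
        for _ in range(n):
            r = random.random() * total
            for cand, s in stakes.items():
                r -= s
                if r < 0:
                    wins[cand] += 1
                    break
    return {c: wins[c] / trials for c in stakes}

stakes = {"a": 5, "b": 3, "c": 2}
print(expected_wins_duplicate(stakes, n=10))  # close to n*s_i/S: a≈5, b≈3, c≈2
```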

Although this scheme is simple and fair, as shown in Fig 2, n selections don't always produce n distinct Voters because of duplicate wins. Thus, the Byzantine assumption may be weakened by a smaller number of Voters than assumed. And because duplicates can occur consecutively, it's not possible to guarantee a completion time even if the selection is repeated until a certain number of distinct Voters is reached.

Fig 2. Reward distribution ratio in the case of the duplicate-winning scheme.

Expected reward in non-duplicate case

By excluding each selected Voter from the Candidate set, exactly n Voters can be selected in n selections (if N ≥ n).

However, as the number of Candidates decreases, the total Stake of the Candidate set decreases, and the probability of being selected increases in later selections. As a result, each Candidate's total reward will not be proportional to its Stake holdings. The trend is that Candidates with more Stake are selected earlier, and those with fewer holdings are less likely to be selected, so Candidates with more Stake end up with proportionally less compensation. This can be seen from the graph in Fig 1.

There are two reasons why such a difference occurs.

  1. If a Candidate is selected in the k-th selection, it receives no reward equivalent for the remaining n−k selections.
  2. Because the total Stake $S$ of the Candidate set decreases as Candidates are removed, the expected reward $\bar{\alpha}_i$ for Candidate i fluctuates as the selection progresses.

Introducing the concept of reward expectation

Assume a scratch lottery in which one particular person wins $1M. If 100,000 people participate in this lottery, the expected reward per person is $1,000,000 ÷ 100,000 = $10 (setting aside the entertainment value of gambling, the ticket is worth buying at $9 but not at $11).

This also means that if you have the scratch ticket but don't open it, you suffer a $10 opportunity loss. So if you couldn't participate in the lottery for some unavoidable reason, such as the numbers not being printed due to a printing error, your opportunity loss is $10. Roughly speaking, if you receive the expected reward of $10 as compensation for that opportunity loss, the distribution of worth among the participants can be considered fair.

The point is that participating in the lottery itself is worth $10, regardless of whether you win and get $1M or miss and get $0.

Opportunity loss compensation model

The following example, Fig 3, represents an election that selects four Voters, i, j, k, l, from a set of N Candidates.

Fig 3. An election to select four Voters from a set of N Candidates.

  1. In the first selection, the winner i is given a reward α.
  2. In the second selection, the winner j is also given a reward α. There are two important points here.
    1. First, the total Stake $S_2$ of the second selection has decreased by $s_i$ due to the departure of i after the first selection, and the winning probability $p_{x,2}$ of each remaining Candidate increases.
    2. Second, the first winner i doesn't participate in the second selection. This means that i loses the opportunity to earn a reward in the 2nd selection. If i had remained among the Candidates, its expected reward for the second selection would have been $\bar{\alpha}_{i,2} = \alpha \frac{s_i}{S_2}$. In other words, the opportunity loss that i suffers by not being able to participate in the 2nd selection can be said to be $\bar{\alpha}_{i,2}$.
  3. Similarly, in the third selection, the total Stake of the Candidate set is $S_3 = S_1 - s_i - s_j$, the winner k is also given a reward $\alpha$, and i and j suffer opportunity losses of $\bar{\alpha}_{i,3} = \alpha \frac{s_i}{S_3}$ and $\bar{\alpha}_{j,3} = \alpha \frac{s_j}{S_3}$, respectively.
  4. It is the same for the 4th selection, but note that the last winner l participated in all 4 selections, so it suffers no opportunity loss.
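Putting the four steps together: the first winner i, for example, receives its base reward plus compensation for the three selections it missed,

$$\alpha + \alpha\frac{s_i}{S_2} + \alpha\frac{s_i}{S_3} + \alpha\frac{s_i}{S_4},$$

while the second winner j receives $\alpha + \alpha s_j / S_3 + \alpha s_j / S_4$, and the last winner l receives only $\alpha$.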

This model is based on the hypothesis that we can make the distribution fair by compensating for lost opportunities with the expected reward that would probabilistically have been obtained if the Candidate had participated in the selection.

In summary, a Candidate i who is selected for the k-th selection should be rewarded with the following:

$$\alpha + \sum_{r=k+1}^{n} \alpha \frac{s_i}{S_1 - \hat{S}_r}$$

where $\hat{S}_r$ is the total Stake of the Candidates selected in selections 1 through r−1.
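A minimal Python sketch of this rule (function and variable names are illustrative, and `random.random` stands in for the VRF-based randomness):

```python
import random

def elect_with_compensation(stakes, n, alpha=1.0):
    """One election: select n Voters without replacement; each winner gets
    the base reward alpha, and every earlier winner is compensated with
    its expected reward alpha * s_i / S_r for each selection r it missed."""
    remaining = dict(stakes)
    rewards = {c: 0.0 for c in stakes}
    winners = []
    for _ in range(n):
        total = sum(remaining.values())              # S_r for this selection
        for w in winners:                            # opportunity-loss compensation
            rewards[w] += alpha * stakes[w] / total
        r = random.random() * total                  # weighted draw of this winner
        for cand, s in remaining.items():
            r -= s
            if r < 0:
                rewards[cand] += alpha               # base reward
                winners.append(cand)
                del remaining[cand]
                break
    return rewards
```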

Result

Here, I show the reward distribution obtained by simulation. The amount of Stake held by each Candidate and its reward are expressed as ratios of the whole.

Fig 4.1. Stake distribution with inverse bias.

Fig 4.2. Stake distribution with linear bias.

Fig 4.3. Flat Stake distribution.

Fig 4.4. A case where Stake is biased ×100 toward one Candidate.

Fig 4.x shows the results of simulating an election in which 25 Voters are selected from 100 Candidates, comparing an allocation that pays only the base reward for winning with one that additionally compensates the opportunity loss with the expected reward. Over 10,000 trials, the base-reward-only allocation (red points) yields lower distributions for Candidates with high Stake ratios and higher distributions for those with relatively low Stake ratios, while the allocation compensated with the expected reward follows the Stake ratio.

Note that in the case where Stakes are flat, as in Fig 4.3, the distribution is fair even without the correction. In other words, the unfairness is a problem inherent to weighted random sampling.
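A rough reproduction of this experiment using the `elect_with_compensation` sketch above (the stake distribution here is illustrative):

```python
# 25 Voters from 100 Candidates, 10,000 trials (as in Fig 4), then
# compare each Candidate's reward share with its Stake share.
stakes = {c: c + 1 for c in range(100)}   # linear bias, illustrative
totals = {c: 0.0 for c in stakes}
for _ in range(10_000):
    for c, v in elect_with_compensation(stakes, n=25).items():
        totals[c] += v
stake_sum, reward_sum = sum(stakes.values()), sum(totals.values())
for c in (0, 49, 99):                     # low, middle, and high Stake
    print(stakes[c] / stake_sum, totals[c] / reward_sum)  # should roughly match
```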

Fig 5.1. The number of Candidates is very small and close to the number of Voters.

Fig 5 shows the case where the number of Candidates is small and close to the number of Voters.

Consideration

This simulation uses floating-point arithmetic. When the unit of reward is an integer, it's necessary to ensure that the results are not biased by rounding. The total reward $\alpha n + \delta$ paid in a single election will also vary from election to election. However, if $S_1$ is very large with respect to each $s_i$, then $\delta$ will be small.
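As one possible direction (a sketch, not part of this proposal), the rounding bias can be avoided by computing shares in integer arithmetic and handing the leftover units to the largest fractional remainders:

```python
def split_integer_reward(total, stakes):
    """Split an integer reward proportionally to stakes: floor each share,
    then give the leftover units to the largest fractional remainders."""
    S = sum(stakes.values())
    shares = {c: total * s // S for c, s in stakes.items()}
    leftover = total - sum(shares.values())
    for c in sorted(stakes, key=lambda c: (total * stakes[c]) % S, reverse=True)[:leftover]:
        shares[c] += 1
    return shares

print(split_integer_reward(7, {"a": 5, "b": 3, "c": 2}))  # {'a': 4, 'b': 2, 'c': 1}
```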


torao commented 1 year ago

I've noticed that this problem of inequality of opportunity due to random sampling without replacement is essentially the same as the Monty Hall problem, a famous example of posterior probability and Bayes' theorem. A more theoretical explanation may be possible if we treat this problem as a type of Bayesian statistics.