CIP 32 - Attestation Node Incentives - Discussions #144

Closed: codyborn closed this issue 1 year ago

codyborn commented 3 years ago

CIP 32 https://github.com/celo-org/celo-proposals/pull/121 proposes introducing Attestation Service operation incentives via slashing. This is the first step toward a more discerning incentive system. As such, it's focused on a min uptime bar (one attestation over 11 days) and will not increase validator rewards. Future proposals will likely support more granular uptime measures and the potential for operators to earn additional reward.

aaronmboyd commented 3 years ago

A little bit devil's advocate here, but I'm not sure I agree with implementing slashing for downtime on a service ultimately unrelated to block proposals and consensus of the protocol. This is not an incentive, it's a disincentive.

I definitely prefer a reward rather than a punishment model. Once there is a positive economic incentive, won't validators have a real desire to keep their service online? (Especially as the number of new users is increasing.)

codyborn commented 3 years ago

Thanks for starting the discussion @aaronmboyd. I think this is a matter of perspective. One could view the existing validator reward as covering all services that a validator provides, including running an attestation service. In this case, by default the protocol assumes the operator is running the attestation service properly, and will only withhold some of the reward (via slashing) if it finds out later that this is not the case. To make it more explicit, we could cap the slashable amount at some limit per epoch OR, instead of slashing from the LockedGold, we could reduce the block rewards by some capped amount. WDYT?

I went with this perspective since a majority of validators are running Attestation Services (96/100) and a majority are doing the job well. Making this the default case made sense to me, but I'm definitely open to other thoughts. I also believe that we should increase the block reward to those that run the Attestation Service exceptionally well. Since measuring this on-chain introduces more complexity, it was reserved for a future proposal.

nategraf commented 3 years ago

Just read through the proposal and have some comments

Attack on proposed slashing method

A validator that has a downed attestation service, or simply does not want to run one, could do the following in every slashing period.

  1. Request 100 attestations (more generally, as many as there are validators with registered attestation keys), which guarantees exactly one request will be assigned to themselves when selecting issuers.
  2. Fulfill the request by generating the required signature and posting it to the chain.

Note that the validator does not need to have possession of the phone number associated with the identifier they request against, nor does the identifier need to correspond to a known phone number and pepper.

This attack can prevent slashing for ~$5 spent every slashing period, about $15/month under the proposal.
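
A back-of-the-envelope sketch of that attack cost, assuming an attestation request fee of about $0.05 (implied by the ~$5 for 100 requests above) and the ~11-day slashing period mentioned earlier:

```python
# Back-of-the-envelope cost of the self-request workaround described above.
# Assumptions: ~$0.05 per attestation request (implied by ~$5 for 100 requests)
# and a slashing period of roughly 11 days (the proposed min uptime window).
FEE_PER_REQUEST_USD = 0.05
REQUESTS_PER_PERIOD = 100          # one per validator with a registered attestation key
SLASHING_PERIOD_DAYS = 11

cost_per_period = FEE_PER_REQUEST_USD * REQUESTS_PER_PERIOD
cost_per_month = cost_per_period * 30 / SLASHING_PERIOD_DAYS

print(f"cost per slashing period: ${cost_per_period:.2f}")  # ~$5.00
print(f"cost per month:           ${cost_per_month:.2f}")   # ~$13.64, roughly the $15/month cited above
```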

Alternative to MinStakeSlasher

MinStakeSlasher seems like an extra step that should not really be necessary. Is there a reason we can't amend the core contracts (i.e. LockedGold.sol and Validators.sol) to mark a group as ineligible, or otherwise reduce its number of electable validators, if the stake drops below the requirement as part of the slashing operation?

More generally, the MinStakeSlasher as proposed makes a validator slashable if their stake drops below the highest slashing penalty plus reward (i.e. the one for double signing). This still permits validators to operate with less than the normally required stake, leaving them with less at stake under repeated slashing (e.g. if they participate in creating a fork of length greater than 1) and reducing the overall deterrence. Instead of allowing validators to remain elected with a stake less than the requirement but greater than the single greatest slashing penalty, I would suggest we reduce the number of validators they can elect if they drop below the requirement, and encourage validators to maintain a stake in excess of the requirement as a buffer.
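
As a purely illustrative sketch of that suggestion (the 10,000 CELO per-member figure and the accounting below are assumptions, not taken from Validators.sol), the number of members a group can elect could simply be capped by how many per-member requirements its remaining stake covers:

```python
def electable_members(group_stake_celo: float,
                      registered_members: int,
                      requirement_per_member_celo: float = 10_000) -> int:
    """Illustrative only: cap how many of a group's members can be elected by
    how many per-member stake requirements its remaining locked CELO covers."""
    covered = int(group_stake_celo // requirement_per_member_celo)
    return min(registered_members, covered)

# A group slashed from 100k down to 95k CELO would lose one electable seat
# instead of continuing to elect all 10 members on a sub-requirement stake.
print(electable_members(100_000, 10))  # -> 10
print(electable_members(95_000, 10))   # -> 9
```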

Profitability of running an attestation service

Under the current scheme, running an attestation service is not directly profitable. Validators are paid by the protocol around $75k per year, and it seems reasonable to state that this can be the incentive for running an attestation service, as it is a required part of being a validator under this proposal. If that's the reasoning, I think we should make it explicit.

One gap of slashing as the (dis)incentive is that it does not pave the way to independent attestation providers, and instead entrenches the position that only validators provide attestations. Enabling independent attestation providers has been discussed before, but I am not aware of any conclusion. It may be worth revisiting it now to decide whether we want to keep that option open or not.

Another note here is that a slashing penalty of 40 CELO per slashing period, about 120 CELO per month, may still be less than the operational costs of setting up a node and any extra hours needed to keep it online. It's not crazy to imagine some validators deciding to lock up an extra 1360 CELO for a year's worth of slashing instead of actually setting up an attestation service.

codyborn commented 3 years ago

Thanks for the feedback Victor!

Attack on proposed slashing method

I agree. It also doesn't solve for the case where an Attestation Service is partially available but still able to meet the min bar. This is the first step toward more discerning incentives. If we do observe this behavior, we can propose a one-off slashing via a governance proposal. I'd like to see a future attestation incentive CIP take into account:

Alternative to MinStakeSlasher

Good point. We can create a util method similar to SlasherUtil.performSlashing() that slashes and optionally calls forceDeaffiliateIfValidator if the stake drops below a min amount. With this proposal, it can also be expected that a validator will want to add a buffer to their lockedGold to avoid being deaffiliated.
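
A rough Python sketch of the slash-then-maybe-deaffiliate behavior being described (the real change would live in the Solidity contracts; the names and the 10,000 CELO minimum below are illustrative):

```python
def perform_slashing(locked_gold: dict, affiliations: dict,
                     validator: str, penalty: float, min_stake: float) -> None:
    """Illustrative sketch of the proposed SlasherUtil-style helper: apply the
    penalty, then force-deaffiliate the validator only if its remaining locked
    stake drops below the minimum requirement."""
    locked_gold[validator] = max(0.0, locked_gold[validator] - penalty)
    if locked_gold[validator] < min_stake:
        # Stand-in for forceDeaffiliateIfValidator in the real contracts.
        affiliations.pop(validator, None)

# A validator keeping a buffer above the minimum survives the slash affiliated.
locked_gold = {"val-a": 10_100.0, "val-b": 10_020.0}
affiliations = {"val-a": "group-1", "val-b": "group-1"}
perform_slashing(locked_gold, affiliations, "val-a", 60.0, 10_000.0)  # stays affiliated
perform_slashing(locked_gold, affiliations, "val-b", 60.0, 10_000.0)  # deaffiliated
print(affiliations)  # {'val-a': 'group-1'}
```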

Profitability of running an attestation service

I think most validators run an attestation service today because they want to support the network and it helps them differentiate themselves when looking for votes. We see this with 96/100 validators currently running an Attestation Service with no explicit incentive. Slashing brings financial and reputational loss, which I think keeps things fair for everyone, since many validators have already put effort into running the Attestation Service well.

zviadm commented 3 years ago

Have you thought about a setup that just takes out misbehaving attestation nodes from the pool of nodes that can be selected as an issuer instead?

So instead of a slashing penalty, if a node is suspected to be failing attestations, it becomes blacklisted and blacklisting time can increase exponentially for subsequent failures.

Pros for this approach would be that we can be much more aggressive in determining what it means that node is failing or flaking attestations, because penalty isn't huge. And validators that aren't running healthy attestation nodes will just become blacklisted completely over time. (imo it is fine if only ~50-80% of validators run attestation nodes as long as they are healthy, instead of forcing everyone to run an attestation node and have them be more flaky).
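
A minimal sketch of that blacklisting idea, with made-up parameters (one-epoch base duration, doubling per failure):

```python
from dataclasses import dataclass, field

@dataclass
class BlacklistTracker:
    """Illustrative sketch: each detected failure doubles the time an
    attestation node is excluded from issuer selection."""
    base_epochs: int = 1  # blacklist length after the first failure (assumed)
    failures: dict = field(default_factory=dict)            # node -> failure count
    blacklisted_until: dict = field(default_factory=dict)   # node -> epoch when eligible again

    def record_failure(self, node: str, current_epoch: int) -> None:
        self.failures[node] = self.failures.get(node, 0) + 1
        duration = self.base_epochs * 2 ** (self.failures[node] - 1)
        self.blacklisted_until[node] = current_epoch + duration

    def is_eligible_issuer(self, node: str, current_epoch: int) -> bool:
        return current_epoch >= self.blacklisted_until.get(node, 0)

tracker = BlacklistTracker()
tracker.record_failure("validator-a", current_epoch=10)  # blacklisted for 1 epoch
tracker.record_failure("validator-a", current_epoch=12)  # now 2 epochs
print(tracker.is_eligible_issuer("validator-a", current_epoch=13))  # False
```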

codyborn commented 3 years ago

That's a really good idea to protect the user experience @zviadm. I think this mechanism can work in conjunction with other incentive mechanisms. Without the incentive of slashing or additional rewards, I worry that this alone won't encourage validators to run an Attestation Service. Today there is the foundation voting for AS operators as an incentive, but because some validators are not eligible, it won't apply to everyone and may not be a strong enough incentive. WDYT?

zviadm commented 3 years ago

> That's a really good idea to protect the user experience @zviadm. I think this mechanism can work in conjunction with other incentive mechanisms. Without the incentive of slashing or additional rewards, I worry that this alone won't encourage validators to run an Attestation Service. Today there is the foundation voting for AS operators as an incentive, but because some validators are not eligible, it won't apply to everyone and may not be a strong enough incentive. WDYT?

is it a big issue if in this early stage only Foundation Voted validators are strongly incentivized to run an attestation node? i.e. if others don't want to run an attestation node, is it really a problem if they just stop running them?

in future, if attestation payouts themselves still aren't enough to incentivize running the node, I would be more in support of adding extra payment mechanisms (e.g. diverting some of the cUSD rewards to non-blacklisted attestation nodes, as additional validator rewards) instead of forcing people to run an attestation node, with penalties, as part of being a validator.

When people self-select to run an attestation node, it is easier to make sure of and enforce things like everyone having both Twilio and Nexmo accounts (or whatever new messaging platform gets added), having them set up with proper phone numbers, and all the other extra work. It is a very different kind of setup work from participating in consensus, which is why I think it will be healthier if we promote running a node through rewards instead of penalties.

zviadm commented 3 years ago

Another example of positive reinforcement: if we have separate rewards for attestations, later on we can also adjust rewards based on successful attestations for each node. Then there is another incentive to make sure node runners are completing all the attestations, not just passing the minimum bar.

codyborn commented 3 years ago

@zviadm I agree that it'd be nice to untangle the incentives of running a validator and attestation service. This would also support non-validator attestation operators in the future. One downside is that it will require a hard-fork which will naturally push the timeline out a bit. Since today, the validator rewards are intended to cover operation of the attestation service, do you see any issues with reducing the validator reward and using this difference for rewarding healthy attestation service operation? For example, if validator rewards today are approximately $75k/year, we could reduce them to $60k and have $15k for attestation service incentives?

aaronmboyd commented 3 years ago

> Have you thought about a setup that just takes out misbehaving attestation nodes from the pool of nodes that can be selected as an issuer instead?
>
> So instead of a slashing penalty, if a node is suspected to be failing attestations, it becomes blacklisted and blacklisting time can increase exponentially for subsequent failures.
>
> Pros for this approach would be that we can be much more aggressive in determining what it means that node is failing or flaking attestations, because penalty isn't huge. And validators that aren't running healthy attestation nodes will just become blacklisted completely over time. (imo it is fine if only ~50-80% of validators run attestation nodes as long as they are healthy, instead of forcing everyone to run an attestation node and have them be more flaky).

I think this is a great alternative. Especially if we additionally rewarded each completed attestation somehow, the economic incentives would be in place.

zviadm commented 3 years ago

> @zviadm I agree that it'd be nice to untangle the incentives of running a validator and attestation service. This would also support non-validator attestation operators in the future. One downside is that it will require a hard-fork which will naturally push the timeline out a bit. Since today, the validator rewards are intended to cover operation of the attestation service, do you see any issues with reducing the validator reward and using this difference for rewarding healthy attestation service operation? For example, if validator rewards today are approximately $75k/year, we could reduce them to $60k and have $15k for attestation service incentives?

Splitting out a portion of validator rewards to go to attestation service providers makes sense to me. There are probably many ways to set up the rewards for attestation service providers, and most of them will probably work ok. Here is one way that I have been thinking about:

Pros of this setup:

Cons of this setup:

I think some sort of sybil attack scenario will always exist (no matter the reward scheme), so we would have to depend on governance and kicking out/blacklisting bad actors manually.

codyborn commented 3 years ago

One of the questions that was brought up in the all-core devs call (by @zviadm I believe) was around the potential impact of improving the availability of attestation services. We'd like to know how much we'd need to improve the scores of individual attestation services to make a difference in user completion rate. I've put together this analysis here. The first graph shows the relationship between the failure rate of an Attestation Service (x-axis) and the percentage of users who abandon the flow (y-axis); each dot is an Attestation Service. As expected, we can see a pretty clear correlation between the two. The hypothesis is that getting paired with a poor AS introduces more friction, which leads to more user abandonment. Since a user gets paired with more than one AS, improving one AS node's availability should improve the abandonment rate for all other AS nodes.

In an attempt to isolate the impact of improving a poorly performing issuer, I've also created a second graph showing the cumulative flow abandonment rate, starting from the best-performing issuers (left) and progressively adding in the worst (right). For example, if we exclude every issuer with a failure rate above 30% (0.2 on the x-axis, since it's bucketed on the first decimal), then the flow abandonment rate is 16% (0.16 on the y-axis). As we move right on the x-axis, we include more and more of the worse-performing issuers in the calculation, so the user abandonment rate gets worse. Based on this chart, if we can improve the failure rate of all issuers to below 30%, we can expect overall user abandonment to drop from 20.8% (today) to 16.4%. The failure rate for an issuer is 1 minus its rate of attestation code completion. This data is aggregated over the period 12/16/20-01/16/21.

This shows that improving AS availability from where it is today is a worthwhile effort; however, we won't see significant gains in user completion until we can get all AS nodes' failure rates into the 20-30% range (the median failure rate is currently 22.5%).
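
For anyone who wants to reproduce this kind of cumulative calculation from their own data, here is a rough sketch; the numbers, column names, and flow-weighting are assumptions, not the actual dataset used above:

```python
import pandas as pd

# Assumed per-issuer input (made-up numbers): failure_rate in [0, 1], the number
# of verification flows that touched the issuer, and how many were abandoned.
issuers = pd.DataFrame({
    "failure_rate": [0.05, 0.12, 0.22, 0.35, 0.60],
    "flows":        [400,  350,  300,  200,  100],
    "abandoned":    [40,   42,   48,   44,   30],
})

# Start from the best-performing issuers and progressively include worse ones,
# tracking the flow-weighted cumulative abandonment rate at each step.
ordered = issuers.sort_values("failure_rate")
cumulative = ordered[["flows", "abandoned"]].cumsum()
ordered["cumulative_abandonment_rate"] = cumulative["abandoned"] / cumulative["flows"]
print(ordered[["failure_rate", "cumulative_abandonment_rate"]])
```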

codyborn commented 3 years ago

@zviadm @aaronmboyd I'm currently investigating the feasibility of splitting the validator/AS rewards without a hard-fork. If it needed a hard-fork, it'd likely not be on mainnet until Oct 2021.

nategraf commented 3 years ago

> @zviadm @aaronmboyd I'm currently investigating the feasibility of splitting the validator/AS rewards without a hard-fork. If it needed a hard-fork, it'd likely not be on mainnet until Oct 2021.

Could you highlight where a hard-fork would be required? So far I think the proposed changes could be implemented in the core contracts layer, but I could be missing something.

codyborn commented 3 years ago

@nategraf The only reason we'd have to hard-fork is if we were hitting the gas limit for the distributeEpochPaymentsFromSigner call and needed to bump it up before adding more complexity. The gas limit is 1MM per call (each signer gets its own call with 1MM gas). We don't have the gas usage instrumented yet in the blockchain client; however, from the protocol unit tests running against ganache, I can see that the gas usage is ~100k, giving us sufficient space to work with.

mikereinhart commented 3 years ago

Hi - we (Polychain) are in favor of the direction this conversation has moved and support decoupling incentives for validators and attestation services. Thank you to those who have contributed to this discussion!

In particular, the analysis on attestation service performance and user abandonment is very informative because it both confirms the importance of improving attestation rates and helps set a target for attestation rates the protocol could incentivize. We're looking forward to seeing where this goes.

codyborn commented 3 years ago

Just sent out a big update to the Attestation Service Incentives design. Appreciate all of the feedback that went into this. @zviadm a lot of this is riffing on your ideas and if you're open to it, I'd love to make you co-author. https://github.com/celo-org/celo-proposals/pull/161/files?short_path=8d43ce6#diff-8d43ce6ce22e7ebe262974018cb09de0113944f0a54d2af3af073a1183d689d9

zviadm commented 3 years ago

> Just sent out a big update to the Attestation Service Incentives design. Appreciate all of the feedback that went into this. @zviadm a lot of this is riffing on your ideas and if you're open to it, I'd love to make you co-author. https://github.com/celo-org/celo-proposals/pull/161/files?short_path=8d43ce6#diff-8d43ce6ce22e7ebe262974018cb09de0113944f0a54d2af3af073a1183d689d9

Happy to help. I just skimmed through the updated doc and have two quick notes:

codyborn commented 3 years ago

Good catch on the PoP. I just realized we already have a method removeAttestationSigner() which sets the signer to address(0): https://github.com/celo-org/celo-monorepo/blob/d07b9267115255d03174a218d31dee8d29b473b1/packages/protocol/contracts/common/Accounts.sol#L317

My main hesitancy about automatically increasing the reward potential when fewer AS are running is that, combined with the deregistration feature, it provides a direct incentive to bring down the scores of others. Once we have better ways to detect legitimate requests on-chain, I think this dynamic payout makes sense. We can improve the signal by having Komenci include on-chain attestations when a user passes the reCAPTCHA and device checks, in addition to only affecting an issuer's score when a user completed all attestations except those of the issuer in question.

codyborn commented 3 years ago

cc @mcortesi who had some ideas around an alternative design

codyborn commented 3 years ago

It seems like we've reached a good point with CIP32 thanks to all of the feedback. You can find the latest version of the spec published here. Does anyone have concerns with moving forward with the implementation and subsequent CGPs? cc @zviadm @nategraf @aaronmboyd @mikereinhart @aslawson @asaj @nambrot

YazzyYaz commented 3 years ago

Hi @codyborn, it still needs to be discussed in the All-Core Dev Call, and other community members would need to weigh in, before we move it from Draft stage to Last Call (the review period). Once it finishes the review period, it's accepted as a standard and implementation can proceed. The reason for this is that this is a distributed community and folks need an opportunity to weigh in, given this impacts rewards, etc. I think it's a good proposal fwiw.

codyborn commented 3 years ago

Makes sense @YazzyYaz. I see we have a Core Devs 5 this Thursday. Do we have an agenda yet?

YazzyYaz commented 3 years ago

Yeah your CIP is on it :) https://github.com/celo-org/celo-proposals/issues/164

Good call out though, I need to add the agenda to the calendar.

devme25 commented 3 years ago

I feel that we should study some metrics to calculate what the actual rewards breakdown would be, i.e. the 20% (attestation) + 80% (validation) breakdown. I feel it lies in a matrix around

I think in the early days attestations will be a crucial part of onboarding users, as we want to have a great onboarding experience, but after a few years that might change.

nategraf commented 3 years ago

I left this comment on the PR before, but I am not sure if anyone saw it and would like a bit of discussion on the idea.

I have a gut feeling that there are too many parameters in the proposed design. It's just an intuition, but I figured I'd propose an alternative that might be a bit of a simpler direction.

With the target completion rate, I am concerned that the steady-state completion rate may be either lower or higher than the target for well-performing nodes. In the lower case, which would be true if we see an uptick in spurious requests or requests from a country where we can't actually reliably deliver SMS, the result will be that validators see their pay reduced without recourse. In the higher case, the system loses any pressure to continue raising completion rates until a governance proposal is passed, which may take time.

As for the global circuit-breaker, we lose the incentive for nodes to be more resilient than average, such as by taking action to set up failover SMS providers. It's kind of the opposite of how Eth2 slashing works, where the penalties are designed to disincentivize coordinated failure. We do need the system not to overly punish validators for events outside their control, such as a natural disaster in a given region, but we should create an incentive for validators to overcome those difficulties where possible. (In the extreme case, a validator could decide to shut down its attestations API whenever the circuit breaker is on, as it will no longer have an ill effect on their rewards.)

My alternative proposal is to calculate the completion rate on each epoch, then make the threshold for full rewards be a fixed difference from average. For example, if the completion rate for an epoch is 70/100 and the threshold difference is 0.1, any validators with more than a 60% completion rate will get full rewards. Below that I would suggest the rewards be a linear function of the completion rate in relation to the threshold, as is currently proposed. In this example, a validator with 30% completion for the epoch will receive 50% of the max rewards. When the threshold difference is greater than 0, it is possible for all validators to receive full rewards, but there is still pressure to push their individual completion rate up to ensure it is safely above the threshold. It would also be possible to set the difference below 0 to increase the pressure to raise completion rates, but then it would be impossible for all validators to receive full rewards.
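
A minimal sketch of this reward function, using the numbers from the example above (variable names are mine):

```python
def attestation_reward_fraction(validator_completion: float,
                                overall_completion: float,
                                threshold_difference: float = 0.1) -> float:
    """Fraction of max attestation rewards for one epoch, per the
    fixed-difference-from-average idea described above."""
    threshold = max(overall_completion - threshold_difference, 0.0)
    if threshold <= 0.0 or validator_completion >= threshold:
        return 1.0
    return validator_completion / threshold

# Example from the comment: overall completion 70/100, difference 0.1 -> 60% threshold.
print(attestation_reward_fraction(0.65, 0.70))  # 1.0 (above threshold)
print(attestation_reward_fraction(0.30, 0.70))  # 0.5 (30% / 60%)
```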

In the scenario of an adverse external event (e.g. a region's only telecom failing), overall completion rates may drop, but the rewards metric will have no delay in adjusting its target. Imagining that an outage or bug causes an overall 30% completion rate, the threshold would be 20%, allowing validators to receive full rewards while providing an incentive to be as resilient as possible. It also adjusts well in cases where the overall completion rate goes up, as an overall completion rate of 90% would set an 80% threshold.

Many of these points come more from a place of intuition than any drawn-out analysis, so I'd love to have some discussion of the pros and cons.

YazzyYaz commented 3 years ago

@nategraf in regards to leaving comments on PRs, I try to recommend folks just comment on the Issue ticket, since PRs get merged every time and comments get lost (or rather are harder to find again).

codyborn commented 3 years ago

Hey @nategraf, apologies for missing the previous comment in the PR. I definitely think this idea is worth discussing, especially in light of the recent Twilio and Valora 1.11 failures. What I like about this proposal is that the dynamic threshold automatically accounts for failures outside of the AS operator's control. This can obviate the need for the baseline measurement and fallback reward mechanism. The concern that I have with this approach is that it introduces some incentive to negatively impact other AS's score. By making incomplete requests to other AS and completing them when I randomly hit my own, I can drop the avg for each epoch and guarantee the max payout of my AS. Given how easy this attack is to perform, I was hesitant to add any "competitive" incentive. That being said, I think this problem is still present with the existing proposal, although it's a little harder to achieve since you can't push others' score down to lift your score up.

> As for the global circuit-breaker, we lose the incentive for nodes to be more resilient than average, such as by taking action to set up failover SMS providers.

In this PR, I'm proposing changing the fallback mechanism slightly which I believe will address this concern. In the event of the fallback rewards, rather than all validators receiving a reduced reward, it just sets a cap on the reward loss. A validator with a resilient (and uncorrelated) setup can still achieve 100% rewards.
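
A small sketch of the capped-loss fallback being described (the 25% cap below is illustrative, not the value from the PR):

```python
def reward_with_fallback_cap(computed_reward: float, max_reward: float,
                             max_loss_fraction: float = 0.25) -> float:
    """Illustrative: the fallback doesn't hand every validator a reduced reward,
    it only caps how much any validator can lose in the affected epoch."""
    floor = max_reward * (1.0 - max_loss_fraction)
    return max(computed_reward, floor)

print(reward_with_fallback_cap(0.40, 1.0))  # loss capped: pays 0.75 of max
print(reward_with_fallback_cap(0.95, 1.0))  # a resilient, uncorrelated setup keeps 0.95
```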

zviadm commented 3 years ago

> Hey @nategraf, apologies for missing the previous comment in the PR. I definitely think this idea is worth discussing, especially in light of the recent Twilio and Valora 1.11 failures. What I like about this proposal is that the dynamic threshold automatically accounts for failures outside of the AS operator's control. This can obviate the need for the baseline measurement and fallback reward mechanism. The concern that I have with this approach is that it introduces some incentive to negatively impact other AS's score. By making incomplete requests to other AS and completing them when I randomly hit my own, I can drop the avg for each epoch and guarantee the max payout of my AS. Given how easy this attack is to perform, I was hesitant to add any "competitive" incentive. That being said, I think this problem is still present with the existing proposal, although it's a little harder to achieve since you can't push others' score down to lift your score up.
>
> > As for the global circuit-breaker, we lose the incentive for nodes to be more resilient than average, such as by taking action to set up failover SMS providers.
>
> In this PR, I'm proposing changing the fallback mechanism slightly which I believe will address this concern. In the event of the fallback rewards, rather than all validators receiving a reduced reward, it just sets a cap on the reward loss. A validator with a resilient (and uncorrelated) setup can still achieve 100% rewards.

As I mentioned in my original comment, I also still prefer dynamic rewards based on each validator's share of total completions. I personally don't think malicious behavior is something to really worry about. First, there is economic cost to malicious behavior because you have to pay the attestation fees, so it isn't just a free-for-all. Second, there is a huge reputational cost, and the potential for Governance-based action, in doing something like this.

My personal preference would still be for the simplest solution for dynamic rewards, i.e. something as simple as:

Period can be 1 epoch for code simplicity. But ideally, it would be 10 or 15 epochs. That would be better conceptually, but probably not worth the extra complexity.

codyborn commented 3 years ago

> First, there is economic cost to malicious behavior because you have to pay the attestation fees, so it isn't just a free-for-all.

In the case of running Valora on an emulator, the cost is a reCAPTCHA (around $0.002 to solve, based on prices in online abuse marketplaces). We have work slated for this milestone to add device checks, which should increase the cost of attack. The minimum cost will always be $.5/attestation if an attacker decides not to use Komenci. Let me do some attack EV analysis on both proposals (@nategraf's and @zviadm's) and we can compare.

> Second, there is a huge reputational cost, and the potential for Governance-based action, in doing something like this.

I agree that the cost is a significant deterrent. My concern is that it will be difficult to detect this abuse and even harder to point blame on the attacker.

zviadm commented 3 years ago

> I agree that the cost is a significant deterrent. My concern is that it will be difficult to detect this abuse and even harder to point blame on the attacker.

Will this really be that hard to detect? We already have stats for attestation completions, averages, and standard deviations. If one group has significantly better performance than everyone else, it will be natural to ask them to share their setup so the rest of the validators can also achieve much better performance. If that group is achieving better numbers through malicious tactics, I feel like that would become suspicious very easily, and it won't be all that hard to investigate either (i.e. you find that everyone else is getting the same rejected requests, except for one specific group).

Also, it isn't just about the cost but also the benefit. If someone is rigging the system to just get ~5% more of the rewards, that is extremely unlikely to be worth the cost, both because of the complexity and because you are still very likely to get caught and get rejected, not just by the Celo community but by the validator community in general.

And if someone attempts to get +50% of the rewards or something large like that, that would stick out really easily from very simple statistical analysis and become suspicious immediately.

codyborn commented 3 years ago

I don't imagine that an attacker would make it so obvious that their validator would stand out from the rest. They can randomize the requests and spread the success across other validators to do better a majority of the time. I agree that the relatively small gain isn't worth potentially getting caught and the effort is better invested into the uptime of the service. Let me run some numbers to see how the attack would play out with your proposal vs @nategraf's and we can go from there.

codyborn commented 3 years ago

Okay, I threw together two rough models:

  1. Dynamic Target Proposal - Similar to the existing proposal but with a dynamic target completion rate. I used a simple dynamic target completion rate, computed by multiplying a constant (e.g. 0.6) by the avg completion rate.
  2. Shared Pool Proposal - Per validator rewards would be: 'validator completion / total completion * total reward pool'

Given that there are multiple variables, it's hard to visualize all at once, but feel free to make a copy and adjust the parameters (highlighted) to see how it affects the attack ROI.
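
For concreteness, here is a minimal sketch of the two reward functions as described above; the parameter values are illustrative and are not taken from the spreadsheet models:

```python
def dynamic_target_reward(validator_completion_rate: float,
                          avg_completion_rate: float,
                          target_multiplier: float = 0.6,
                          max_reward: float = 1.0) -> float:
    """Dynamic Target Proposal: full rewards at or above a target that is a
    constant multiple of the average completion rate, linear below it."""
    target = avg_completion_rate * target_multiplier
    if target <= 0 or validator_completion_rate >= target:
        return max_reward
    return max_reward * validator_completion_rate / target

def shared_pool_reward(validator_completions: int,
                       total_completions: int,
                       total_reward_pool: float) -> float:
    """Shared Pool Proposal: validator completions / total completions * pool."""
    if total_completions == 0:
        return 0.0
    return total_reward_pool * validator_completions / total_completions

print(dynamic_target_reward(0.30, 0.70))        # 0.30 / 0.42 ≈ 0.71 of max
print(shared_pool_reward(120, 4_000, 1_000.0))  # 30.0 out of a 1000-unit pool
```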

Some observations:

  1. The attacker's organic (starting point) standing has an impact on the Dynamic Target Proposal but not the Shared Pool Proposal.
  2. The Shared Pool Proposal is very sensitive to the cost of attack.
  3. Both models will be impacted by the attacker's validator count (more validators = more profitable).
  4. As the organic total number of requests increases, both become harder to attack.

I think either approach will work, but I'm inclined to go with the Shared Pool Proposal due to the uncertainty around the true cost of an attestation, as well as the fact that the model provides a small, bounded ROI in the event of a successful attack. WDYT @zviadm @nategraf?

Also, if you'd like to make any corrections to the models please let me know.

nategraf commented 3 years ago

First thoughts are that I think both of these methods would work well, and I'd support either. I'll make some time tomorrow or this weekend to dive into some more analysis of each and see if I have any other thoughts.

I like the "shared pool" scheme quite a bit. It is indeed very simple!

Additionally it provides the advantage that it will always provide an incentive for validators to improve their setup further, even if they are already at the top of the ranking.

The shadow of this advantage is that there is always potential gain for validators inflating their completion rate, meaning a validator with a well running service may still choose to game the system. (As opposed to capping the payout, which makes it so validators will no longer have that incentive when they are in the top ranks)

zviadm commented 3 years ago

> Okay, I threw together two rough models:
>
>   1. Dynamic Target Proposal - Similar to the existing proposal but with a dynamic target completion rate. I used a simple dynamic target completion rate, computed by multiplying a constant (e.g. 0.6) by the avg completion rate.
>   2. Shared Pool Proposal - Per validator rewards would be: 'validator completion / total completion * total reward pool'
>
> Given that there are multiple variables, it's hard to visualize all at once, but feel free to make a copy and adjust the parameters (highlighted) to see how it affects the attack ROI.
>
> Some observations:
>
>   1. The attacker's organic (starting point) standing has an impact on the Dynamic Target Proposal but not the Shared Pool Proposal.
>   2. The Shared Pool Proposal is very sensitive to the cost of attack.
>   3. Both models will be impacted by the attacker's validator count (more validators = more profitable).
>   4. As the organic total number of requests increases, both become harder to attack.
>
> I think either approach will work, but I'm inclined to go with the Shared Pool Proposal due to the uncertainty around the true cost of an attestation, as well as the fact that the model provides a small, bounded ROI in the event of a successful attack. WDYT @zviadm @nategraf?
>
> Also, if you'd like to make any corrections to the models please let me know.

Nice models. I think there might be a bit of a mistake in the "shared pool" calculation. The reward increase formula currently looks like:

=$B$7*(D21*($B$1/$B$9)+$B$10)/($B$1*$B$2)*$B$4

I think it should be something like:

=$B$7*(D21*($B$1/$B$9)+$B$10)/($B$1*$B$2 + $B$10)*$B$4

Notice the extra +$B$10 in the denominator: when the attacker is issuing more attestations, that increases the denominator for everyone, so the total gain is capped (i.e. even if you spend 1 million $s on an attack, you will never be able to take more than the total rewards itself).
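
A small sketch of why that denominator matters; the variable names are my guess at what the spreadsheet cells represent, not the actual sheet:

```python
def attacker_reward_share(attacker_organic_completions: float,
                          attacker_bought_completions: float,
                          organic_total_completions: float,
                          total_reward_pool: float) -> float:
    """Shared-pool payout to the attacker. The bought completions must be added
    to the denominator too: they raise total completions for everyone, so the
    attacker's gain is bounded by the fixed reward pool."""
    attacker = attacker_organic_completions + attacker_bought_completions
    total = organic_total_completions + attacker_bought_completions
    return total_reward_pool * attacker / total

# Even an arbitrarily large purchase of completions only asymptotically
# approaches the full pool; it can never exceed it.
print(attacker_reward_share(100, 1_000, 10_000, 1_000.0))      # = 100.0
print(attacker_reward_share(100, 1_000_000, 10_000, 1_000.0))  # ≈ 990.2, still < 1000
```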

codyborn commented 3 years ago

Good catch @zviadm - updated the formula. I think we can make either approach work with some thought in how we set the parameters. If we go with the Shared Pool Proposal, we can set a smaller Attestation Reward Percentage (e.g. 10%). This would put more pressure on validators to be honest, since a rational validator would be less likely to risk their total reward for a smaller potential gain.

mikereinhart commented 3 years ago

It has been really great to watch this solution evolve over time, thank you to all contributors. A general framework I've used to evaluate options:

  1. All validators should be able to receive full rewards. We shouldn’t have a system that necessarily causes some validators to essentially be slashed by not having full reward achievable.
  2. ASs should be incentivized to perform as best they can under all conditions, including:
    • Massive failure across the network
    • High performance of their AS
  3. Dynamic targets are preferred as they require less manual intervention when motivating ASs to perform well. Otherwise governance must be used repeatedly to alter target rates.

I'm generally supportive of both the Shared Pool and Dynamic Target options presented. At this time I have a slight preference for the Dynamic Target Proposal because I feel it accomplishes my 1st point above better due to greater guarantees of full, consistent, more predictable rewards for all validators. The Shared Pool model seems to set up a system of more dynamic rewards, where in extremes, some validators could receive significantly more rewards than the current allocation. This could serve as a great motivation to get back online during system wide outages, but this seems covered adequately by the Dynamic Target as well.

jmrossy commented 3 years ago

RE the two competing models, I have not thought about this deeply but my intuition is that Shared Pool may be a slightly better fit. It's simpler and it potentially incentivizes validators to invest in making their attestation services work very well.

There's a semi-related idea that's been suggested about buying users extra attestations during verification (e.g. give them 6 but they only need 3). In a Shared Pool model, the validators would be competing to deliver the codes as fast and reliably as possible to maximize rewards.

That said, given that SMS delivery is such a tricky problem for some locales, I'm sympathetic to validators not wanting to tackle it. This could lead to some validators giving up attestations entirely, leading to more centralization over time.

nategraf commented 3 years ago

I created a Colab notebook with a simulator to try to shed some more light on the performance of these two models. It could definitely use additional evaluation, but it should help visualize the rewards as a distribution and understand this better as a probabilistic system.

A couple of key takeaways for me so far have been:

  1. The expected profit from attack for the shared pool reward function is proportional to the number of attacker requests. The slope of the expected profit function depends on how many validators the attacker controls and the total amount of rewards in the pool, with the slope being negative below a certain threshold and positive above it. You can view the last simulation to play with the numbers, but there is essentially a cap on the amount of rewards that can be distributed with the shared pool function. Below this cap, the attack is almost never worth the cost; above the cap it is almost always worth the cost, and may motivate an economically rational attacker to purchase a very large number of requests.
  2. The dynamic target function will always pay out rewards at a rate slightly below 100% on average, even to well-performing validators. This is because, given the small number of requests that hit any given validator in a given day, they will have "bad days" at a certain rate, where their completion percentage is much lower than their long-term "true" completion rate. This could be smoothed out by expanding the window to cover a sufficient number of days, or by applying a statistical test to estimate the range their "true" completion rate falls in and rewarding them based on an upper value in this range (i.e. giving them the benefit of the doubt when they received a small number of requests in a given day). Or we could simply raise the payout by some small amount (e.g. 10%) to bring the expected reward up to what we intend.

I am hoping we will be able to draw more insights from this simulator approach. I don't have any more time to look into this right now, but anyone should feel free to make a copy and investigate further.

https://colab.research.google.com/drive/114fnqUlFEioKALvr5_W1bOk3wWtD-5ji#scrollTo=8FkGLDzvhGfB
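
As a rough illustration of the smoothing idea in takeaway 2 above (giving validators the benefit of the doubt on low-volume days), the per-epoch score could be based on an upper confidence bound of the completion rate rather than the raw ratio; here is a Wilson-interval sketch, with an assumed one-sided 95% bound:

```python
import math

def wilson_upper_bound(completions: int, requests: int, z: float = 1.645) -> float:
    """Upper bound of the Wilson score interval for a binomial proportion
    (z = 1.645 gives a one-sided 95% bound). With few requests, this stays well
    above the raw ratio, so a single bad low-volume epoch doesn't tank a score."""
    if requests == 0:
        return 1.0
    p_hat = completions / requests
    denom = 1 + z ** 2 / requests
    centre = p_hat + z ** 2 / (2 * requests)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / requests + z ** 2 / (4 * requests ** 2))
    return min(1.0, (centre + margin) / denom)

print(wilson_upper_bound(6, 10))      # raw 0.60, upper bound ≈ 0.81
print(wilson_upper_bound(600, 1000))  # raw 0.60, upper bound ≈ 0.625
```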