chainflip-io / chainflip-backend

The Chainflip backend repo, including the Chainflip Node and CFE.
52 stars 15 forks source link

[SC-2863] Failing to produce a block should deduct Online Credits / Rep #1048

Closed morelazers closed 2 years ago

morelazers commented 2 years ago

Describe the changes you'd like

Currently there's an Active Validator on Soundcheck not producing blocks.

We should be punishing this, but instead we're giving them Online Credits and Reputation for submitting Heartbeats 😅

Validator ID: 5CX9KrU7aFS5jM67qh7yuXZzhBimGpCrBJuTAQssBGXMvDzD

Does this issue have a Shortcut Story?

Not yet.

Implementation

We can punish inactive block producers by including a reward for block authorship. This way, missing an aura slot has a real (and proportionate) cost to the misbehaving node.

The way to do this would be in an on_initialize hook (probably in the rewards pallet) that pulls the block author from pallet_authorship and simply mints the emissions for the current block to the account of the block author.

A side benefit of this would be that we would likely be able to remove the current (fairly complex) system we have for lazy apportionment of rewards. If each validator pays out their rewards once every num-validators blocks, rather than paying out all rewards every reward_interval blocks, then the performance benefit of the lazy apportionment / redeeming mechanism falls away.

If we apply this as part of a runtime upgrade we'd have to ensure that all rewards are paid out during the upgrade. It might be worth leaving the redeem extrinsic in temporarily as a no-op to avoid breaking assumptions in the CFE / UI / CLI.

Describe alternatives you've considered

Inherents: I initially though inherents would be the right approach for this, however on reflection I think inherents are appropriate when some eternal information needs to be passed in (such as the timestamp). However, in this case, we already have the information required, via pallet_authorship.

Aura: we could handle this by slashing authors who missed their authorship slot. This would be more explicit and arguably the 'correct' solution to the problem. However, it would require a custom fork of the aura pallet to revert the changes from this commit, plus some extra offence handling logic, and I would rather not touch aura if it can be avoided... particularly considering that (a) it's not clear why this was removed from aura in the first place - there might be a good reason and (b) there is an open substrate issue to add this back to core substrate (as per my comment below).

Additional context

session.validators: Vec<ValidatorId>
[
  5F9ofCBWLMFXotypXtbUnbLXkM1Z9As47GRhDDFJDsxKgpBu
  5Ge1xF1U3EUgKiGYjLCWmgcDHXQnfGNEujwXYTjShF6GcmYZ
  5DJVVEYPDFZjj9JtJRE2vGvpeSnzBAUA74VXPSpkGKhJSHbN
  5Decocf1WeTtWdyNMpuC4y6xzERYecat71Xk5GkcT8Y26EBZ
  5GVnJYpiknGcmbXPG97JYENNkzEhvyw9RfGPsTrrCF6tzDzo
  5CX9KrU7aFS5jM67qh7yuXZzhBimGpCrBJuTAQssBGXMvDzD <-- the offline one
]

The Validators currently producing blocks (5CX... is not in the list):

image

The Online and Reputation states for the Account:

image

Tag specific people for comment

@dandanlen @andyjsbell

andyjsbell commented 2 years ago

What's the priority on this one?

morelazers commented 2 years ago

p1

andyjsbell commented 2 years ago

Is #553 related?

morelazers commented 2 years ago

Probably, yep

dandanlen commented 2 years ago

Some interesting context (also relevant to #553) - I found the following comment in the aura pallet source:

// TODO [#3398] Generate offence report for all authorities that skipped their
// slots.

The associated issue has not been touched for > 2 years but links to some extra context. According to this issue and this PR, the current approach to use the im_online pallet to punish non-participating Aura nodes (so similar to our hearbeats).

However, according to that issue, it would be better to use Aura directly rather than hearbeats for heartbeats... so similar to our conclusion? I'll read the PR in detail to get a better understanding.

andyjsbell commented 2 years ago

Nice and simple. LGTM.

A side benefit of this would be that we would likely be able to remove the current (fairly complex) system we have for lazy apportionment of rewards. If each validator pays out their rewards once every num-validators blocks, rather than paying out all rewards every reward_interval blocks, then the performance benefit of the lazy apportionment / redeeming mechanism falls away.

Nice side effect :+1:

If we apply this as part of a runtime upgrade we'd have to ensure that all rewards are paid out during the upgrade. It might be worth leaving the redeem extrinsic in temporarily as a no-op to avoid breaking assumptions in the CFE / UI / CLI.

We might be able to remove it as we are quite still early in the day.

We have briefly discussed this but we should also consider how long they take to authour a block as it has consequences on other parts of the system which use blocks as a notion of time. Probably best in a new issue but I am just wondering on what the timeout is on failing to produce a block.

dandanlen commented 2 years ago

We have briefly discussed this but we should also consider how long they take to authour a block as it has consequences on other parts of the system which use blocks as a notion of time. Probably best in a new issue but I am just wondering on what the timeout is on failing to produce a block.

I think this is something to consider separately. It's a tricky one since a block author can take longer to author a block through no fault of their own (ie. the block might just be heavy due to some benchmarking issue). So it would have to be tracked over time and we would have to compare averages between different nodes or something. Also not sure we have access to reliable block production times - we have the timestamp but it's not precise and is relatively easy to fake.

andyjsbell commented 2 years ago

Also not sure we have access to reliable block production times - we have the timestamp but it's not precise and is relatively easy to fake.

Right, but I presume there is a timeout in not producing one so we are depending on a timestamp in the end anyhow??

dandanlen commented 2 years ago

I'm not sure how aura decides that a block author has missed their slot. The timestamp is allowed to drift by up to 30 seconds according to the implementation in the timestamp pallet. Maybe if the next block author 'overtakes' the current one and authors another valid block first?

Now that I think about it, I'm not entirely sure what the "last block" time in the polka js interface means. Is it the time required by the local node to process the latest block? Or the time taken to author & propagate it? The time taken by the network as a whole? Probably some combination of all of these but I'm not sure. Either way, I don't think we need to focus on this right now.

morelazers commented 2 years ago

LGTM, just one question:

allows the block author to mint the emissions

Is this going to be automatic, or would the Author's CFE be required to submit something? I'm assuming the former since it's neither inherent nor extrinsic, but I thought I'd ask.

dandanlen commented 2 years ago

Automatic. I've edited the proposal to make this clearer:

The way to do this would be in an on_initialize hook (probably in the rewards pallet) that pulls the block author from pallet_authorship and simply mints the emissions for the current block to the account of the block author.

dandanlen commented 2 years ago

@Janislav if you take a look at this - it might be enough to just plug in a different implementation of the RewardDistribution trait instead of the current one that is based on the rewards pallet.

Janislav commented 2 years ago

Makes sense to me. So the solution would be (probably) changing the implementation of the distribute method implemented on the OnDemandRewardsDistribution struct to mint the rewards not the reserve but to the account of the current block author. I would also consider moving implementation to the runtime because it's easier to access Authorship from here.

dandanlen commented 2 years ago

Yes that's the basic idea, but you can delete OnDemandRewardsDistribution and give it a better name - like BlockAuthorRewardDistribution.

I would also consider moving implementation to the runtime because it's easier to access Authorship from here.

Yes that sounds good.

Janislav commented 2 years ago

Alright. After I played a bit around I would probably prefer the following:

1) Leaving the implementation in the rewards pallet (turned out it's probably not so easy to move this to the runtime) 2) Introducing a new trait which we implement in the Runtime and returns us the current block author 3) Add a new config to the rewards pallet which takes the implementation of the trait we implemented in the runtime

With this, we have the AccountId of the current block author available in the rewards pallet and can mint the rewards to his account.

NOTE: Moreover, I think leaving the implementation in the rewards makes more sense because it logically belongs there.

dandanlen commented 2 years ago

The thing is, the entire rewards pallet exists purely to implement the lazy rewards distribution. Once we remove OnDemandRewardsDistribution, I think you'll find you can remove all the storage and methods from the rewards pallet, and there will be nothing left - we will have a pallet where all we really need it a trait implementation. At least that's my impression from reading the code - I haven't tried implementing it myself.

If you want, start along the path you think makes sense, and open a draft PR, I can take a look.

Janislav commented 2 years ago

ah okay... well that changes the situation

Janislav commented 2 years ago

will investigate this a bit more tomorrrow👌

Janislav commented 2 years ago

Okay, so as far as I can tell removing the rewards and pallet and payout the rewards directly should be possible. I just ask myself why we initially decided to pay out the rewards at the end of each epoch and implement this additional logic to make it possible. Pretty sure this was a design decision based on requirements and assumptions, right? Would like to check and fully understand what this would mean.

dandanlen commented 2 years ago

Ideally, we would pay out rewards equally to all validators on every block: this would give us the most accurate accounting and would be the most equitable way of doing things as long as we can rely on all validators behaving themselves.

However it is very inefficient since we have to iterate through a storage map + read/write to ~150 storage locations on every block. Hence we implemented the 'lazy' version where we track what each validator is entitled to without actually crediting it to their account until they ask for it, or until we run an auction. The lazy version is more efficient but significantly more complex.

The latest approach quite neatly solves the efficiency issue, and also adds an added incentive for validators to stay online author blocks, and it's much less complex than lazy rewards.

dandanlen commented 2 years ago

Btw. the new approach may require a change to the cli claim procedure - I believe it currently needs to do make a call to redeem to redeem the rewards before making a claim. This should no longer be necessary. Should be straightforward to fix though.

Janislav commented 2 years ago

Cool. Sounds like a plan. Then I would now:

1) Implementing RewardsDistribution in the runtime and paying out the rewards to the block author 2) Delete the rewards pallet 3) Remove the usage of all structs implemented in the rewards pallet (RewardRollover, Rewarder)

dandanlen commented 2 years ago

Sounds good. Also note that apart from the cli this will be a breaking change for the CFE (the pallet indexes will change when we remove the rewards pallet) and also probably for a bunch of stuff on the UI side of things...

dandanlen commented 2 years ago

@Janislav, pretty sure this can be closed, right?

Janislav commented 2 years ago

Yes, this is already merged