IntersectMBO / ouroboros-network

Specifications of network protocols and implementations of components running these protocols, which support a family of Ouroboros Consensus protocols; the diffusion layer of the Cardano Node.
https://ouroboros-network.cardano.intersectmbo.org
Apache License 2.0

Leader VRF value no longer settling ties #4051

Open JaredCorduan opened 2 years ago

JaredCorduan commented 2 years ago

I do not know whether or not we've found a bug or just discovered the consequences of an intentional decision.


The final tie breaker for "slot battles" is the leader VRF value:

https://github.com/input-output-hk/ouroboros-network/blob/9249a70ed9e2365f3963e47cb31b4b1589bca8f6/ouroboros-consensus-protocol/src/Ouroboros/Consensus/Protocol/Praos/Common.hs#L62-L68

In the TPraos protocol (used prior to the Vasil HF), csvLeaderVRF was the leader VRF value. In the Praos protocol, however, csvLeaderVRF is being set to the single VRF value in the block header (prior to the range extension).

This removes a small advantage that small pools previously enjoyed. Small pools are more likely to win this tie breaker: being small, they need a smaller leader VRF value to pass the leader check in the first place, so conditional on producing a block at all, their leader VRF values skew low. Using the VRF value from before the range extension is applied removes this small advantage.


The Evidence:

The view, PraosChainSelectView, is populated by the BlockSupportsProtocol class method selectView, which uses the ProtocolHeaderSupportsProtocol class method pHeaderVRFValue to set csvLeaderVRF in the view.


This was discovered here: https://github.com/cardano-community/cncli/issues/19
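To illustrate the advantage in question, here is a small Monte Carlo sketch. It is purely illustrative Python: uniform draws stand in for leader VRF values, and fixed thresholds stand in for the real 1 - (1 - f)^sigma leader check. Conditional on both pools winning the leader check for the same slot, the pool with the smaller threshold tends to hold the lower value, and so wins a lowest-leader-VRF tie break most of the time.

```python
import random

# Monte Carlo sketch of the old lowest-leader-VRF tie break. Thresholds are
# simplified stand-ins for the real leader check, and the uniform draws
# stand in for leader VRF values; purely illustrative, not node code.
def small_pool_win_rate(th_small, th_big, trials=100_000, seed=42):
    rng = random.Random(seed)
    wins = battles = 0
    for _ in range(trials):
        v_small = rng.random()                     # small pool's leader VRF value
        v_big = rng.random()                       # big pool's leader VRF value
        if v_small < th_small and v_big < th_big:  # both pools lead this slot
            battles += 1
            wins += v_small < v_big                # lower value wins the tie
    return wins / battles
```

With a 1% threshold against a 10% threshold, the small pool wins roughly 95% of the resulting slot battles under this rule (for uniform values the exact probability is 1 - th_small/(2 * th_big)).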

JaredCorduan commented 2 years ago

It seems that the main lever to pull to control the pool splitting problem is by properly incentivising pledge.

and just to connect the dots for everyone (apologies if this is obvious!), we can achieve this by increasing the $a_0$ protocol parameter. (see here)

TerminadaPool commented 2 years ago

we can achieve this by increasing the a0 protocol parameter.

But doing so, with the current formula, will massively benefit those who can pledge-saturate a pool whilst making almost no difference to those who pledge in the 100K to 3M ada range.

TerminadaPool commented 2 years ago

Unfortunately this code change, which removed the leader VRF as the tie breaker for slot battles, was yet another blow to small pools in the decentralisation war against the mega pools. Increasing a0 will not help.

The current reward formula is not fit for purpose.

Here are some quotes about the a0 parameter from CIP-0050:

The a0 parameter represents the fraction of the rewards (R/(1+a0)) which are not paid out unless all of the stake is pledged. An a0 of 0.3 ensures that 1.0 - 1.0/(1.0+0.3) = 23% of the total rewards R will be withheld from low pledge fraction pools and returned to the reserve. The effect of this formula is that increased pledge results in retaining more of the available rewards R. However, this benefit is not linear, rather it is drastically biased towards the saturation limit.

With increasing pledge as a proportion of total stake there is little noticeable effect on rewards until very high pledge percentages. These very high pledge percentages are not attainable except by extremely large stakeholders. Furthermore having such a high pledge percentage would defeat the purpose of community staking since the pool would already be saturated when the maximum pledge benefit is earned.

Without changing the formula, increasing a0 will exacerbate these problems.
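For reference, the quoted behaviour can be checked against the maximal pool reward formula from the Shelley delegation design spec. The sketch below is illustrative Python, not the ledger implementation, and the names are mine.

```python
# Sketch of the maximal pool reward formula from the Shelley delegation
# design spec, which the quoted CIP-0050 text describes. Illustrative only.
#   R     -- total rewards available in the epoch
#   a0    -- pledge influence parameter
#   k     -- target number of pools (z0 = 1/k is the saturation point)
#   sigma -- pool's relative stake; s -- operator's relative pledge
def max_pool_reward(R, a0, k, sigma, s):
    z0 = 1.0 / k
    sigma_ = min(sigma, z0)   # stake capped at saturation
    s_ = min(s, z0)           # pledge capped at saturation
    return (R / (1 + a0)) * (
        sigma_ + s_ * a0 * (sigma_ - s_ * (z0 - sigma_) / z0) / z0
    )
```

With a0 = 0.3, a zero-pledge saturated pool earns only R*z0/1.3, i.e. about 23% of its proportional share is withheld, exactly as quoted, and the withheld part is only fully recovered when the pool is pledge-saturated.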

daehan-koreapool commented 2 years ago

Unfortunately this code change, which removed the leader VRF as the tie breaker for slot battles, was yet another blow to small pools in the decentralisation war against the mega pools. Increasing a0 will not help.

I agree with what @TerminadaPool said here. Regardless of whether the previous leader VRF preference for small pools was an intended feature or not, we can all agree that an update was introduced that makes small pool operators' profitability worse.

Making the slot battle tie breaker equally random for all competitors sounds nice superficially, but it actually works against the whole incentive design of Cardano network operation.

Even though Cardano is a proof-of-stake network, we need human resources such as builders and participants more than ever. We need to start creating more incentives for people to enter the network, and in my opinion that starts with small pool operators.

cardanoinvest commented 2 years ago

@daehan-koreapool yes, and we had this "bias" towards small pools for 2 years, and I saw nothing bad happen. But with it gone, I see no incentive to delegate to a small pool if you care about your ROI. Plus big pools get more rewards from transaction fees: they might have 50 transactions per epoch, for example, while a small pool gets 1 to 5, so their total ada rewards end up larger. There should be some bias towards small pools for delegation to make sense. So where do we go from here? When can this at least be put to a vote?

ccgarant commented 1 year ago

Remember that changing protocol parameters (k, min fee etc) is substantially easier technically than changing the code for the protocol. The TPraos -> Praos protocol change was a hard fork, and "rolling back" would also be a hard fork. So changing updatable protocol params "just" requires social consensus, whereas changing the protocol requires social consensus and a hard fork.

Hi @dcoutts, for my edification (trying to learn): in CIP-9, what exactly makes the "Updatable Parameters", aka the reward protocol parameters, "substantially easier" to change?

I know they don't require a hard fork. I heard they only need 4 of 7 signing keys? Could you explain the differences and maybe point me in the right direction? Thanks!

dcoutts commented 1 year ago

Changing any of the updatable protocol parameters simply needs a governance decision and then posting a suitably signed protocol parameter update on the chain (yes, currently that means 4 of the 7 governance keys). That's it. The change then takes place automatically at the next epoch boundary, without any action needed on the part of any other user (SPOs, wallet users, etc.).

By contrast, a hard fork is a much more elaborate and time-consuming procedure. It requires all SPOs and all other users to upgrade to a new version of the code, with enough lead time for people to do so. This often involves a lot of communication and time for developers to test their applications on a testnet. Then, once enough end users and SPOs have upgraded, it also needs a signed protocol parameter update to be posted on the chain (to update the major protocol version).

So a hard fork is really a lot more work, and needs a lot more lead time.

cardanoinvest commented 1 year ago

Sooo, maybe we can revert it back now? And while everything is back to stable, like it was for 2 years, we can run some simulations on how this change will affect the playing field in the long run? Because I already feel those effects myself, and I know other SPOs with the same issue. My average monthly ROI dropped from a stable average of 4% to 1.6%: 3 blocks lost out of 8 in 6 epochs, all in battles. Maybe I am just very unlucky, but the number of battles seems bigger for me, and I have run my pool since the beta and the ITN. And Cardano rating sites don't take this new change into account, so small pool = bad, even though it is random and not because the server lags or something.

TerminadaPool commented 1 year ago

I am going to poke my head up and see if I get shot again: I want to disagree that a hard fork is necessary to change how slot battles are settled.

As I understand it, the decider of a slot battle is really the next block producer, because he chooses which fork to build his block upon. If the next block producer after a slot battle is running software that chooses the winner based on the lowest leader VRF, then he will build on the block with the lower leader VRF. Other nodes in the network will then follow his chosen fork because, with his extra block added, it is now the longest chain.

In other words, I believe a modification to the software to decide slot battles based on the lowest leader VRF could be rolled out without a hard fork. In fact, if someone pointed out where to make the change in the code, I believe stake pool operators could choose to do this themselves. If a group did this, then the percentage of slot battles decided by leader VRF score would reflect the stake-weighted percentage of pool operators that made the change to their software.

Having said all that, I am not sure that I would want to do such a thing even though I run a small pool that would benefit from this change. I say this because I agree with Duncan's general principle of wanting the design to properly reflect the original intentions of the research.

brouwerQ commented 1 year ago

3 blocks lost out of 8 in 6 Epochs all in battles.

Are those all slot battles or mostly height battles? If height battles, are those mostly caused by a high propagation time of the other pools? Because if that is the case, the proposal mentioned/discussed by @TerminadaPool and @dcoutts earlier in this thread (https://github.com/input-output-hk/ouroboros-network/issues/2913#issuecomment-816953407) will be a better solution for this problem. Slot battles and 'real' height battles (so not those caused by really bad propagation time) should be much rarer I think (or you really do have really bad luck now, that's also possible).

brouwerQ commented 1 year ago

I want to disagree that a hard fork is necessary to change how slot battles are settled.

I also think a HF isn't necessary, but it would cause a 'wilder' chain with more forks. Pools with high saturation have an incentive not to upgrade, so a majority of blocks would probably still be minted by pools using the current rules. With a HF, you can force everyone to use the new rules.

cardanoinvest commented 1 year ago

3 blocks lost out of 8 in 6 Epochs all in battles.

Are those all slot battles or mostly height battles? If height battles, are those mostly caused by a high propagation time of the other pools? Because if that is the case, the proposal mentioned/discussed by @TerminadaPool and @dcoutts earlier in this thread (#2913 (comment)) will be a better solution for this problem. Slot battles and 'real' height battles (so not those caused by really bad propagation time) should be much rarer I think (or you really do have really bad luck now, that's also possible).

2 were in slot battles, 1 in a height battle. But it was very rare for me before the HF; I lost maybe 1 or 2 height battles in 2 years. During the height battle I won the slot battle but lost the height to a stake pool with 8s propagation time and 21 nodes confirming.

PhilippeLeLong commented 1 year ago

@TerminadaPool I also think you're right. The winning block in a slot battle isn't enforced by the ledger rules but by the node, so in theory SPOs could run custom code. This would probably lead to longer settlement times for forks.

On the actual topic, I don't think there's a need to roll back to the previous selection rule. In tandem with the minPoolCost, this incentivizes pool splitting even more. Most small pools are supposed to die out so that their stake can move to a few other small pools, which can then become saturated. The actual delegation decision should be non-myopic. Delegators seem to be pretty bad at that, although, in their defense, they are not given the right tools. The ranking mechanism proposed by @brunjlar is still not implemented correctly in Daedalus or any other wallet or pool-ranking site.

dcoutts commented 1 year ago

Well done to the eagle-eyed folks who spotted that in principle the tie breaking would not necessarily require a hard fork. That is true, right now. I should caution however that it will not be true in future: there will later be a change to the chain sync protocol that will require that both parties exactly agree on the chain ordering (i.e. including how one does tie breaking), and any disagreement would become a protocol error. So again, it's not a direction that I would encourage anyone to go in.

TerminadaPool commented 1 year ago

there will later be a change to the chain sync protocol that will require that both parties exactly agree on the chain ordering (i.e. including how one does tie breaking), and any disagreement would become a protocol error.

I would have thought there is no fundamental difference between the next block producer choosing one particular single-block fork (the slot battle winner) and dropping the block he didn't like as though he never received it. In both cases he will build his block on the slot battle winner he prefers. This sort of thing already happens today when one of the blocks from a slot battle reaches the next block producer too late. The Nakamoto longest-chain rule is then used to decide the canonical chain.

Are you saying that if a block producer mints a block on the wrong slot battle fork, because he didn't receive the other fork in time, that his block will then become invalidated due to a protocol error?

PhilippeLeLong commented 1 year ago

I suppose this has to do with the introduction of input endorsers. I also welcome any change that enforces correct behavior by ledger rules. Right now there's nothing stopping block producers from omitting txs or reordering their mempool. Input endorsers will make this kind of foul play much more difficult.

gitmachtl commented 1 year ago

@TerminadaPool

Are you saying that if a block producer mints a block on the wrong slot battle fork, because he didn't receive the other fork in time, that his block will then become invalidated due to a protocol error?

If your node builds on top of a block that is not the winner, your block and the fork will be sorted out sooner or later, yes. Take a look at the issue we had with the node a while ago. If you introduce different node decisions in parallel, you will see a lot of those forks again, and a lot of lost blocks too. The only way to avoid this is via a hard fork event.

TerminadaPool commented 1 year ago

@gitmachtl

If your node builds on top of a block that is not the winner, your block and the fork will be sorted out sooner or later yes.

If you are the next block producer after a slot battle and you have not received one of the blocks, then you will build on the block you received, because it is a valid block. The block you produce is also a valid block. Other nodes in the network will now see your 2-block fork as longer than the alternative 1-block fork from the slot battle.

I mean, it is not like you can wind back time and re-mint your block on the other fork when you eventually receive the other block from the slot battle.

Nobody can know if you did receive both blocks of the slot battle fork and selectively dropped one, or only received the block you prefer.

I realise that I am a nobody on this forum, but surely my argument is simply restating the "longest chain rule".

gitmachtl commented 1 year ago

Sure, the longest chain wins, but that only happens if you have the majority of the block-producing nodes on that chain. So again, doing this without a hard fork will result in the same behaviour we saw before with the node bug. Most pools with a large amount of stake will not upgrade to such a newer version without massive social pressure.

TerminadaPool commented 1 year ago

With input endorsers, I imagine the analogous scenario would be two pools being slot leader to produce a ranking block for the same slot. Depending on the implementation, both of these ranking blocks may be accepted, and everyone will agree on the ordering of the ranking blocks, such that the block with the higher "block VRF" is ordered later.

In other words, with input endorsers, slot battles will go the way of the dodo. So this whole discussion will be rendered irrelevant.

os11k commented 1 year ago

@rdlrt This is not the case. It's now using the single vrf value from the block. I've been calling it the block_vrf, but it really has nothing to do with the contents of the block. It's just using a random value from an earlier step of the slot leader selection instead of the final step which is the leader_vrf. As long as you're using the same pool keys, two pools (dual leaders) will produce the same vrf value. It's this vrf value that is then hashed again to get the leader_vrf.

@AndrewWestberg sorry for going a bit off topic, but does this mean we can now run 2 BPs for the same pool side by side for HA purposes? If there will be no battle between blocks produced by 2 BPs of the same pool... Am I missing something?
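For context on "hashed again to get the leader_vrf", the range-extension step can be sketched roughly as below. This is an illustrative Python approximation: the `b"L"` domain-separation prefix and blake2b-256 are my assumptions for illustration, not a statement of the node's exact construction.

```python
import hashlib

# Rough sketch of the range extension: the raw VRF output is hashed with a
# domain-separation prefix to derive the leader value. The b"L" prefix and
# blake2b-256 here are assumptions for illustration only.
def leader_vrf_value(raw_vrf_output: bytes) -> int:
    digest = hashlib.blake2b(b"L" + raw_vrf_output, digest_size=32).digest()
    return int.from_bytes(digest, "big")
```

Two block producers running the same pool keys evaluate the same VRF on the same slot input, so the raw output, and anything derived from it, is identical for both of their blocks; a VRF-based tie break therefore cannot separate them.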

AndrewWestberg commented 1 year ago

@os11k NO. It does not mean you can now run 2 BPs for the same pool.

papacarp commented 1 year ago

Multiple BP nodes with the same keys? That won't help; it doesn't change how consensus chooses between forks.

Sorry to dredge up the past @JaredCorduan, but I'm trying to document how ties are settled in the case of multiple BPs running with the same keys. Plus, this is technically still an open issue, apparently...

It appears to me there is no preference given; the PraosChainSelectView comparison will get all the way down to EQ here:

https://github.com/input-output-hk/ouroboros-network/blob/0b185d373f915c517906e2d7c88410f51ba15601/ouroboros-consensus-protocol/src/Ouroboros/Consensus/Protocol/Praos/Common.hs#L63-L82

@rdlrt referenced this issue here: https://github.com/input-output-hk/ouroboros-network/issues/2014

This makes it seem like the decision would fall to the block hash, all else being equal. But when I analyzed 9 different well-distributed forker events, I didn't see a clear rule:

8097023 lower block hash (https://pooltool.io/realtime/8097023)
8078291 higher block hash
8031076 lower block hash
7930749 higher block hash
7842717 lower block hash
7837263 lower block hash
7700528 higher block hash
7658059 higher block hash
7640953 lower block hash

I'm ok with saying it's random, depending on how the blocks are stored in the volatileDB of the block producer that makes the next block. I just want to be accurate, as this is going into a "learning Cardano" book.

JaredCorduan commented 1 year ago

@papacarp

I'm trying to document how ties are settled in the case of multiple BP's running with the same keys.

That's great, thank you!

If two blocks are at the same chain depth and they have the same issuer (i.e. f v1 == f v2 in the code snippet above), then it compares the csvIssueNo. In other words, it looks for the higher counter in the operational certificate. If the counters are equal, it chooses the lexicographically lower VRF nonce value (csvTieBreakVRF). In the (astronomically unlikely?) event that the nonces are equal, I do not know what happens.
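The ordering described above can be sketched as follows. This is illustrative Python mirroring the PraosChainSelectView field names, not the Haskell implementation:

```python
# Illustrative mirror of the tie-break order described above. Field names
# echo PraosChainSelectView (csvIssuer, csvIssueNo, csvTieBreakVRF), but
# this is a sketch, not the node's Haskell code.
def prefer_first(a, b):
    """Return True if candidate header `a` is strictly preferred over `b`."""
    if a["blockNo"] != b["blockNo"]:
        return a["blockNo"] > b["blockNo"]             # longer chain wins
    if a["csvIssuer"] == b["csvIssuer"] and a["csvIssueNo"] != b["csvIssueNo"]:
        return a["csvIssueNo"] > b["csvIssueNo"]       # higher opcert counter wins
    return a["csvTieBreakVRF"] < b["csvTieBreakVRF"]   # lower VRF value wins
```

When every field compares equal, as with duplicate blocks minted from one set of pool keys in the same slot, the comparison expresses no preference, which is the EQ outcome discussed earlier in the thread.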

papacarp commented 1 year ago

@papacarp

I'm trying to document how ties are settled in the case of multiple BP's running with the same keys.

That's great, thank you!

If two blocks are at the same chain depth, and they have the same issuer (ie f v1 == f v2 in the code snippet above), then it compares the csvIssueNo. In other words, it looks for the higher counter in the operational certificate. If the counters are equal, then chooses the lower lexicographic VRF nonce value (csvTieBreakVRF). In the (astronomically unlikely?) event that the nonces are equal, I do not know what happens.

Thank you for your response. The nonces are equal because it's the same pool secrets making the same block at the same slot. cncli sends pooltool the "block VRFs", and I've indeed documented that the VRF is exactly the same for both submitted blocks with different hashes. (To be precise, I don't keep all the LSBs of the VRF, so if they differ in the lower bits I guess they could be different; for example, for height 8078291 the VRF values are both 1.065202050227698e+154.)