IntersectMBO / ouroboros-network

Specifications of network protocols and implementations of components running these protocols which support a family of Ouroboros Consesus protocols; the diffusion layer of the Cardano Node.
https://ouroboros-network.cardano.intersectmbo.org
Apache License 2.0
276 stars 86 forks source link

Extend PBFT tests with protocol parameter update proposals/votes/endorsements/adoption #1514

Closed edsko closed 4 years ago

edsko commented 4 years ago

The existing PBFT tests already deal with delegation certificates. They should be extended with update proposals/votes/endorsements. I think the general setup would be similar: "posting the delegation certificate" would correspond to posting an update proposal and having nodes vote on it, and then the proposal is endorsed when "enough" nodes are restarted (which is also required for delegation certificates). We should then also verify somehow that the update proposal is in the end endorsed -- for example, change a protocol parameter and then produce a block that depends crucially on the new parameter.

edsko commented 4 years ago

Ideally we'd merge these two sets of tests, but that's harder: https://github.com/input-output-hk/ouroboros-consensus/issues/725.

edsko commented 4 years ago

@nfrisby has been busy tracing down other problems. Moving this to the next sprint.

nfrisby commented 4 years ago

I've familiarized myself with the relevant parts of cardano-ledger and the corresponding PDF. The basic changes needed for Network seem pretty straight-forward (though I suspect I might need to introduce some additional synchronization so that nodes restart "immediately" after the onset of the first slot of the new epoch).

Please confirm two observations:

1) Adopted updates only take effect at the beginning of a new epoch.

2) There are only ~two~ (edit) three parts (what else?) of the static node configuration that need to be updated when a protocol update is adopted.

  * ~The slot length (which is passed to the `realBlockchainTime` call in `Node.hs`) must be updated to the new `ppSlotDuration (adoptedProtocolParamters cvsUpdateState)`.~ (Edit: `ppSlotDuration` has not and never will change via a protocol parameter update proposal; see input-output-hk/cardano-ledger#642 and input-output-hk/ouroboros-network#1725.)

  * The maximum mempool size (which is chosen as 2 the max block size of the "current" ledger when the node starts) must be updated according to the new `ppMaxBlockSize (adoptedProtocolParamters cvsUpdateState)`.

  * ~The `byronProtocolVersion` field of the `BlockConfig ByronBlock` record needs to be updated to the new `adoptedProtocolVersion cvsUpdateState`.~ (Edit: it seems like that's not actually necessary; see my next comment on this Issue.)

  * All other protocol parameter values (from what I've seen so far) are only read from `adoptedProtocolParamters cvsUpdateState` as-needed during the various validation routines in `cardano-ledger`.

cc: @edsko @mrBliss @nc6

nfrisby commented 4 years ago

I have to first figure out how these features are all supposed to work before I can write code to test it. This large comment is the result. Since it is essentially documentation and code review, is there a better place for it to live? I anticipate raising some Issues as a result of this review, but haven't done so yet. I plan to at least discuss the details with @edsko first.

From skimming the Byron ledger spec pdf and some of its references and inspecting the cardano-ledger code, I've determined the overall sequence of events involved in a successful protocol update proposal -- I'm ignoring software update proposals for now, since they seem beyond the scope of our current test suite.

  1. A protocol update proposal transaction is created . It proposes new values for some protocol parameters and a greater protocol version number as an identifier. There cannot be two proposals with the same version number.

  2. Genesis key delegates can add vote transactions that refer to such a proposal (by its hash). They don't have to wait; a node could add a proposal and a vote for it to its mempool simultaneously. There are only positive votes, and a proposal has a time-to-live (see ppUpdateProposalTTL) during which to gather sufficient votes. While gathering votes, a proposal is called active.

  3. Once the number of voters satisfies a threshold (currently determined by the srMinThd field of the ppSoftforkRule protocol parameter), the proposal becomes confirmed.

  4. Once the threshold-satisfying vote becomes stable (ie its containing block is >=2k slots old), a block whose header's protocol version number (CC.Block.headerProtocolVersion) is that of the proposal is interpreted as an endorsement of the stably-confirmed proposal by the block's issuer (specifically by the Verification Key of its delegation certificate). Endorsements -- ie any block, since they all contain that header field --- also trigger the system to discard proposals that were not confirmed within their TTL. https://github.com/input-output-hk/cardano-ledger/blob/172b49ff1b6456851f10ae18f920fbfa733be0b0/cardano-ledger/src/Cardano/Chain/Block/Validation.hs#L439-L444 Notably, endorsements for proposals that are not yet stably-confirmed (or do not even exist) are not invalid but rather silently ignored. In other words, no validation applies to the headerProtocolVersion field.

  5. Once the number of endorsers satisfies a threshold (same as for voting), the confirmed proposal becomes a candidate proposal.

  6. At the beginning of an epoch, the candidate proposal with the greatest protocol version number among those candidates whose threshold-satisfying endorsement is stable (ie the block is >=2k slots old) is adopted: the new protocol parameter values have now been changed. If there was no stably-candidated proposal, then nothing happens. Everything is retained; in particular, a candidate proposal whose threshold-satisfying endorsement was not yet stable will be adopted at the subsequent epoch unless it is surpassed in the meantime. When a candidate is adopted, all record of other proposals/votes/endorsements -- regardless of their state -- is discarded. The explanation for this is that such proposals would now be interpreted as an update to the newly adopted parameter values, whereas they were validated as an update to the previously adopted parameter values.

In summary, the following diagram tracks the progress of a proposal that's eventually adopted. For other proposals, the path short circuits to a "rejected/discarded" status at some point.

active proposal
    --> (sufficient votes)
confirmed proposal
    --> (2k slots later)
stably-confirmed proposal
    --> (sufficient endorsements)
candidate proposal
   --> (2k slots later)
stably-candidated proposal    (Frisby: stably-nominated?)
   --> (epoch transition)
adopted proposal

I have some concerns that maybe should become Issues.


The above has some implications for this Issue.

~I think having the test nodes restart at the beginning of every epoch seems reasonable, as long as I take precautions to not e.g. discard the contents of the mempool. However, to also restart in order to start endorsing seems like it's going to be quite confusing. I don't see an alternative right now, unless we some how choose to make the source of the headerProtocolVersion mutable.~

~Moreover, ouroboros-consensus currently neglects changes to the ppSlotDuration parameter; see Issue input-output-hk/ouroboros-network#1725 for more. There's a bit of a chicken-and-egg problem here: the ChainDB requires a BlockChainTime in order to open, but to have a correct BlockChainTime if ppSlotDuration has changed over time requires reading the latest ledger validation state with the current slot ticked, in case that's an epoch transition.~

Edit: As it turns out, since ppSlotDuration is not actually updatable and the use of a stale ppMaxBlockSize for the mempool size won't actually affect consensus (as long as the max block size was never essentially 0), the node does not need to restart when a protocol parameter update proposal is adopted.

Edit: The answer on Issue 1725 implies that ppSlotDuration has not and never will change via a protocol parameter update proposal. Also, Issue input-output-hk/ouroboros-network#1498 is to make the mempool's max size change dynamically in response to ppMaxBlockSize updates.

edsko commented 4 years ago

@dnadales , could you confirm https://github.com/input-output-hk/ouroboros-network/issues/1514#issuecomment-592277721 ?

dnadales commented 4 years ago

Adopted updates only take effect at the beginning of a new epoch.

Yes.

The slot length (which is passed to the realBlockchainTime call in Node.hs) must be updated to the new ppSlotDuration (adoptedProtocolParamters cvsUpdateState). (Maybe not: see input-output-hk/cardano-ledger#642 and input-output-hk/ouroboros-network#1725)

As far as I remember we wouldn't be changing the slot duration on protocol updates. At least not as a protocol parameter.

The maximum mempool size (which is chosen as 2 the max block size of the "current" ledger when the node starts) must be updated according to the new ppMaxBlockSize (adoptedProtocolParamters cvsUpdateState).

This sounds reasonable.

All other protocol parameter values (from what I've seen so far) are only read from adoptedProtocolParamters cvsUpdateState as-needed during the various validation routines in cardano-ledger.

Yup, seems right.

CC: @nfrisby

edsko commented 4 years ago

Thanks @dnadales !

As far as I remember we wouldn't be changing the slot duration on protocol updates. At least not as a protocol parameter.

This will happen, albeit indirectly: protocol updates determine the timing of the hard fork, and the hard fork can change the slot length. This is however the only place where slot lengths can change.

nfrisby commented 4 years ago

Thank you both.

protocol updates determine the timing of the hard fork

@edsko How does that happen? Are you referring to a software update proposal as opposed to a protocol (parameter) update proposal?

edsko commented 4 years ago

Uh, yes, I think that might be right. @dcoutts or @JaredCorduan can confirm.

nfrisby commented 4 years ago

I just bors'd PR 1747 (edit: it has now merged). It adds a somewhat trivial proof-by-example that it's possible for the ThreadNet tests to increment the Byron protocol version, which is the necessary precondition for the Byron-to-Shelley hard-fork. Every generated RealPBFT test case now includes a proposal and votes and endorsements that should cooperatively update the protocol version at the ~next~ second epoch.

Without retuning the other RealPBFT generators, it seems like half of the test cases successfully do so and half don't (e.g. they don't even reach the second epoch, or the proposal times out before enough votes, etc).

The test takes the most direct path:

@edsko Is that sufficient to close this Issue? If not, can we replace it with a lower priority/urgency Issue? Thanks.

edsko commented 4 years ago

Yes, we can close this. Opened input-output-hk/ouroboros-network#1768 for testing protocol parameter changes (medium priority), and input-output-hk/ouroboros-consensus#703 for testing non-ideal voting scenarios (low priority). There is also your existing input-output-hk/ouroboros-network#1733 about testing software version updates, which we think is still high priority; see https://github.com/input-output-hk/ouroboros-network/issues/1733#issuecomment-596961953.

We suggest to look at input-output-hk/ouroboros-network#1733 and then move on to to Shelley, and in particular, start looking at input-output-hk/ouroboros-network#1770 .

edsko commented 4 years ago

Let's first get https://github.com/input-output-hk/ouroboros-network/pull/1564 merged though, so that we can close input-output-hk/ouroboros-network#1297 (I guess that's almost done anyway).