IntersectMBO / ouroboros-consensus

Implementation of a Consensus Layer for the Ouroboros family of protocols
https://ouroboros-consensus.cardano.intersectmbo.org
Apache License 2.0

Only generate NodeJoinPlans that respect the corresponding Ouroboros paper's environment restrictions #695

Open nfrisby opened 4 years ago

nfrisby commented 4 years ago

This Issue is essentially to revisit the resolution of Issue input-output-hk/ouroboros-network#231 (let nodes join late).


Our ThreadNet test suite currently involves the following test groups.

I have recently realized that we should limit the join schedules differently for each test group, in accordance with the environment restrictions listed in the relevant protocol's corresponding Ouroboros paper. See the table of protocol-paper pairs at https://github.com/input-output-hk/ouroboros-consensus/issues/797. (Note that Ouroboros Classic does not occur in our repository.)

The test groups currently share most of their infrastructure. In particular, they all let each node first join the net after some delay (see Issue input-output-hk/ouroboros-network#231). However, my most recent reading (skimming, admittedly) of each paper suggests that the protocols do not all support that.

Generally, if a node is (indirectly) mentioned in the genesis block, then it should be online at the onset of slot 0, since it is an "initial node" and the papers either explicitly or implicitly assume that every node is online when it is supposed to lead. Moreover, only the Praos paper considers having new nodes join the net at some point in the future. (Ouroboros Classic does too, but that protocol is only a historical concern for this repository.)
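As a rough sketch of the rule above (not the repository's actual code), the restriction could be written as a predicate over join plans plus a function that clamps a generated plan into the allowed shape. All names here (`NodeId`, `SlotNo`, `Protocol`, `JoinPlan`, `respectsPaper`, `clampPlan`) are hypothetical, simplified stand-ins, and the two-way `Praos`/`NonPraos` split is an assumption made only for illustration.

```haskell
-- Hypothetical, simplified stand-ins; the repository's real types differ.
newtype NodeId = NodeId Int deriving (Eq, Show)
newtype SlotNo = SlotNo Int deriving (Eq, Ord, Show)

-- Assumption: the only distinction that matters here is whether the paper
-- considers new nodes joining after the onset of slot 0.
data Protocol = Praos | NonPraos deriving (Eq, Show)

-- A join plan records the slot in which each node first comes online.
newtype JoinPlan = JoinPlan [(NodeId, SlotNo)] deriving (Eq, Show)

-- Per the text above, only the Praos paper considers late-joining nodes.
allowsLateJoin :: Protocol -> Bool
allowsLateJoin Praos    = True
allowsLateJoin NonPraos = False

-- A node may join after slot 0 only if the protocol considers late joins
-- and the node is not (indirectly) mentioned in the genesis block.
mayJoinLate :: Protocol -> [NodeId] -> NodeId -> Bool
mayJoinLate proto genesisNodes nid =
  allowsLateJoin proto && nid `notElem` genesisNodes

-- Does the plan respect the environment restrictions sketched above?
respectsPaper :: Protocol -> [NodeId] -> JoinPlan -> Bool
respectsPaper proto genesisNodes (JoinPlan plan) =
  all ok plan
  where
    ok (nid, slot) = slot == SlotNo 0 || mayJoinLate proto genesisNodes nid

-- Force every node that may not join late to join at slot 0.
clampPlan :: Protocol -> [NodeId] -> JoinPlan -> JoinPlan
clampPlan proto genesisNodes (JoinPlan plan) =
  JoinPlan
    [ (nid, if mayJoinLate proto genesisNodes nid then slot else SlotNo 0)
    | (nid, slot) <- plan
    ]
```

A generator could apply something like `clampPlan` to an arbitrary plan, so that the resulting plans satisfy `respectsPaper` by construction rather than by discarding test cases afterwards.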


This Issue is a blocker for adding clock skew (Issue input-output-hk/ouroboros-consensus#753) and message latencies (Issue input-output-hk/ouroboros-consensus#802). The current tests pass even with nodes joining late because, in the context of the test suite's perfect synchrony (to be spoiled by clock skew and network latency), we're able to predict the net's behavior well enough to discard the cases that are disrupted by late joins. This Issue is to discard that extra complexity from the test suite and instead only challenge protocols in ways that their published analysis anticipates.

The late joins in particular can cause the net to unavoidably create a chain that does not meet the chain density invariant. That invariant is supposed to be ensured by the protocol itself (i.e. Chain Growth), but that's only true (and only up to "with high probability" for Praos) when the test environment respects the analysis's listed restrictions. The chain density invariant is actually irrelevant for the mock ledger (it allows unrestricted anachronistic views and does not rely on a notion of "stable transactions"), but it can cause test failures for the real ledgers (since they assume a block at least 2k slots old is necessarily part of the immutable chain).
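To make the density concern concrete, here is a minimal sketch that assumes the invariant has the form "every window of 2k consecutive slots contains at least k blocks". That formulation is an assumption chosen to match the 2k-slot remark above, not a statement taken from the test suite, and `densityOk` is a hypothetical helper.

```haskell
-- Hypothetical check, assuming the invariant reads "at least k blocks in
-- every window of 2k consecutive slots". Slots are plain Ints for brevity.
densityOk
  :: Int    -- ^ security parameter k
  -> Int    -- ^ last slot of the test run (slots are numbered from 0)
  -> [Int]  -- ^ slots of the blocks on the chain under inspection
  -> Bool
densityOk k lastSlot blockSlots =
  all windowOk [0 .. lastSlot - window + 1]
  where
    window = 2 * k
    -- Count the blocks whose slot falls in [start, start + window).
    windowOk start =
      length [s | s <- blockSlots, s >= start, s < start + window] >= k
```

For example, with `k = 2` and slots 0 through 7, a chain with blocks only in slots 0, 1, 6, and 7 (as might happen if the would-be leaders of slots 2 through 5 had not yet joined) fails this check, because the window starting at slot 1 contains a single block.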

dnadales commented 9 months ago

@nfrisby is this issue still relevant?