This Issue is essentially to revisit the resolution of Issue input-output-hk/ouroboros-network#231 (let nodes join late).
Our ThreadNet test suite currently involves the following test groups.
BFT protocol with mock ledger
P(ermissive)BFT protocol with mock ledger
PBFT protocol with real ledger (Byron)
Praos protocol with mock ledger and static leader schedule
Praos protocol with mock ledger (and its natural leader schedule)
T(ransitional)Praos protocol with real ledger (Shelley -- but currently only with BFT overlay leader schedules, no stake pools yet as of master at bcf0f53a6c19b4e64cebd4e0f0ca35548f8985d5)
I have recently realized that we should limit the join schedules differently for each test group, in accord with the environment restrictions listed in the relevant protocol's corresponding Ouroboros paper. See the table of protocol-paper pairs here https://github.com/input-output-hk/ouroboros-consensus/issues/797. (Note that Ouroboros Classic does not occur in our repository.)
The test groups currently share most of their infrastructure. In particular, they all let each node first join the net after some delay (see Issue input-output-hk/ouroboros-network#231). However, my most recent reading (skimming, admittedly) of each paper suggests that the protocols do not all support that.
The BFT paper's analysis doesn't consider any nodes joining late.
The PBFT spec doesn't list its assumptions; for now I'll assume it inherits those of BFT, since PBFT describes itself as a variant of BFT.
The Praos paper assumes new nodes (i.e. stakeholders/honest parties) can join, but each must somehow, artificially, have already selected one of the chains that an existing node currently has selected.
Generally, if a node is (indirectly) mentioned in the genesis block, then it should be online at the onset of slot 0, since it is an "initial node" and the papers either explicitly or implicitly assume every node is online when it is supposed to lead. Moreover, only the Praos paper considers having new nodes join the net at some point in the future. (Ouroboros Classic does too, but that protocol is only a historical concern for this repository.)
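To make the per-group restriction concrete, here is a minimal sketch in Python (not the test suite's own code). The names `LateJoinPolicy`, `POLICY`, and `join_plan_violations` are hypothetical, and the policy table only encodes my reading above: BFT and PBFT allow no late joins, while Praos allows them only for nodes that are not mentioned in the genesis block. TPraos is left out because its restrictions are not stated above. The sketch also does not model the Praos requirement that a late joiner has already selected an existing node's chain.

```python
from enum import Enum


class LateJoinPolicy(Enum):
    NONE = "none"                  # every node must be online at slot 0
    NEW_NODES_ONLY = "new-nodes"   # only non-genesis nodes may join late


# Hypothetical table: one entry per protocol, following the reading above.
POLICY = {
    "BFT": LateJoinPolicy.NONE,
    "PBFT": LateJoinPolicy.NONE,   # assumed to inherit BFT's restrictions
    "Praos": LateJoinPolicy.NEW_NODES_ONLY,
}


def join_plan_violations(protocol, join_slots, genesis_nodes):
    """Return the sorted node ids whose join slot breaks the protocol's policy.

    join_slots maps a node id to the slot in which that node first joins.
    genesis_nodes is the set of node ids mentioned in the genesis block.
    """
    policy = POLICY[protocol]
    bad = []
    for node, slot in join_slots.items():
        if slot == 0:
            continue  # joining at the onset of slot 0 is always allowed
        if policy is LateJoinPolicy.NONE or node in genesis_nodes:
            bad.append(node)
    return sorted(bad)
```

For example, `join_plan_violations("Praos", {0: 0, 1: 5, 2: 7}, {0, 1})` flags node 1 (a genesis node joining at slot 5) but accepts node 2 (a new node joining at slot 7).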
This Issue is a blocker for adding clock skew (Issue input-output-hk/ouroboros-consensus#753) and message latencies (Issue input-output-hk/ouroboros-consensus#802). The current tests pass even with nodes joining late because, in the context of the test suite's perfect synchrony (to be spoiled by clock skew and network latency), we're able to predict the net's behavior well enough to discard the cases that are disrupted by late joins. This Issue is to discard that extra complexity from the test suite and instead only challenge protocols in ways that their published analyses anticipate.
The late joins in particular can force the net to create a chain that does not meet the chain density invariant. That invariant is supposed to be ensured by the protocol itself (i.e. Chain Growth), but that's only true (and only up to "with high probability" for Praos) when the test environment respects the analysis's listed restrictions. The chain density invariant is actually irrelevant for the mock ledger (it allows unrestricted anachronistic views and does not rely on a notion of "stable transactions"), but violating it can cause test failures for the real ledgers (since they assume a block at least 2k slots old is necessarily part of the immutable chain).
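The real ledgers' assumption can be sketched as a check on a single chain. This is an illustration in Python under two assumptions of mine, not the test suite's actual property: "immutable" is taken to mean "has at least k successor blocks", and the age of a block is measured against the slot of the chain's tip rather than against the wall clock. The name `density_violations` is hypothetical.

```python
def density_violations(block_slots, k):
    """Return the slots of blocks that break the assumed invariant.

    block_slots holds the strictly increasing slot numbers of the blocks
    on one chain. A block is reported if it is at least 2k slots older
    than the tip yet has fewer than k successors on the chain.
    """
    if not block_slots:
        return []
    tip = block_slots[-1]
    n = len(block_slots)
    return [
        slot
        for i, slot in enumerate(block_slots)
        if tip - slot >= 2 * k and (n - 1 - i) < k
    ]
```

With `k = 2`, the sparse chain `[0, 10]` is reported as `[0]`: the block in slot 0 is 10 slots old but has only one successor, which is the kind of chain a late join can produce. Note that this checks the chain only at its final tip, not at every intermediate tip.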