Open edsko opened 4 years ago
> there is no reason why we can't produce a block even if we are far ahead.
Given the typical relationships between various constants we've maintained so far, producing a block in this case would create a chain that directly violates the Chain Growth invariant. You and I have discussed this before, and you emphasized that the protocol is supposed to make such violations impossible. And since the protocol ensures the Chain Growth invariant, the node shouldn't have to.
However, when a Byron node is "catching up" (which the relevant comments in the source identify as the only scenario in which we expect this case to arise), it is beyond the scope of the analyses in the paper. Only Ouroboros Genesis is designed to support "catch up" -- everything else (Praos, (P)BFT, etc) is just supposed to "do its best" as I understand it. The Praos paper in particular assumes that any new stakeholder has, by the time it joins, somehow already selected a chain that some other pre-existing node has most recently selected. (Maybe stakepool registration is enough to ensure this?)
So I think -- until we have the Genesis rule -- the node should willingly choose not to forge if it can clearly see that doing so will create a chain that violates Chain Growth.
(I agree that the hard fork clock concerns might make this question moot, and you and I already have plans to discuss that further. I'm just recording this prompt here somewhat anachronistically.)
What do you think?
We looked over this in the Consensus call. I do not think we should merge this PR. The main reason:
The performance bug under recent scrutiny is that forecasting is way more expensive than it can-and-should-be. Once that's fixed, the current code's behavior will be exactly what we want.
I had an idea (along the lines of what Javier did in his PR) to "centralize and cache the ticking". Something like this: the ChainDB maintains a ledger state that has been ticked to the wall clock (usually that means it ticks by one slot every second). The node kernel could use that for the leadership check and for the new block.
The catch is that I think the ChainDB receives the newest remote block more than a second after that block was minted. And so the "ledger state ticked to the wall clock" would often be ahead of the ledger state needed to validate the newest remote block. If instead the ChainDB tended to receive the block during the block's own slot, then that "wallclock ledger state" would suffice for both the node kernel's leadership check and the ChainDB's validation of the next remote block.
Edit: so, since we don't really have an easy way to force blocks to propagate within 1s (:D), this idea is probably a dead-end.
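To make the catch above concrete, here is a minimal sketch of the "ledger state ticked to the wall clock" idea. It is a toy model under stated assumptions: every name in it (`TickedLedger`, `tickToWallclock`, `usableForLeaderCheck`, `usableForValidation`) is hypothetical and is not part of the actual ChainDB or node kernel code.

```haskell
-- Toy model only: every name below is hypothetical, not the ouroboros-consensus API.

newtype SlotNo = SlotNo Int
  deriving (Eq, Ord, Show)

-- | A ledger state that has been ticked, with no block applied, up to a slot.
newtype TickedLedger = TickedLedger { tickedTo :: SlotNo }
  deriving (Eq, Show)

-- | Advance the cached state to the current wall-clock slot.
-- Ticking is one-way: the cached state never moves back to an earlier slot.
tickToWallclock :: SlotNo -> TickedLedger -> TickedLedger
tickToWallclock now (TickedLedger s) = TickedLedger (max now s)

-- | The leadership check for the current slot needs a state ticked to that slot.
usableForLeaderCheck :: SlotNo -> TickedLedger -> Bool
usableForLeaderCheck now cached = tickedTo cached == now

-- | Validating a block needs a state ticked to the block's own slot, so a
-- cache that has already moved past that slot cannot be reused.
usableForValidation :: SlotNo -> TickedLedger -> Bool
usableForValidation blockSlot cached = tickedTo cached == blockSlot
```

In this model, a block minted in slot 10 that only arrives during slot 11 finds the cache already ticked to slot 11: it is still usable for the leadership check in slot 11, but not for validating the slot 10 block.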
Would this idea correspond to input-output-hk/ouroboros-network#4054 ?
I would say that idea is only partly related to Issue 4054. They both incrementally maintain a ticked ledger state as time passes. However, 4054 is just treating a symptom of a performance bug. My idea above (though I think it's a dead-end; see my edit above) would be trying to combine what are currently two separate ticking computations, and doing so would happen to remove the need for forecasting (which is the same as my comment on 4054).
Some comments on the above discussions, FWIW.
- The window in which the HFC can translate the current wallclock to a slot is much larger than the window in which it can forecast. So there is still an artificial limit that could be removed.
- This ledger PR made it so that the expensive epoch-boundary-crossing calculation only happens once by creating the thunk early enough that all later extensions will share it. So the forecast should be cheap every time except the first time that thunk is forced.
- Theoretically, the current long-term general plan of Quantitative Timing Agreements should eventually prevent any surprise performance regression that would have been defended (i.e. masked!) by a defense strategy a la Javier's caching.
- The current Consensus API does not allow for incremental ticks: you can't tick a second time until you apply a block to the result of the first tick. There are first-principle reasons to relax this invariant, but it's not entirely unmotivated.
- Generally, the only time we'd care about a node minting a block that would obviously violate the Chain Growth property is if there is a network-wide disaster that lasted several hours (~8hr or more) and it would perhaps be beneficial for blocks to mint anyway during that disaster. The anticipated disaster plan involves an off-chain-organized cooperative effort to retcon the chain during such a disaster, so the blocks minted during it probably don't matter anyway.
I would suggest that the last bullet point is a prerequisite for this Issue: is there even any scenario in which a block minted when the current slot's ledger view is not forecastable would be relevant or useful?
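On the point above that the Consensus API does not allow incremental ticks, the following sketch shows one way such a restriction can be expressed in types. It is a toy model and an assumption about the shape of the restriction, not the real API: `Ledger`, `Ticked`, `tick`, `applyBlock`, and `tickFurther` are all hypothetical names, and `tickFurther` is only an illustration of what a relaxed invariant might look like.

```haskell
-- Toy model only: all names are hypothetical, not the real Consensus API.

newtype SlotNo = SlotNo Int
  deriving (Eq, Ord, Show)

data Ledger = Ledger
  { ledgerSlot    :: SlotNo  -- slot the state has been advanced to
  , blocksApplied :: Int     -- number of blocks applied so far
  } deriving (Eq, Show)

-- | A ticked state is a distinct type, so the type checker rejects
-- @tick s2 (tick s1 l)@: 'tick' only accepts an un-ticked 'Ledger'.
newtype Ticked = Ticked Ledger
  deriving (Eq, Show)

tick :: SlotNo -> Ledger -> Ticked
tick slot l = Ticked (l { ledgerSlot = max slot (ledgerSlot l) })

-- | Applying a block is the only way back from 'Ticked' to 'Ledger'.
applyBlock :: Ticked -> Ledger
applyBlock (Ticked l) = l { blocksApplied = blocksApplied l + 1 }

-- | A hypothetical relaxation: tick an already ticked state further.
tickFurther :: SlotNo -> Ticked -> Ticked
tickFurther slot (Ticked l) = tick slot l
```

In this model, ticking to slot 5 and then further to slot 7 gives the same result as ticking directly to slot 7. Whether that equivalence holds for the real ledger rules is exactly the kind of property a relaxed invariant would need to guarantee.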
Block production was using `anachronisticLedgerView` to request a ledger view for the leader check, and failing with `TraceNoLedgerView` if that failed to produce a ledger view. However, as part of the work on input-output-hk/ouroboros-network#1933 we realized using `anachronisticLedgerView` here is wrong, and we should use `applyChainTick` instead: there is no reason why we can't produce a block even if we are far ahead. I have however left in the call to `anachronisticLedgerView` (soon to be `ledgerViewForecastAt`), because the consensus tests are currently failing without it. I strongly suspect that this is because of a limitation of the test infrastructure (not quite setting parameters strictly enough) rather than a true bug. We should try to remove it and update the tests.
Note however that once the hard fork timing stuff is merged, we will re-introduce a restriction: if our ledger is far behind, we can't even convert the current `UTCTime` to a `SlotNo`, and so we can't do the leader check. That might in fact reintroduce the same limitation, and fix this problem without the current artificial check.
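The difference between the two approaches described in this issue can be sketched as follows. This is a simplified model, not the actual block production code: `forecastView` is a hypothetical stand-in for a forecast-style lookup, `tickedView` for a tick-style one, and `ForgeDecision`, `decideWithForecast`, and `decideWithTick` are illustrative names. The real functions take more arguments and return richer types.

```haskell
-- Toy model only: all names are hypothetical stand-ins, not the real API.

newtype SlotNo = SlotNo Int
  deriving (Eq, Ord, Show)

-- | The part of the ledger state the leader check needs, tagged with its slot.
newtype LedgerView = LedgerView SlotNo
  deriving (Eq, Show)

data Ledger = Ledger
  { tipSlot       :: SlotNo
  , forecastRange :: Int  -- how many slots past the tip can be forecast
  }

-- | Forecast-style lookup: fails once the slot is beyond the forecast range.
forecastView :: Ledger -> SlotNo -> Maybe LedgerView
forecastView l slot@(SlotNo s)
  | slot >= tipSlot l && s <= tip + forecastRange l = Just (LedgerView slot)
  | otherwise = Nothing
  where
    SlotNo tip = tipSlot l

-- | Tick-style lookup: advancing the state itself always yields a view.
tickedView :: Ledger -> SlotNo -> LedgerView
tickedView l slot = LedgerView (max slot (tipSlot l))

data ForgeDecision = NoLedgerView | RunLeaderCheck LedgerView
  deriving (Eq, Show)

-- | Forecast-based behaviour: give up when the forecast fails.
decideWithForecast :: Ledger -> SlotNo -> ForgeDecision
decideWithForecast l slot = maybe NoLedgerView RunLeaderCheck (forecastView l slot)

-- | Tick-based behaviour: always tick, then run the leader check.
decideWithTick :: Ledger -> SlotNo -> ForgeDecision
decideWithTick l slot = RunLeaderCheck (tickedView l slot)
```

With a tip at slot 100 and a forecast range of 10 slots, both functions reach the leader check for slot 105, but for slot 500 only `decideWithTick` does; `decideWithForecast` returns `NoLedgerView`.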