Open dnadales opened 2 years ago
Another (orthogonal) idea to tackle this issue is to introduce refined notion of revalidation, tentatively called "slot-only revalidation", that is significantly cheaper and could be used in most cases instead of "ordinary" revalidation. This idea has been floating around for some time (in particular in the context of UTxO HD, as revalidation looks at the disk there, while slot-only revalidation doesn't have to), and we revisited it today in our sync.
When we fetch a mempool snapshot in the forging loop, we call getSnapshotFor
https://github.com/IntersectMBO/ouroboros-consensus/blob/43e04df72c7fee3ffaf5a191c3c2680377e7e1be/ouroboros-consensus-diffusion/src/ouroboros-consensus-diffusion/Ouroboros/Consensus/NodeKernel.hs#L554-L556
which is implemented here:
https://github.com/IntersectMBO/ouroboros-consensus/blob/43e04df72c7fee3ffaf5a191c3c2680377e7e1be/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Mempool/Impl/Common.hs#L467-L497
In particular, we revalidate the txs[^1] unless both of these equalities hold:
https://github.com/IntersectMBO/ouroboros-consensus/blob/43e04df72c7fee3ffaf5a191c3c2680377e7e1be/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Mempool/Impl/Common.hs#L484-L485
isTip == castHash (getTipHash st')
will usually hold due to the background thread that syncs the mempool to the LedgerDB tip.
The scenario where it would fail to hold is if we adopt a block right around the time when we find out that we lead a slot. In this case, revalidating the txs is exactly the right tool for the job.
isSlotNo == slot
will usually not hold, as isSlot
is set to the successor of the tip slot of the ledger state the mempool was last synchronized against. The probability that two slots in a row are active is $f = 1/20$, and that doesn't account for the probability that the first block might not diffuse quickly enough to reach the block producer for the second slot.
The idea now is to improve performance in the case when isTip == castHash (getTipHash st')
but isSlotNo /= slot
by introducing a relaxed mode of revalidation ("slot-only revalidation"). In that case, the only way how a transaction can become invalid is if a slot-related check fails[^4] (most prominently, expiration of the tx validity interval[^3]), or if it depends on such a transaction. The main benefit is that we never need to look up anything in the big UTxO map, so it seems plausible to expect it to be significantly faster than revalidation.
The upside of this approach is that it would make mempool snapshotting quite fast in the common case[^2]. The downside is that it would require new non-trivial logic in the ledger, and ideally some argument/formalization that this in fact sufficient.
[^1]: Note that we store the result in a TxSeq
, a strict fingertree, so the fact that we later only take a prefix still means that we always revalidate all txs. Fixing that is the idea in the issue description, an orthogonal improvement.
[^2]: One aspect to consider here is average-case == worst-case
, but it seems hard for an adversary to do better than random here due to the privacy of the leader schedule.
[^3]: Rewards also change over time (e.g. at epoch boundaries), but this seems fine as the amount of rewards can only increase by letting time pass, so txs can't get invalid. Things like this would need careful consideration.
[^4]: One assumption here is that ticking doesn't remove entries from the UTxO map. The only exception we are currently aware of is the removal of AVVM entries at the transition to Allegra.
Right now, it seems like fetching a mempool snapshot for a ledger state will eagerly validate all transactions. When forging a block, it would be nice to only have it validate transactions until the block is full.
One idea would be to make sure that the returned list of transactions is lazily filtered by validity.
Also, the
TraceForgingMempoolSnapshot
event is currently forcing the length of the transactions, which has to be adapted for this to work.