[Soft fork] Modify validation control flow for Bitswap block propagation

nholland94 commented 3 years ago

Included in this, need to support partial breadcrumbs in the frontier (or another intermediate data structure in front of the frontier that tracks partial blocks).

georgeee commented 2 years ago

Some discussion (copying here from a DM thread):

George:

So, I am considering the following design: I think of introducing a header frontier (HF). This will be just the same as full frontier, only with headers instead of breadcrumbs. The invariant will be for each transition will be in one of the following states:

Part of full frontier (FF)
Part of catchup tree
Part of header frontier
Not known to the node

Each transition in HF has parent: a) a root transition b) a transition in HF c) a transition in FF. A transition is added to HF when it's received on the gossip and condition above applies. If condition above doesn't hold, a catch-up is triggered. For each transition in HF we request associated bitswap block tree to be retrieved. Each transition in HF has a timeout assigned. After the timeout the transition is removed from HF and is scheduled for catch-up. ——————————— Justification for such design is the following:

I don't want to implement a new catch-up procedure right now and want to leave super-catchup intact
I want to rely on super-catchup even in case the Bitswap fails
I want to abstain from using super-catchup in "average case"

Follow-up on this: I think it's a good idea to have gossip messages of 3-4 headers, representing a chain of a block and a few of its immediate ancestors This will cover all the short forks that regularly happen on the network This way I think we will almost never require catchup, bar the catchup on node's start

Nathan:

I finally had some time today to think through this some. After working through the problem in my head some, I believe I'm largely aligned with the strategy of introducing a new HeaderFrontier, though there are some complexities of this approach I am still thinking through. I'm diagramming and writing my thoughts up on this, will share with you this week. I'm thinking we can simplify some of the transition handler subsystem while we add this without having to dig into the guts of the frontier or the catchup subsystem.

One thing I'm struggling to come to grips with is how this will continue to add complexity in the management of all these different frontier views we have (catchup tree, full frontier, and now header frontier). The catchup tree and header frontier both need to be pruned based on root transitions to the full frontier. We also need to decide on exact invariants on when a transition is stored in the catchup tree vs the header frontier, and those 2 need to interact (catchup tree needs to know if we acquired a block body for a dependency over bitswap, and header frontier needs to know when catchup jobs succeed and fail). My current thought is that, while a larger change, we should consider folding the catchup tree and the header frontier into the same data structure. This would simplify the model a lot, I think, but would require more extensive updates to the catchup mechanism. It's probably possible to expose this data structure directly to catchup the same way that catchup works off of the catchup tree today, but it might make sense to rethink having a separate state machine for catchup altogether. All-in-all, I counted 9 types of notifications that need to be sent between these subsystems and data structures under this approach, which is a lot. Each of these notifications needs thought put into how they will fit into the concurrency model. The ideal model in my head is to have a singular state machine running for all "in-flight" transitions. All the pending transitions would be stored in the same data structure, with states such as WaitForParentValidation, WaitForBitswap, WaitForCatchup, BodyAvailable and BodyValidated. We could thus simplify the catchup mechanism to not operate off of a state machine model, but instead to be a simple worker pool that just downloads batches of blocks from other peers based on jobs we dispatch to it. The obvious downside of this is that it creates more work upfront, but I think it puts us in a better world and makes it less likely we introduce new bugs.

georgeee commented 2 years ago

bit_catchup.drawio (6).png

MinaProtocol / mina

[Soft fork] Modify validation control flow for Bitswap block propagation #9457