Open nholland94 opened 4 years ago
From my knowledge, this would only happens if there's a fork in the catchup scheduler which means the catchup took a really long time to be finished and it didn't fail. And new catchup jobs are blocked by previous ones and they stack on each other and therefore forms a tree with forks.
The breadcrumbs downloaded by ledger-catchup are always linear in turns of shape.
The Catchup mechanism works by collecting a series of missing intermediate blocks required to connect one or more target blocks to a node's local Transition Frontier. More specifically, Catchup will yield a result as a Rose Tree of blocks, which are then sent to the Breadcrumb Builder, where they are mapped into Breadcrumbs before being bulk inserted into the Transition Frontier. However, there is no guarantees currently about the length of this disconnected Rose Tree of blocks, nor the amount of time it takes for the results of a Catchup job to be inserted into the Transition Frontier. Because of this, it is possible for the result of a Catchup job to be a Rose Tree with a maximum length greater than K. If this happens, and there is more than one branch in this Rose Tree which is greater than K, then this Rose Tree represents a fork in the network. Furthermore, if we attempt to add this Rose Tree to our Transition Frontier (either DFS or BFS), at some point, the Transition Frontier we are accumulating these Breadcrumbs into will become incompatible with the remaining
> K
branches since the Transition Frontier will have a root which is not a member of these remaining branches.In order to fix this problem, we should introduce a new abstraction (yet to be named) which is essentially a Rose Tree which is aware of the possibility of these competing branches. It would hold the invariant that there would only ever be at most 1 branch which is
> K
in the Rose Tree. If another branch is created such that there would be 2> K
branches, this data structure should use the consensus rules to determine which branch should be kept, and should then fully prune the other branch. There will be some trickiness here with still tracking the pruned transitions in the Unprocess Transition Cache until the Rose Tree is consumed, but otherwise, this should not be too difficult to build.For what it is worth, this is an unlikely bug. We should patch it for adversarial resistance purposes, but this scenario should never occur on a network with correct security parameters. This is a good issue for a new protocol engineer to take up as it is fairly well contained yet still involves an understanding of the core participation state machine model and the components involved with local node state in general.