frrist closed this issue 4 years ago
Overall looks awesome, really excited to watch the performance accelerate with caching 🏎
Are there changes down the road that will negate the requirement that we must walk the entire chain? (Gut says yes 🤞)
Gut is correct!
For the `GetRecentAncestors` case: the proving period is likely going down by a factor of 20 when that parameter is set for mainnet, and we should reduce it speculatively right now anyway. Additionally, #2223 opens up possible designs that require traversing much less data (traverse the ticket chain separately from the blockchain in advance, or random access of tickets from the state machine).
For the calculate-reorg case: I have a patch in the works that got delayed by the release-blocking message pool bug. This patch stops traversing very quickly in the common case because it can find the common ancestor without walking the whole chain. It will take ~1 day to get this in. In the longer term, once we have a finality bound, this common-ancestor traversal will be bounded to a reasonable length (< 1000 blocks) even in the uncommon case where the two chains diverged as far back as genesis.
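The common-ancestor approach can be sketched roughly like this (a minimal toy in Go; the `Tipset` type and `commonAncestor` helper are illustrative stand-ins, not the actual patch): walk the taller chain down to the shorter chain's height, then step both back in lockstep until they meet, so the walk is bounded by the distance to the fork point rather than the full chain length.

```go
package main

import "fmt"

// Hypothetical simplified chain node: each tipset knows its height and parent.
type Tipset struct {
	ID     string
	Height int
	Parent *Tipset
}

// commonAncestor walks the taller chain down to the shorter chain's height,
// then steps both chains back together until they converge.
func commonAncestor(a, b *Tipset) *Tipset {
	for a != nil && b != nil {
		if a.Height > b.Height {
			a = a.Parent
		} else if b.Height > a.Height {
			b = b.Parent
		} else if a.ID == b.ID {
			return a
		} else {
			a, b = a.Parent, b.Parent
		}
	}
	return nil
}

func main() {
	g := &Tipset{ID: "genesis", Height: 0}
	m1 := &Tipset{ID: "m1", Height: 1, Parent: g}
	// Two short forks off m1.
	a := &Tipset{ID: "a2", Height: 2, Parent: m1}
	b3 := &Tipset{ID: "b3", Height: 3, Parent: &Tipset{ID: "b2", Height: 2, Parent: m1}}
	fmt.Println(commonAncestor(a, b3).ID) // m1
}
```

When the fork is recent, as in the common case described above, the loop terminates after only a few steps instead of traversing back to genesis.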
I did a bit of testing to compare `GetBlock` performance with and without a cache; below are my setup and initial findings. I added a `TwoQueueCache` around the `GetBlock` method and set the cache size to 1000. Next I added some metrics to track cache hits and misses. Lastly, I added timers around calls to `syncOne`.
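The shape of that instrumentation looks roughly like this (a self-contained sketch; the `store` and `Block` types are hypothetical stand-ins, and a plain map replaces the real `TwoQueueCache` to keep the example dependency-free):

```go
package main

import "fmt"

// Block is a hypothetical stand-in for the real decoded block type.
type Block struct{ CID string }

type store struct {
	blocks map[string]*Block // stand-in for the underlying (decode-on-read) store

	cache  map[string]*Block // stand-in for the TwoQueueCache
	hits   int
	misses int
}

// GetBlock checks the cache first and records hit/miss metrics,
// mirroring the instrumentation described above. On a miss it falls
// through to the store (where the real code would fetch and CBOR-decode)
// and populates the cache.
func (s *store) GetBlock(cid string) *Block {
	if b, ok := s.cache[cid]; ok {
		s.hits++
		return b
	}
	s.misses++
	b := s.blocks[cid]
	s.cache[cid] = b
	return b
}

func main() {
	s := &store{
		blocks: map[string]*Block{"b1": {CID: "b1"}},
		cache:  map[string]*Block{},
	}
	s.GetBlock("b1")
	s.GetBlock("b1")
	s.GetBlock("b1")
	fmt.Println(s.hits, s.misses) // 2 1
}
```

With a wrapper like this, the hit/miss counters can be exported as metrics and compared across runs, which is the shape of the experiment summarized in the tables below.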
I then used the localnet fast binary to spawn two different networks:
Data Point | GetBlock With Cache | GetBlock Without Cache |
---|---|---|
Cache Hits | 1075717 | 0 |
Cache Miss | 25707 | 1101424 |
SyncOne Average | 25.56ms | 47.00 ms |
SyncOne Total | 27.78s | 49.16s |
Data Point | GetBlock With Cache | GetBlock Without Cache |
---|---|---|
Cache Hits | 6652743 | 0 |
Cache Miss | 18622184 | 25274927 |
SyncOne Average | 99.93ms | 117.18ms |
SyncOne Total | 501.79s | 588.84s |
Initial observations indicate that chain processing performs really well when everything fits in the cache. Unfortunately this won't always be the case, and the second table (chain length 5000) demonstrates this. Ideally the size of the cache could be configured as a function of `AncestorRoundsNeeded` and `minHeight`, but the likelihood of that is unclear at the moment.
Next steps might be adding a metric that tracks the average size of blocks. This would help determine a reasonable size for the `GetBlock` cache if we decide to move forward with it. Another step could be looking at ways to combine the `GetBlock` cache with the `tipIndex` cache, as they contain similar pieces of data.
I'm moving this to the backlog, as I think we've identified algorithmic changes to limit chain traversal that will initially be higher impact (by avoiding `GetBlock` altogether).
I recently made changes so that most chain traversals will hit the tipset cache (in memory) rather than the store. I expect this to have removed most of this decoding overhead. If profiling proves this out, I propose closing this issue for now.
Recent profiles taken against filecoin nodes running testnet_0.3.1 indicate that this change is no longer needed. Closing.
Description
In #2675 we discovered that we spend ~42% of our cumulative execution time in the `GetBlock` method when performing the initial chain sync: https://github.com/filecoin-project/go-filecoin/blob/522cefefb2b6db8537d8742a4b86801b5ed4e939/chain/default_store.go#L283-L291

Let's look at what calls `GetBlock` the most.

Tipset Iterator
Here is the definition of `TipsetIterator`. Its `Next()` method calls `GetParentTipSet()`, which calls `GetBlock` in a loop: https://github.com/filecoin-project/go-filecoin/blob/522cefefb2b6db8537d8742a4b86801b5ed4e939/chain/traversal.go#L16-L34

Now let's see how and where the `TipsetIterator` is used.

syncOne
`syncOne` is called for each new tipset a node receives. Here we will show that it indirectly walks the entire blockchain twice, using `GetBlock` to fetch the blocks as it traverses.

`syncOne` first calls `GetRecentAncestors`: https://github.com/filecoin-project/go-filecoin/blob/522cefefb2b6db8537d8742a4b86801b5ed4e939/chain/default_syncer.go#L206-L215

`AncestorRoundsNeeded` is currently set to 20100; it is the sum of `ProvingPeriodBlocks` and `GracePeriodBlocks`: https://github.com/filecoin-project/go-filecoin/blob/522cefefb2b6db8537d8742a4b86801b5ed4e939/consensus/expected.go#L70-L72

Notice that on line 65 of the code snippet below,
`IterAncestors` is called and returns a `TipsetIterator`, which we defined above. The call to `CollectTipSetsOfHeightAtLeast` will use the iterator when walking the chain: https://github.com/filecoin-project/go-filecoin/blob/522cefefb2b6db8537d8742a4b86801b5ed4e939/chain/get_ancestors.go#L51-L90
The linked code shows that `GetRecentAncestors` will always walk back at most 20100 blocks, and when the chain is less than 20100 blocks long, `GetRecentAncestors` will walk the whole chain. We call this method every time we receive a new block from the network. This method uses a `TipsetIterator` to walk the chain, and we have shown that the iterator makes calls to `GetBlock`.
Next, let's look at what `syncOne` does if the tipset we received is heavier than our current head: https://github.com/filecoin-project/go-filecoin/blob/522cefefb2b6db8537d8742a4b86801b5ed4e939/chain/default_syncer.go#L258-L268

We call `CollectTipSetsOfHeightAtLeast`, passing in 0 as the value of minHeight: https://github.com/filecoin-project/go-filecoin/blob/522cefefb2b6db8537d8742a4b86801b5ed4e939/chain/get_ancestors.go#L92-L112

This means that `CollectTipSetsOfHeightAtLeast` will walk the entire chain, since every block has a height > 0. `CollectTipSetsOfHeightAtLeast` uses an `IterAncestors` iterator to traverse the chain. That means more calls to `GetBlock`.

Why is GetBlock so slow?
Calls to `GetBlock` are slow for two reasons: each call fetches the block from the store and then decodes it. I believe caching the decoded blocks in a `TwoQueueCache` will improve the performance of this operation. There are blocks that we access frequently (genesis, genesis+1, genesis+2, ...) and blocks that we have accessed recently (head, head-1, head-2, ...); a `TwoQueueCache` is designed for exactly this access pattern.
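For intuition, here is a toy sketch of the 2Q policy itself (not the hashicorp/golang-lru implementation the real code would use): entries seen once sit in a small "recent" FIFO, and a second access promotes them to a "frequent" queue, so a long one-shot scan cannot evict hot blocks like those near genesis.

```go
package main

import "fmt"

// twoQueue is a toy sketch of the 2Q idea. It is deliberately simple
// (linear scans, string keys, int values) to show the policy, not to
// be a production cache.
type twoQueue struct {
	capEach  int
	recent   []string // FIFO of keys seen once
	frequent []string // keys seen more than once, LRU order
	values   map[string]int
}

func newTwoQueue(capEach int) *twoQueue {
	return &twoQueue{capEach: capEach, values: map[string]int{}}
}

func index(q []string, k string) int {
	for i, x := range q {
		if x == k {
			return i
		}
	}
	return -1
}

func (c *twoQueue) Get(k string) (int, bool) {
	if i := index(c.frequent, k); i >= 0 {
		// Move to the most-recently-used end of the frequent queue.
		c.frequent = append(append(c.frequent[:i:i], c.frequent[i+1:]...), k)
		return c.values[k], true
	}
	if i := index(c.recent, k); i >= 0 {
		// Second access: promote from recent to frequent.
		c.recent = append(c.recent[:i:i], c.recent[i+1:]...)
		if len(c.frequent) == c.capEach {
			delete(c.values, c.frequent[0])
			c.frequent = c.frequent[1:]
		}
		c.frequent = append(c.frequent, k)
		return c.values[k], true
	}
	return 0, false
}

func (c *twoQueue) Add(k string, v int) {
	if len(c.recent) == c.capEach {
		delete(c.values, c.recent[0])
		c.recent = c.recent[1:]
	}
	c.recent = append(c.recent, k)
	c.values[k] = v
}

func main() {
	c := newTwoQueue(2)
	c.Add("genesis", 0)
	c.Get("genesis") // second access: promoted to frequent
	c.Add("head-1", 1)
	c.Add("head", 2) // churn in the recent FIFO only
	_, ok := c.Get("genesis")
	fmt.Println(ok) // true: the frequently-used block survived the scan
}
```

A plain LRU would let a long linear traversal flush every hot entry; separating the two queues is what makes the frequent-plus-recent access pattern described above cache-friendly.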
Acceptance criteria
Update the `GetBlock` method such that it caches the decoded blocks.

Risks + pitfalls
Related issues:
The above issues with chain performance have been previously outlined in #2151 and #966.