Closed: wtdcode closed this issue 1 month ago.
Thanks for reporting this. The data doesn't seem to match our testing; in any case, we will try to reproduce it.
An update: adding `--optimize.skip-state-root-validation` makes bsc-reth catch up to the head in 2 hours and 30 minutes by executing 20k blocks.
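(For context, a rough sketch of such an invocation. The binary path, datadir and chain selector below are only illustrative assumptions about the setup; the flag under discussion is `--optimize.skip-state-root-validation`.)

```bash
# Illustrative only: adjust binary path, chain and datadir to your deployment.
./bsc-reth node \
    --chain bsc \
    --datadir /data/bsc-reth \
    --optimize.skip-state-root-validation
```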
`consensus::parlia: Fetching new header block_hash=Number(43483191)` is an invalid log; the parlia engine is on standby when the node is doing stage sync. `--optimize.skip-state-root-validation` is for fast-node: it skips state root validation, as the name says.

There is a `reth.toml` file in the datadir: under `[stages.merkle]`, change `clean_threshold` from the default 5000 to 20000 or more.
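A sketch of the relevant section (20000 is just the example value from above; the rest of the file stays as generated):

```toml
# reth.toml in the datadir -- only the merkle stage section is shown.
[stages.merkle]
# Default is 5000. Raising it keeps stage-sync ranges below the threshold,
# so the state root is computed incrementally instead of rebuilt from scratch.
clean_threshold = 20000
```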
Reth will try to rebuild the merkle tree from scratch if the step size of stage sync is bigger than `clean_threshold`. In that case it takes roughly 10-20 hours depending on machine spec, and the processing time is not related to the synchronization height (20k in your case) but to the total chain height. If the step size is smaller than `clean_threshold`, reth does the state root calculation incrementally, which is far faster and scales with the synchronization height.
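In pseudocode, the choice looks roughly like the sketch below (function and variable names are illustrative, not the actual reth implementation):

```rust
// Illustrative sketch of how the merkle stage picks its strategy; not real reth code.
fn run_merkle_stage(from_block: u64, to_block: u64, clean_threshold: u64) {
    if to_block - from_block > clean_threshold {
        // Full rebuild: cost scales with total chain/state size (~10-20 hours).
        println!("rebuilding state root from scratch");
    } else {
        // Incremental calculation: cost scales with the synced range.
        println!("computing state root incrementally");
    }
}

fn main() {
    run_merkle_stage(43_463_191, 43_483_191, 5_000);   // 20k range > 5k threshold  -> full rebuild
    run_merkle_stage(43_463_191, 43_483_191, 250_000); // 20k range < 250k threshold -> incremental
}
```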
Nice explanation. So I should set a higher `clean_threshold` for the `account_hashing`, `storage_hashing` and `merkle` stages to get incremental building?
Seems related to https://github.com/paradigmxyz/reth/pull/7364. Reth reduced the threshold to 5000 to avoid OOM; I believe I can raise it much higher. Maybe we should adjust this for bsc-reth?
Just `merkle` is enough in my experience. And be careful of OOM.
Thanks for the hints! I have adjusted `clean_threshold` to 250k in my case and allocated 512GB of memory for bsc-reth. Unfortunately, I truncated the tables when switching to fast-node, so I think I need to re-download the snapshot (if that is faster). I will report whether this works on my side in the following weeks.
Describe the bug

I believe the implementation of `MerkleExecute` has some issues. I started both bsc-erigon (e2) and bsc-reth while bsc-erigon lagged more than 40k blocks and bsc-reth lagged 20k blocks. After 2-3 days, bsc-erigon had always synced to the latest block while bsc-reth was still stuck at `MerkleExecute`.

Inspecting the logs, bsc-reth gets through roughly 20k blocks of `MerkleExecute` per 12 hours on my machine (roughly 1 block / 2 seconds), which is not much faster than the current bsc mainnet block speed (roughly 1 block / 3 seconds) and even far slower than the `Execution` stage of bsc-reth. This makes the bsc-reth node lag almost permanently. Meanwhile, CPU (less than 10%), memory (less than 16G) and disk utilization (less than 3000 iops) are all rather low. Note the disks are exactly the same (but on different arrays to avoid contention), while I allocated 8x more CPU/memory resources (32C/256G) to bsc-reth compared to bsc-erigon (4C32G).

Enabling `debug` logs suggests that during `MerkleExecute`, bsc-reth tries to fetch headers from the consensus engine instead of the DB. I guess this is the root cause of the slowdown, but I need confirmation from the developers. I have no idea why calculating the merkle tree needs information from parlia. I can also help draft a PR if I can understand the reason here.

Steps to reproduce
Download the snapshots and spin up a mainnet bsc-reth node.
Node logs
Platform(s)
Linux (x86)
What version/commit are you on?
1.0.6-dev (6eddfde1)
What database version are you on?
-
Which chain / network are you on?
bsc
What type of node are you running?
Archive (default)
What prune config do you use, if any?
No response
If you've built Reth from source, provide the full command you used
No response
Code of Conduct