erigontech / erigon

Ethereum implementation on the efficiency frontier https://erigon.gitbook.io
GNU Lesser General Public License v3.0

Erigon (Polygon Mainnet Archive) syncing slow/stuck at Execution stage #7883

Open Jyoti-singh18 opened 1 year ago

Jyoti-singh18 commented 1 year ago

Our Erigon node for the Polygon mainnet archive is syncing very slowly, or is effectively stuck, at the Execution stage [7/15 Execution]. It has been in this stage for a few weeks; before that it was stuck at the Bodies stage. The logs below show that it is still syncing, but the Execution stage has now been running for more than a couple of weeks.

[INFO] [07-13|12:49:40.944] Fetching state updates from Heimdall fromID=2703876 to=2023-06-23T23:12:46Z
[INFO] [07-13|12:49:40.947] StateSyncData number=44261792 lastStateID=2703875 total records=0 fetch time=2 process time=0
[INFO] [07-13|12:49:41.958] Fetching state updates from Heimdall fromID=2703876 to=2023-06-23T23:13:20Z
[INFO] [07-13|12:49:41.961] StateSyncData number=44261808 lastStateID=2703875 total records=0 fetch time=3 process time=0
[INFO] [07-13|12:49:42.016] [txpool] stat pending=10000 baseFee=0 queued=30000 alloc=8.5GB sys=12.2GB
[INFO] [07-13|12:49:42.570] Fetching state updates from Heimdall fromID=2703876 to=2023-06-23T23:13:54Z
[INFO] [07-13|12:49:42.573] StateSyncData number=44261824 lastStateID=2703875 total records=0 fetch time=2 process time=0
[INFO] [07-13|12:49:43.591] Fetching state updates from Heimdall fromID=2703876 to=2023-06-23T23:14:28Z
[INFO] [07-13|12:49:43.595] StateSyncData number=44261840 lastStateID=2703875 total records=0 fetch time=3 process time=0
[INFO] [07-13|12:49:44.522] [7/15 Execution] Executed blocks number=44261852 blk/s=18.1 tx/s=814.8 Mgas/s=274.9 gasState=0.22 batch=195.3MB alloc=8.6GB sys=12.2GB
[INFO] [07-13|12:49:45.261] Fetching state updates from Heimdall fromID=2703876 to=2023-06-23T23:15:04Z
[INFO] [07-13|12:49:45.264] StateSyncData number=44261856 lastStateID=2703875 total records=0 fetch time=3 process time=0
[INFO] [07-13|12:49:46.251] Fetching state updates from Heimdall fromID=2703876 to=2023-06-23T23:15:45Z

Below is the output of an "eth_syncing" call, which shows different block numbers for different stages.

{
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "currentBlock": "0x2a319f0",
        "highestBlock": "0x2aac6b9",
        "stages": [
            {
                "stage_name": "Snapshots",
                "block_number": "0x2a319f0"
            },
            {
                "stage_name": "Headers",
                "block_number": "0x2aac6b9"
            },
            {
                "stage_name": "BlockHashes",
                "block_number": "0x2aac6b9"
            },
            {
                "stage_name": "Bodies",
                "block_number": "0x2aac6b9"
            },
            {
                "stage_name": "Senders",
                "block_number": "0x2aac6b9"
            },
            {
                "stage_name": "Execution",
                "block_number": "0x2a341eb"
            },
            {
                "stage_name": "Translation",
                "block_number": "0x0"
            },
            {
                "stage_name": "HashState",
                "block_number": "0x2a319f0"
            },
            {
                "stage_name": "IntermediateHashes",
                "block_number": "0x2a319f0"
            },
            {
                "stage_name": "AccountHistoryIndex",
                "block_number": "0x2a319f0"
            },
            {
                "stage_name": "StorageHistoryIndex",
                "block_number": "0x2a319f0"
            },
            {
                "stage_name": "LogIndex",
                "block_number": "0x2a319f0"
            },
            {
                "stage_name": "CallTraces",
                "block_number": "0x2a319f0"
            },
            {
                "stage_name": "TxLookup",
                "block_number": "0x2a319f0"
            },
            {
                "stage_name": "Finish",
                "block_number": "0x2a319f0"
            }
        ]
    }
}
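Editor's note: the hex block numbers in an "eth_syncing" response are easier to compare once decoded. The following is a minimal Python sketch, not part of Erigon; the helper name stage_lag is hypothetical and the sample is a trimmed copy of the response above with only three stages kept.

```python
import json

def stage_lag(response: dict) -> dict:
    """Return, per stage, how many blocks it trails highestBlock."""
    result = response["result"]
    highest = int(result["highestBlock"], 16)
    return {
        stage["stage_name"]: highest - int(stage["block_number"], 16)
        for stage in result["stages"]
    }

# Trimmed copy of the eth_syncing response posted above (three stages only).
sample = json.loads("""
{"jsonrpc": "2.0", "id": 1, "result": {
  "currentBlock": "0x2a319f0", "highestBlock": "0x2aac6b9",
  "stages": [
    {"stage_name": "Headers", "block_number": "0x2aac6b9"},
    {"stage_name": "Execution", "block_number": "0x2a341eb"},
    {"stage_name": "Finish", "block_number": "0x2a319f0"}
  ]}}
""")

for name, lag in stage_lag(sample).items():
    print(f"{name:<10} {lag:>8} blocks behind")
```

For this response, Headers is at the highest known block while Execution trails it by 492750 blocks. Stages that come after Execution only advance once Execution finishes its current run, so seeing them at a lower block number is not by itself a sign of a fault.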

The Erigon snapshot used is from the official Polygon site: https://snapshots.polygon.technology/

The disk is a 16TB SSD and the machine has more than 120GB of RAM. Current disk usage is shown below:

image

Another thing to note: we recently upgraded Erigon to v0.0.8 (for the Polygon Indore hard fork), which corresponds to v2.48.0 in the Erigon repo.

Could anyone suggest what may be causing the slowness?

AskAlexSharov commented 1 year ago

I think this is the reason: https://github.com/ledgerwatch/erigon/issues/7894

Jyoti-singh18 commented 1 year ago

But I do not see any messages suggesting that Erigon isn't able to connect to the consensus layer, or any kind of timeouts for that matter. Should I post more logs?

AskAlexSharov commented 1 year ago

Yep

Jyoti-singh18 commented 1 year ago

Yep, I will be attaching them in parts. These are all the logs we have since our recent v0.0.8 upgrade:

erigon_log_1.txt

Jyoti-singh18 commented 1 year ago

erigon_log_2.txt

Jyoti-singh18 commented 1 year ago

Another set of logs, showing an error at the Execution stage:

Staged Sync err="[7/15 Execution] mdbx_txn_commit_ex: MDBX_MAP_FULL: Environment mapsize limit reached"

image

Jyoti-singh18 commented 1 year ago

Found this recent comment in an old issue. I will be trying that out and giving it a restart.

https://github.com/ledgerwatch/erigon/issues/2888#issuecomment-1624292413

Liamlu28 commented 1 year ago

FYI, I have the same issue on Polygon mainnet:

{
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "currentBlock": "0x2b13a6d",
        "highestBlock": "0x2b13a97",
        "stages": [
            {
                "stage_name": "Snapshots",
                "block_number": "0x2b13a6d"
            },
            {
                "stage_name": "Headers",
                "block_number": "0x2b13a97"
            },
            {
                "stage_name": "BlockHashes",
                "block_number": "0x2b13a97"
            },
            {
                "stage_name": "Bodies",
                "block_number": "0x2b13a97"
            },
            {
                "stage_name": "Senders",
                "block_number": "0x2b13a97"
            },
            {
                "stage_name": "Execution",
                "block_number": "0x2b13a97"
            },
            {
                "stage_name": "Translation",
                "block_number": "0x0"
            },
            {
                "stage_name": "HashState",
                "block_number": "0x2b13a97"
            },
            {
                "stage_name": "IntermediateHashes",
                "block_number": "0x2b13a6d"
            },
            {
                "stage_name": "AccountHistoryIndex",
                "block_number": "0x2b13a6d"
            },
            {
                "stage_name": "StorageHistoryIndex",
                "block_number": "0x2b13a6d"
            },
            {
                "stage_name": "LogIndex",
                "block_number": "0x2b13a6d"
            },
            {
                "stage_name": "CallTraces",
                "block_number": "0x2b13a6d"
            },
            {
                "stage_name": "TxLookup",
                "block_number": "0x2b13a6d"
            },
            {
                "stage_name": "Finish",
                "block_number": "0x2b13a6d"
            }
        ]
    }
}

0xKrishna commented 1 year ago

If it keeps fetching the same state-sync again and again, then maybe your Heimdall is not working. Please check that.
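Editor's note: one rough way to check this from the logs is to count how many distinct fromID values appear and whether any block received state-sync records. The Python sketch below is an illustration only; the function name summarize_state_sync is hypothetical and the regular expressions assume the log format shown in the first comment.

```python
import re

# Patterns assume the log line shapes quoted in this thread.
FETCH_RE = re.compile(r"Fetching state updates from Heimdall\s+fromID=(\d+)")
RECORDS_RE = re.compile(r"StateSyncData\s+number=(\d+).*?total records=(\d+)")

def summarize_state_sync(lines):
    """Collect fromID values and per-block record counts from log lines."""
    from_ids, records = [], []
    for line in lines:
        m = FETCH_RE.search(line)
        if m:
            from_ids.append(int(m.group(1)))
            continue
        m = RECORDS_RE.search(line)
        if m:
            records.append((int(m.group(1)), int(m.group(2))))
    return {
        "fetches": len(from_ids),
        "distinct_from_ids": sorted(set(from_ids)),
        "blocks_with_records": sum(1 for _, n in records if n > 0),
    }

# Four lines copied from the logs in the first comment.
sample = [
    "[INFO] [07-13|12:49:40.944] Fetching state updates from Heimdall fromID=2703876 to=2023-06-23T23:12:46Z",
    "[INFO] [07-13|12:49:40.947] StateSyncData number=44261792 lastStateID=2703875 total records=0 fetch time=2 process time=0",
    "[INFO] [07-13|12:49:41.958] Fetching state updates from Heimdall fromID=2703876 to=2023-06-23T23:13:20Z",
    "[INFO] [07-13|12:49:41.961] StateSyncData number=44261808 lastStateID=2703875 total records=0 fetch time=3 process time=0",
]

print(summarize_state_sync(sample))
```

A single repeated fromID with zero records is only a hint. The "to" timestamps in those lines do advance, so the same pattern could also appear when there are simply no new state-sync events in that time window.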

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 40 days with no activity. Remove stale label or comment, or this will be closed in 7 days.

1scrooge commented 12 months ago

Hello. I also have a problem with the archive snapshot. I downloaded a snapshot and launched the node, but it looks like the node started syncing from scratch. Launch command:

erigon --chain=bor-mainnet --datadir=/home/erigon/.local/share/erigon --port=30304 --http.vhosts=* --http.addr=0.0.0.0 --http.port=8545 --http --ws --http.api=eth,debug,net,trace,web3,erigon --bor.heimdall=https://heimdall-api.polygon.technology --maxpeers=500 --torrent.download.rate=300mb --bootnodes="enode://b8f1cc9c5d4403703fbf377116469667d2b1823c0daf16b7250aa576bacf399e42c3930ccfcb02c5df6879565a2b8931335565f0e8d3f8e72385ecf4a4bf160a@3.36.224.80:30303","enode://8729e0c825f3d9cad382555f3e46dcff21af323e89025a0e6312df541f4a9e73abfa562d64906f5e59c51fe6f0501b3e61b07979606c56329c020ed739910759@54.194.245.5:30303"

last log:

[INFO] [10-20|07:36:29.242] [7/15 Execution] Executed blocks         number=14676694 blk/s=19.9 tx/s=2613.3 Mgas/s=252.3 gasState=0.08 batch=42.3MB alloc=5.0GB sys=9.5GB
[INFO] [10-20|07:36:40.359] [p2p] GoodPeers                          eth66=87 eth67=47 eth68=68
[INFO] [10-20|07:36:40.981] [txpool] stat                            pending=2001 baseFee=0 queued=15158 alloc=6.3GB sys=9.5GB
[INFO] [10-20|07:36:49.240] [7/15 Execution] Executed blocks         number=14677122 blk/s=21.4 tx/s=2895.7 Mgas/s=262.6 gasState=0.09 batch=46.9MB alloc=7.1GB sys=9.5GB
[INFO] [10-20|07:37:09.212] [7/15 Execution] Executed blocks         number=14677519 blk/s=19.9 tx/s=2548.5 Mgas/s=266.4 gasState=0.10 batch=51.9MB alloc=5.6GB sys=9.5GB
[INFO] [10-20|07:37:21.253] Got new checkpoint from heimdall         start=48931750 end=48933029 rootHash=0xab4f13791d5507ab660207b46ccc9fc90259c27009dd61b54ea119ae58bb9333
[WARN] [10-20|07:37:21.253] Failed to whitelist checkpoint           err="missing blocks"
[WARN] [10-20|07:37:21.253] unable to handle whitelist checkpoint    err="missing blocks"
[INFO] [10-20|07:37:29.198] [7/15 Execution] Executed blocks         number=14677940 blk/s=21.1 tx/s=2443.3 Mgas/s=259.7 gasState=0.11 batch=56.3MB alloc=4.0GB sys=9.5GB
[INFO] [10-20|07:37:49.251] [7/15 Execution] Executed blocks         number=14678364 blk/s=21.1 tx/s=3210.1 Mgas/s=278.7 gasState=0.12 batch=61.2MB alloc=6.1GB sys=9.5GB
[INFO] [10-20|07:38:09.269] [7/15 Execution] Executed blocks         number=14678722 blk/s=17.9 tx/s=2446.1 Mgas/s=265.4 gasState=0.13 batch=67.1MB alloc=4.5GB sys=9.5GB
[INFO] [10-20|07:38:29.225] [7/15 Execution] Executed blocks         number=14679130 blk/s=20.4 tx/s=2936.5 Mgas/s=288.3 gasState=0.14 batch=74.2MB alloc=6.7GB sys=9.5GB
[INFO] [10-20|07:38:49.213] [7/15 Execution] Executed blocks         number=14679724 blk/s=29.7 tx/s=3953.1 Mgas/s=322.2 gasState=0.15 batch=84.1MB alloc=5.6GB sys=9.5GB
[INFO] [10-20|07:39:01.315] Got new checkpoint from heimdall         start=48931750 end=48933029 rootHash=0xab4f13791d5507ab660207b46ccc9fc90259c27009dd61b54ea119ae58bb9333
[WARN] [10-20|07:39:01.315] Failed to whitelist checkpoint           err="missing blocks"
[WARN] [10-20|07:39:01.315] unable to handle whitelist checkpoint    err="missing blocks"
[INFO] [10-20|07:39:09.233] [7/15 Execution] Executed blocks         number=14680328 blk/s=30.2 tx/s=3871.8 Mgas/s=330.5 gasState=0.16 batch=94.7MB alloc=4.5GB sys=9.5GB
[INFO] [10-20|07:39:29.253] [7/15 Execution] Executed blocks         number=14680758 blk/s=21.5 tx/s=4361.3 Mgas/s=336.1 gasState=0.17 batch=105.3MB alloc=7.1GB sys=9.5GB
[INFO] [10-20|07:39:40.358] [p2p] GoodPeers                          eth68=69 eth67=48 eth66=87
[INFO] [10-20|07:39:40.984] [txpool] stat                            pending=2003 baseFee=0 queued=15184 alloc=4.9GB sys=9.5GB
[INFO] [10-20|07:39:49.203] [7/15 Execution] Executed blocks         number=14681347 blk/s=29.5 tx/s=3704.6 Mgas/s=332.2 gasState=0.18 batch=115.7MB alloc=5.9GB sys=9.5GB
[INFO] [10-20|07:40:09.202] [7/15 Execution] Executed blocks         number=14681910 blk/s=28.2 tx/s=3921.1 Mgas/s=302.3 gasState=0.20 batch=125.8MB alloc=4.9GB sys=9.5GB
[INFO] [10-20|07:40:29.219] [7/15 Execution] Executed blocks         number=14682330 blk/s=21.0 tx/s=4224.7 Mgas/s=317.5 gasState=0.21 batch=135.2MB alloc=3.9GB sys=9.5GB
[INFO] [10-20|07:40:41.311] Got new checkpoint from heimdall         start=48931750 end=48933029 rootHash=0xab4f13791d5507ab660207b46ccc9fc90259c27009dd61b54ea119ae58bb9333
[WARN] [10-20|07:40:41.311] Failed to whitelist checkpoint           err="missing blocks"
[WARN] [10-20|07:40:41.311] unable to handle whitelist checkpoint    err="missing blocks"
[INFO] [10-20|07:40:49.233] [7/15 Execution] Executed blocks         number=14682653 blk/s=16.1 tx/s=4581.1 Mgas/s=311.4 gasState=0.22 batch=144.8MB alloc=6.6GB sys=9.5GB
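
Editor's note: the gap between the Execution head and the Heimdall checkpoint in the log above can be turned into a rough time estimate. This is a minimal Python sketch under stated assumptions: the helper names parse_exec and estimate are hypothetical, the year (2023) is assumed because the log timestamps omit it, and the checkpoint "end" value is used as a stand-in for the chain tip.

```python
import re
from datetime import datetime

# Pattern assumes the "[7/15 Execution] Executed blocks" line shape shown above.
EXEC_RE = re.compile(
    r"\[(\d{2}-\d{2}\|\d{2}:\d{2}:\d{2}\.\d{3})\] "
    r"\[7/15 Execution\] Executed blocks\s+number=(\d+)"
)

def parse_exec(line, year=2023):
    """Return (timestamp, block number) for an Execution progress line."""
    m = EXEC_RE.search(line)
    if not m:
        return None
    # The log omits the year, so an assumed one is prepended before parsing.
    ts = datetime.strptime(f"{year}-{m.group(1)}", "%Y-%m-%d|%H:%M:%S.%f")
    return ts, int(m.group(2))

def estimate(first_line, last_line, target_block, year=2023):
    """Average blocks/s between two lines, blocks remaining, and days left."""
    t0, b0 = parse_exec(first_line, year)
    t1, b1 = parse_exec(last_line, year)
    rate = (b1 - b0) / (t1 - t0).total_seconds()
    remaining = target_block - b1
    return rate, remaining, remaining / rate / 86400

# First and last Execution lines from the log above, trimmed after number=.
first = "[INFO] [10-20|07:36:29.242] [7/15 Execution] Executed blocks number=14676694"
last = "[INFO] [10-20|07:40:49.233] [7/15 Execution] Executed blocks number=14682653"
checkpoint_end = 48933029  # "end" of the checkpoint reported by Heimdall

rate, remaining, days = estimate(first, last, checkpoint_end)
print(f"{rate:.1f} blk/s, {remaining} blocks remaining, ~{days:.1f} days")
```

On this excerpt the average is about 23 blocks per second with roughly 34 million blocks still to execute, which at a constant rate would take a bit over two weeks. Real throughput varies with block contents and the chain head keeps moving, so this is a lower bound at best. The same gap is a plausible reason for the "missing blocks" warnings, since the checkpoint range is far ahead of the blocks the node has processed.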