erigontech / erigon

Ethereum implementation on the efficiency frontier https://erigon.gitbook.io
GNU Lesser General Public License v3.0

Ethereum Node Syncing got Stuck #12837

Open shivraj001 opened 3 days ago

shivraj001 commented 3 days ago

Erigon Version: v2.60.0

Network Type: Mainnet Node

Environment Details:

Operating System: Ubuntu 20.04
Machine Specs: CPU: Intel, 8 CPUs; RAM: 32 GB; Disk: 30 TB
Deployment Method: Binary

Description of the Issue: The Ethereum archive node running Erigon has stopped syncing at block number 18999999 on Mainnet. The issue has persisted for several weeks: syncing stalls without progressing beyond this block, even after restarting the node.

Steps to Reproduce:

1. Deploy Erigon v2.60.0 as an Ethereum archive node on Mainnet.
2. Start syncing the node.
3. Observe that syncing halts at block number 18999999.

Command used to run the node:

erigon --datadir /datadisk/node/erigon --chain=mainnet --http.api=eth,debug,net,trace,web3 --http.addr=0.0.0.0 --http.port=8545 --http.vhosts=* --metrics --metrics.addr=0.0.0.0 --log.console.verbosity=dbug --log.dir.path=/datadisk/node/logs --txpool.disable --internalcl

Expected Behavior: The node should continue syncing blocks beyond 18999999 without interruption.

Actual Behavior: The node remains stuck at block 18999999, with no further progress.

Logs: the following logs were captured while the node was stuck:

000000000000, Merge Netsplit: , Shanghai: 1681338455, Cancun: 1710338135, Prague: , Osaka: , Engine: ethash, NoPruneContracts: map[0x00000000219ab540356cBB839Cbe05303d7705Fa:true]}" genesis=0xd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
DBUG[11-22|04:55:22.446] [db] open label=downloader sizeLimit=16GB pageSize=4096
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-015000-015500-bodies.seg local=b73b367baaf013d8385a76a865bdbad2809b385f known=7255de209286291b92536dc39011bbc412b0a768
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-014500-015000-bodies.seg local=7e952b86719fd8c795abb11c63503a9427f32c4f known=ee52f47d12eed45f7be6f1946e44e72902305a20
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-018200-018300-bodies.seg local=d5043dfedec41bceb922468df50090b70b1c4c8a known=6cd5dbd8a51bc687362f178be3bb1a0700ca1151
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-018300-018400-bodies.seg local=cf8eb6732cc1755293a6a02dcd92dc1dd9cf6cb2 known=44f4bb367c2794872ac62dd8637ef408572713a0
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-018400-018500-bodies.seg local=fe3ac642940cb55367217ba6400b44fdadc40ed1 known=416a2274d0101e56c9e47b2e3bc5a6469de5df54
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-016000-016500-bodies.seg local=11fb22b8b2415668b26841174e85727951772937 known=910ed8afd69ce09e1543487825f3a90f755f79df
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-017000-017500-bodies.seg local=eb0b538a5d7c1eacfc0bc8cf1214c2e381ce3087 known=0e0ea3be565df0821c70a78bfa6cbe135a67e64b
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-017500-018000-bodies.seg local=656969e6a2cf1e704b973e20af4382ac966835ae known=b725c77a8afdf536b0b72966b2dee1c1e2d88e0c
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-015500-016000-bodies.seg local=1f559cbc16d5e01ba1e31d892bdfc7baf25cac53 known=12ded631880b051c700abb700d4b48155e68b08e
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-016500-017000-bodies.seg local=b72b80e722e8c4dd8c393dc5d1f4e69d8be3d312 known=d098d8f7c9d3fdcff92feabe01cdc2f27f539c76
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-018000-018100-bodies.seg local=99a4b4d8382c86c7c745c81f6647672b65d2db9b known=17693397d915b4aac755f11e54306e8966ce1c40
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-018100-018200-bodies.seg local=dd1f25eaca5afdeb5f4b37d8da6f89d2106270eb known=9ba1712859a78aaabd934686c1b3934358ac88ff
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-018500-018600-bodies.seg local=73ad11574bdb5345373a9edab8b949ae1a564d48 known=7c4c95a98046e56ce0045a550c551592693d281c
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-018600-018700-bodies.seg local=731a3579ca9e86608ea1ba668aaffb2a107a081c known=c10c7138967149cea4a9da9eb779e0a71f223ead
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-018900-019000-bodies.seg local=6883d2501de0ca385cccaa1a2bab726e47176902 known=c94c56736466d2bdbee5c88d7bf027d3681c9110
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-018700-018800-bodies.seg local=2501ab81654fd0b234c819fead66ab197f4c0438 known=2a008a1054c43fd6df76e6845d04d54d7cd2ba60
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-018800-018900-bodies.seg local=55266ef559f6f0f188d398b9a77ee1e9dd7157de known=93ce47fc00a0d834d98080157d4f25cb8ce58fa9
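Each downloader entry above reports a snapshot segment whose local hash differs from the known one. To pull the affected file names and both hashes out of a full log, a regular expression over that one message shape is enough. This is a hypothetical helper of our own, not an Erigon tool; the pattern is copied from the entries shown here and may not match other Erigon versions. It finds all matches per line, so entries pasted together on a single line are handled too.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// segmentMismatch holds one "local file hash does not match known" entry.
type segmentMismatch struct {
	File, Local, Known string
}

// mismatchRe matches the downloader message exactly as it appears in the log above.
var mismatchRe = regexp.MustCompile(
	`local file hash does not match known file=(\S+) local=([0-9a-f]+) known=([0-9a-f]+)`)

// findMismatches returns every mismatch entry in text, in order of appearance.
func findMismatches(text string) []segmentMismatch {
	var out []segmentMismatch
	for _, m := range mismatchRe.FindAllStringSubmatch(text, -1) {
		out = append(out, segmentMismatch{File: m[1], Local: m[2], Known: m[3]})
	}
	return out
}

func main() {
	// Reads a log on stdin, e.g. `go run . < erigon.log` (file name is hypothetical).
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 1024*1024), 16*1024*1024) // pasted logs can have very long lines
	for sc.Scan() {
		for _, m := range findMismatches(sc.Text()) {
			fmt.Printf("%s local=%s known=%s\n", m.File, m.Local, m.Known)
		}
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
}
```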

DBUG[11-22|04:57:52.593] Received block via gossip slot=10452190
DBUG[11-22|04:57:52.593] Block scheduled for later processing block=10452190
DBUG[11-22|04:57:52.696] import operations time=1.272µs
DBUG[11-22|04:57:53.623] Error in DNS random node sync tree=all.mainnet.ethdisco.net err="lookup MG6ATRV2DSOEX2QGPW7RHHRSPE.all.mainnet.ethdisco.net on 127.0.0.53:53: no such host"
DBUG[11-22|04:58:01.358] Received blob sidecar via gossip index=3 size=128KB
DBUG[11-22|04:58:01.358] Received block via gossip slot=10452288
DBUG[11-22|04:58:01.358] Block scheduled for later processing block=10452288
DBUG[11-22|04:58:01.359] Received blob sidecar via gossip index=2 size=128KB
DBUG[11-22|04:58:01.385] Received blob sidecar via gossip index=0 size=128KB
DBUG[11-22|04:58:01.647] Received blob sidecar via gossip index=1 size=128KB
DBUG[11-22|04:58:06.666] Error in DNS random node sync tree=all.mainnet.ethdisco.net err="lookup YIK7J6QF5HLFLUGMDBCJWNU5RU.all.mainnet.ethdisco.net on 127.0.0.53:53: no such host"
DBUG[11-22|04:58:12.440] Received block via gossip slot=10452289
DBUG[11-22|04:58:12.440] Block scheduled for later processing block=104522

DBUG[11-22|05:02:21.552] [downloader] Collecting... from=21241053 to=21241053 len=1
DBUG[11-22|05:02:21.552] [downloader] posAnchor is nil
INFO[11-22|05:02:21.974] Node is still syncing... downloading past blocks app=caplin stage=DownloadHistoricalBlocks slot=10447100 blockNumber=21235864 blk/sec=7.8 snapshots=0
DBUG[11-22|05:02:22.463] [downloader] Collecting... from=21241053 to=21241053 len=1
DBUG[11-22|05:02:22.463] [downloader] posAnchor is nil
DBUG[11-22|05:02:24.365] Received block via gossip slot=10452310
DBUG[11-22|05:02:24.365] Block scheduled for later processing block=10452310
DBUG[11-22|05:02:31.174] Error in DNS random node sync tree=all.mainnet.ethdisco.net err="lookup YIXI2ANEAWCPNDH2ZMDXDUANEI.all.mainnet.ethdisco.net on 127.0.0.53:53: no such host"
DBUG[11-22|05:02:37.727] Received blob sidecar via gossip index=5 size=128KB
DBUG[11-22|05:02:37.746] Received block via gossip slot=10452311
DBUG[11-22|05:02:37.746] Block scheduled for later processing block=10452311
DBUG[11-22|05:02:37.786] Received blob sidecar via gossip index=0 size=128KB
DBUG[11-22|05:02:37.809] Received blob sidecar via gossip index=1 size=128KB
DBUG[11-22|05:02:37.839] Received blob sidecar via gossip index=4 size=128KB
DBUG[11-22|05:02:37.866] [downloader] Collecting... from=13773036 to=137
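The INFO "Node is still syncing" entries carry the progress fields (stage, blockNumber, blk/sec). To track them across a long log, a small key=value parser helps. The sketch below is a hypothetical helper, not part of Erigon. It assumes each value is either double-quoted or contains no spaces, which holds for the progress and DNS entries shown here but not for the [p2p] counters whose keys contain spaces (those are not parsed correctly).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// kvRe matches key=value pairs; a value is either a double-quoted string or a
// run of non-space characters. Keys may contain '/' and '.' (e.g. "blk/sec").
var kvRe = regexp.MustCompile(`([A-Za-z0-9_/.]+)=("[^"]*"|\S*)`)

// parseFields extracts key=value pairs from one log entry, stripping quotes.
func parseFields(entry string) map[string]string {
	fields := make(map[string]string)
	for _, m := range kvRe.FindAllStringSubmatch(entry, -1) {
		fields[m[1]] = strings.Trim(m[2], `"`)
	}
	return fields
}

func main() {
	entry := "INFO[11-22|05:02:21.974] Node is still syncing... downloading past blocks app=caplin stage=DownloadHistoricalBlocks slot=10447100 blockNumber=21235864 blk/sec=7.8 snapshots=0"
	f := parseFields(entry)
	fmt.Println(f["stage"], f["blockNumber"], f["blk/sec"])
	// prints: DownloadHistoricalBlocks 21235864 7.8
}
```

Collecting blockNumber and blk/sec from successive progress entries makes it easy to see whether that stage is advancing between restarts.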

Any kind of support is appreciated.

AskAlexSharov commented 3 days ago

add --internalcl

shivraj001 commented 3 days ago

@AskAlexSharov The --internalcl flag is already included:

erigon --datadir /datadisk/node/erigon --chain=mainnet --http.api=eth,debug,net,trace,web3 --http.addr=0.0.0.0 --http.port=8545 --http.vhosts=* --metrics --metrics.addr=0.0.0.0 --log.console.verbosity=dbug --log.dir.path=/datadisk/node/logs --txpool.disable --internalcl

AskAlexSharov commented 3 days ago

grep -v DBUG

lystopad commented 3 days ago

@shivraj001, could you please clarify the Erigon version? Also, could you please try the latest version, 2.60.10?

shivraj001 commented 2 days ago

@AskAlexSharov @lystopad For some reason, the node gets killed every day with the log below.

DBUG[11-22|22:33:36.699] Received blob sidecar via gossip index=2 size=128KB
DBUG[11-22|22:33:46.692] [p2p] Dial scheduler protocol=68 peers=84/33 tried=26978 static=0 i/o timeout=4151 connect: connection refused=196 connect: no route to host=67 connect: connection reset by peer=10
DBUG[11-22|22:33:46.613] [p2p] Server protocol=67 peers=32 trusted=0 inbound=0 too many peers=62992 EOF=6446 closed by remote=10650 i/o timeout=7443 already connected=256
DBUG[11-22|22:33:47.130] [p2p] Discovery table protocol=68 version=v4 len=180 live=170 unsol=500 ips=279 db=0 reval=12662 RPC timeout=412 invalid ID in response record=9 invalid IP in response record: LAN address from WAN host=39 unknown node=18 unsolicited reply=197 expired=3
DBUG[11-22|22:33:48.029] [p2p] Discovery table protocol=67 version=v4 len=189 live=182 unsol=500 ips=298 db=0 reval=12665 RPC timeout=364 invalid IP in response record: loopback address from non-loopback host=2 invalid IP in response record: LAN address from WAN host=34 invalid ID in response record=7 unsolicited reply=1244 unknown node=165 expired=20
DBUG[11-22|22:33:49.470] [p2p] Server protocol=68 peers=85 trusted=0 inbound=53 i/o timeout=493 ecies: invalid message=120 already connected=9 unexpected EOF=3 invalid node identity=1 too many peers=13521 EOF=1284 closed by remote=2380
DBUG[11-22|22:33:49.741] [p2p] Dial scheduler protocol=67 peers=32/33 tried=127715 static=0 i/o timeout=16022 connect: connection refused=927 connect: no route to host=238 connect: connection reset by peer=1
INFO[11-22|22:33:52.385] Node is still syncing... downloading past blocks app=caplin stage=DownloadHistoricalBlocks slot=9006359 blockNumber=19802918 blk/sec=2.1 snapshots=0
DBUG[11-22|22:34:16.595] Received blob sidecar via gossip index=0 size=128KB
DBUG[11-22|22:34:16.809] Received blob sidecar via gossip index=2 size=128KB
DBUG[11-22|22:34:18.587] Received blob sidecar via gossip index=2 size=128KB
DBUG[11-22|22:34:18.817] [p2p] Discovery table protocol=any version=v5 len=185 live=181 unsol=0 ips=269 db=0 reval=12672 RPC timeout=808 0 nodes in response for distance zero=2
INFO[11-22|22:34:18.845] P2P app=caplin peers=68
DBUG[11-22|22:34:19.917] Received blob sidecar via gossip index=3 size=128KB
DBUG[11-22|22:34:21.343] Received blob sidecar via gossip index=3 size=128KB
INFO[11-22|22:34:43.718] [p2p] GoodPeers eth67=30 eth68=83 eth66=2
DBUG[11-22|22:34:46.640] [p2p] Server protocol=68 peers=83 trusted=0 inbound=52 too many peers=13526 EOF=1292 closed by remote=2380 i/o timeout=494 ecies: invalid message=120 already connected=9 unexpected EOF=3 invalid node identity=1
DBUG[11-22|22:34:46.932] [p2p] Discovery table protocol=68 version=v4 len=185 live=168 unsol=500 ips=284 db=0 reval=12666 RPC timeout=414 invalid ID in response record=9 invalid IP in response record: LAN address from WAN host=39 unsolicited reply=205 expired=3 unknown node=18
DBUG[11-22|22:34:46.671] [p2p] Discovery table protocol=67 version=v4 len=189 live=180 unsol=500 ips=297 db=0 reval=12668 invalid IP in response record: LAN address from WAN host=34 invalid ID in response record=7 RPC timeout=366 invalid IP in response record: loopback address from non-loopback host=2 unsolicited reply=1257 unknown node=165 expired=27
DBUG[11-22|22:34:46.689] [p2p] Server protocol=67 peers=32 trusted=0 inbound=0 too many peers=62996 EOF=6447 closed by remote=10651 i/o timeout=7444 already connected=256
DBUG[11-22|22:34:46.768] [p2p] Dial scheduler protocol=68 peers=83/33 tried=27000 static=0 i/o timeout=4155 connect: connection refused=196 connect: no route to host=67 connect: connection reset by peer=13
DBUG[11-22|22:34:48.083] [p2p] Dial scheduler protocol=67 peers=32/33 tried=127725 static=0 i/o timeout=16023 connect: connection refused=927 connect: no route to host=238 connect: connection reset by peer=2
INFO[11-22|22:34:53.116] [mem] memory stats Rss=14.3GB Size=0B Pss=14.3GB SharedClean=4.0KB SharedDirty=0B PrivateClean=4.3MB PrivateDirty=14.3GB Referenced=14.1GB Anonymous=14.3GB Swap=0B alloc=13.8GB sys=14.5GB
DBUG[11-22|22:35:10.862] Received block via gossip slot=10457567
DBUG[11-22|22:35:12.464] Block scheduled for later processing block=10457567
DBUG[11-22|22:35:19.966] [p2p] Discovery table protocol=any version=v5 len=184 live=180 unsol=0 ips=266 db=0 reval=12676 RPC timeout=811 0 nodes in response for distance zero=2
INFO[11-22|22:35:22.933] P2P app=caplin peers=60
Killed

shivraj001 commented 19 hours ago

@AskAlexSharov @lystopad I have updated to the latest version, 2.60.10, but I'm still facing the syncing issue.

AskAlexSharov commented 13 hours ago

grep -v DBUG

AskAlexSharov commented 13 hours ago

try GOGC=50
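GOGC is the Go runtime's garbage-collection target percentage (default 100): a new collection cycle starts once the heap has grown by that percentage over the live heap left after the previous cycle. Lowering it to 50 makes collections more frequent and keeps the peak heap smaller, at the cost of extra CPU time. Because Erigon is a Go program, the variable is set in the environment when launching it, e.g. `GOGC=50 erigon --datadir ...`. The sketch below uses a simplified model of the heap goal (our own approximation; the real pacer also accounts for goroutine stacks and globals) and shows the equivalent runtime/debug call.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// approxHeapGoal is a simplified model of Go's GC pacer: with GOGC=p, the next
// collection starts when the heap reaches roughly live*(1+p/100). The real
// pacer also counts stacks and globals, so treat this as an estimate only.
func approxHeapGoal(liveHeapBytes uint64, gogc int) uint64 {
	return liveHeapBytes + liveHeapBytes*uint64(gogc)/100
}

func main() {
	// Roughly the same effect as starting this process with GOGC=50 set.
	prev := debug.SetGCPercent(50)
	fmt.Println("previous GC percent:", prev)

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("HeapAlloc=%d KiB, next GC target=%d KiB\n", m.HeapAlloc>>10, m.NextGC>>10)

	const gib = 1 << 30
	for _, p := range []int{100, 50} {
		fmt.Printf("GOGC=%d: ~%d GiB heap goal for 10 GiB live heap\n", p, approxHeapGoal(10*gib, p)/gib)
	}
}
```

With 10 GiB of live heap, the model gives a goal of about 20 GiB at GOGC=100 and about 15 GiB at GOGC=50.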

AskAlexSharov commented 13 hours ago

Also, can you share the output of: go tool pprof -alloc_objects -png http://127.0.0.1:6060/debug/pprof/heap > mem.png