erigontech / erigon

Ethereum implementation on the efficiency frontier https://erigon.gitbook.io
GNU Lesser General Public License v3.0
3.09k stars 1.08k forks source link

e3: amoy: mem leak when `No block headers to write in this log period` #10734

Open AskAlexSharov opened 2 months ago

AskAlexSharov commented 2 months ago
[INFO] [06-12|05:04:54.090] [2/15 Headers] No block headers to write in this log period number=8140100 blk/second=0.000 alloc=110.0GB sys=124.1GB invalidHeaders=0 rejectedBadHeaders=1844
[INFO] [06-12|05:04:54.090] Req/resp stats                           req=4331 reqMin=8139909 reqMax=8173939 skel=29 skelMin=8140099 skelMax=8177155 resp=11908 respMin=8139910 respMax=8173942 dups=1398907 alloc=110.0GB sys=124.1GB

alloc=110.0GB

last logs:

[DBUG] [06-12|05:05:12.861] [2/15 Headers] Requested header          from=8140291 length=192
[WARN] [06-12|05:05:12.864] [downloader] Rejected header marked as bad hash=0x0134869cbf3e179d6d3503d91fe5f76c3b298a4d82ae1a591ca8f525240228d7 height=8140101
[WARN] [06-12|05:05:12.869] [downloader] Rejected header marked as bad hash=0x0134869cbf3e179d6d3503d91fe5f76c3b298a4d82ae1a591ca8f525240228d7 height=8140101
[WARN] [06-12|05:05:12.870] [downloader] Rejected header marked as bad hash=0x0134869cbf3e179d6d3503d91fe5f76c3b298a4d82ae1a591ca8f525240228d7 height=8140101
[WARN] [06-12|05:05:12.873] [downloader] Rejected header marked as bad hash=0x0134869cbf3e179d6d3503d91fe5f76c3b298a4d82ae1a591ca8f525240228d7 height=8140101
[WARN] [06-12|05:05:12.874] [downloader] Rejected header marked as bad hash=0x0134869cbf3e179d6d3503d91fe5f76c3b298a4d82ae1a591ca8f525240228d7 height=8140101
[WARN] [06-12|05:05:12.901] [downloader] Rejected header marked as bad hash=0x0134869cbf3e179d6d3503d91fe5f76c3b298a4d82ae1a591ca8f525240228d7 height=8140101
[WARN] [06-12|05:05:12.907] [downloader] Rejected header marked as bad hash=0x0134869cbf3e179d6d3503d91fe5f76c3b298a4d82ae1a591ca8f525240228d7 height=8140101
[DBUG] [06-12|05:05:12.915] [2/15 Headers] Requested header          from=8140292 length=192
[DBUG] [06-12|05:05:12.918] slow peer or too many requests, dropping its old requests name=erigon/v2.59.3-088fd...
[DBUG] [06-12|05:05:12.974] [2/15 Headers] Requested header          from=8155538 length=192
[DBUG] [06-12|05:05:13.566] [2/15 Headers] Requested skeleton        from=8140099 length=192
[DBUG] [06-12|05:05:14.278] [bor.heimdall] Got new milestone         start=8173890 end=8173912
[DBUG] [06-12|05:05:14.544] [bor] current head block behind incoming block=milestone head=8140100 end block=8173912
[DBUG] [06-12|05:05:18.061] [2/15 Headers] Requested skeleton        from=8140099 length=192
[DBUG] [06-12|05:05:22.150] [2/15 Headers] Requested header          from=8140292 length=192
[WARN] [06-12|05:05:25.469] [bor.heimdall] an error while fetching   path=/milestone/lastNoAck queryParams= attempt=1 err="Get \"https://heimdall-api-amoy.polygon.technology/milestone/lastNoAck\": context deadline exceeded"
[DBUG] [06-12|05:05:36.445] [bor.heimdall] request canceled          reason="context deadline exceeded" path=/milestone/lastNoAck queryParams= attempt=1
[WARN] [06-12|05:05:40.199] [bor.heimdall] Failed to fetch latest no-ack milestone err="context deadline exceeded"
[WARN] [06-12|05:05:40.551] [bor.heimdall] an error while fetching   path=/milestone/latest queryParams= attempt=1 err="Get \"https://heimdall-api-amoy.polygon.technology/milestone/latest\": context deadline exceeded"
[WARN] [06-12|05:05:43.345] [bor] unable to handle no-ack-milestone service err="failed to fetch latest milestone"
[DBUG] [06-12|05:05:50.722] [p2p] Discovery table                    protocol=67 version=v4 len=197 live=182 unsol=258 ips=284 db=0 reval=513 invalid IP in response record: LAN address from WAN host=4 RPC timeout=29 unsolicited reply=25 unknown node=68 expired=126
[DBUG] [06-12|05:05:49.965] [p2p] Server                             protocol=68 peers=7 trusted=0 inbound=0 write tcp 51.210.214.106:30303->84.247.174.229:57032: write: broken pipe=1 write tcp 51.210.214.106:30303->38.242.206.150:37512: write: broken pipe=1 write tcp 51.210.214.106:30303->5.182.33.115:47960: write: broken pipe=1 write tcp 51.210.214.106:30303->152.53.1.39:43570: write: broken pipe=1 write tcp 51.210.214.106:30303->198.24.170.214:49964: write: broken pipe=1 EOF=2580 write tcp 51.210.214.106:30303->74.201.193.158:31566: write: broken pipe=1 write tcp 51.210.214.106:30303->152.53.1.39:34850: write: broken pipe=1 write tcp 51.210.214.106:30303->72.251.237.4:50144: write: broken pipe=1 write tcp 51.210.214.106:30303->65.109.235.111:47292: write: broken pipe=1 write tcp 51.210.214.106:30303->74.201.193.229:50182: write: broken pipe=1 useless peer=1 write tcp 51.210.214.106:30303->74.201.193.244:43114: write: broken pipe=1 write tcp 51.210.214.106:30303->131.153.169.78:49396: write: broken pipe=1 write tcp 51.210.214.106:30303->143.198.84.113:48460: write: broken pipe=1 write tcp 51.210.214.106:30303->136.243.102.211:53328: write: broken pipe=1 write tcp 51.210.214.106:30303->194.233.75.136:41556: write: broken pipe=1 write tcp 51.210.214.106:30303->65.21.57.251:45136: write: broken pipe=1 write tcp 51.210.214.106:30303->51.138.200.10:3137: write: broken pipe=1 write tcp 51.210.214.106:30303->54.214.33.132:9407: write: broken pipe=1 write tcp 51.210.214.106:30303->122.202.14.37:4302: write: broken pipe=1 write tcp 51.210.214.106:30303->120.133.26.20:24570: write: broken pipe=1 write tcp 51.210.214.106:30303->144.91.87.17:49222: write: broken pipe=1 ecies: invalid message=54 write tcp 51.210.214.106:30303->194.163.163.11:55204: write: broken pipe=1 write tcp 51.210.214.106:30303->142.215.209.8:31410: write: broken pipe=1 write tcp 51.210.214.106:30303->31.220.99.133:56978: write: broken pipe=1 write tcp 51.210.214.106:30303->194.233.75.139:55926: write: broken pipe=1 write tcp 51.210.214.106:30303->194.163.160.188:59010: write: broken pipe=1 write tcp 51.210.214.106:30303->72.251.237.13:42320: write: broken pipe=1 write tcp 51.210.214.106:30303->72.251.237.18:40936: write: broken pipe=1 write tcp 51.210.214.106:30303->128.199.42.64:43276: write: broken pipe=1 write tcp 51.210.214.106:30303->91.210.101.51:29316: write: broken pipe=1 too many peers=610 closed by remote=21271 i/o timeout=2026
[DBUG] [06-12|05:05:50.722] [p2p] Dial scheduler                     protocol=67 peers=0/33 tried=31461 static=0 connect: connection refused=70 i/o timeout=338 connect: no route to host=3 connect: connection reset by peer=21
[DBUG] [06-12|05:05:51.602] [p2p] Discovery table                    protocol=68 version=v4 len=160 live=157 unsol=79 ips=233 db=0 reval=511 RPC timeout=12 expired=120 unsolicited reply=12 unknown node=23
[DBUG] [06-12|05:05:51.631] [p2p] Server                             protocol=67 peers=0 trusted=0 inbound=0 i/o timeout=1886 write tcp 51.210.214.106:30304->95.216.159.246:39678: write: broken pipe=1 write tcp 51.210.214.106:30304->194.163.160.191:43482: write: broken pipe=1 write tcp 51.210.214.106:30304->117.27.228.180:50038: write: broken pipe=1 closed by remote=23794 ecies: invalid message=46 write tcp 51.210.214.106:30304->138.201.249.97:52522: write: broken pipe=1 write tcp 51.210.214.106:30304->46.4.40.44:57286: write: broken pipe=1 write tcp 51.210.214.106:30304->62.171.163.167:41694: write: broken pipe=1 EOF=3509 write tcp 51.210.214.106:30304->65.21.244.80:32802: write: broken pipe=1 write tcp 51.210.214.106:30304->23.88.99.44:57748: write: broken pipe=1 write tcp 51.210.214.106:30304->38.242.234.2:53274: write: broken pipe=1 write tcp 51.210.214.106:30304->209.126.84.5:55992: write: broken pipe=1 write tcp 51.210.214.106:30304->65.109.224.35:42574: write: broken pipe=1 too many peers=385 already connected=31 write tcp 51.210.214.106:30304->173.212.236.16:60638: write: broken pipe=1 write tcp 51.210.214.106:30304->124.220.166.178:63268: write: broken pipe=1 write tcp 51.210.214.106:30304->207.246.127.166:38518: write: broken pipe=1
[DBUG] [06-12|05:05:51.548] [p2p] Dial scheduler                     protocol=68 peers=7/33 tried=29009 static=0 i/o timeout=844 connect: connection refused=83 connect: no route to host=7 connect: connection reset by peer=14
[WARN] [06-12|05:05:55.745] [bor.heimdall] an error while fetching   path=/milestone/lastNoAck queryParams= attempt=1 err="Get \"https://heimdall-api-amoy.polygon.technology/milestone/lastNoAck\": context deadline exceeded"
[WARN] [06-12|05:05:59.577] [bor.heimdall] an error while fetching   path=/milestone/latest queryParams= attempt=2 err="Get \"https://heimdall-api-amoy.polygon.technology/milestone/latest\": context deadline exceeded"
[DBUG] [06-12|05:06:42.340] [bor.heimdall] request canceled          reason="context deadline exceeded" path=/milestone/lastNoAck queryParams= attempt=1
[WARN] [06-12|05:06:47.950] [bor.heimdall] an error while fetching   path=/checkpoints/latest queryParams= attempt=1 err="Get \"https://heimdall-api-amoy.polygon.technology/checkpoints/latest\": context deadline exceeded"
[WARN] [06-12|05:06:50.191] [bor.heimdall] Failed to fetch latest no-ack milestone err="context deadline exceeded"
[DBUG] [06-12|05:06:50.765] [p2p] Server                             protocol=68 peers=7 trusted=0 inbound=0 i/o timeout=2031 write tcp 51.210.214.106:30303->128.199.42.64:43276: write: broken pipe=1 write tcp 51.210.214.106:30303->91.210.101.51:29316: write: broken pipe=1 too many peers=610 closed by remote=21271 write tcp 51.210.214.106:30303->152.53.1.39:34850: write: broken pipe=1 write tcp 51.210.214.106:30303->84.247.174.229:57032: write: broken pipe=1 write tcp 51.210.214.106:30303->38.242.206.150:37512: write: broken pipe=1 write tcp 51.210.214.106:30303->5.182.33.115:47960: write: broken pipe=1 write tcp 51.210.214.106:30303->152.53.1.39:43570: write: broken pipe=1 write tcp 51.210.214.106:30303->198.24.170.214:49964: write: broken pipe=1 EOF=2580 write tcp 51.210.214.106:30303->74.201.193.158:31566: write: broken pipe=1 write tcp 51.210.214.106:30303->72.251.237.4:50144: write: broken pipe=1 write tcp 51.210.214.106:30303->65.109.235.111:47292: write: broken pipe=1 write tcp 51.210.214.106:30303->131.153.169.78:49396: write: broken pipe=1 write tcp 51.210.214.106:30303->74.201.193.229:50182: write: broken pipe=1 useless peer=1 write tcp 51.210.214.106:30303->74.201.193.244:43114: write: broken pipe=1 write tcp 51.210.214.106:30303->65.21.57.251:45136: write: broken pipe=1 write tcp 51.210.214.106:30303->143.198.84.113:48460: write: broken pipe=1 write tcp 51.210.214.106:30303->136.243.102.211:53328: write: broken pipe=1 write tcp 51.210.214.106:30303->194.233.75.136:41556: write: broken pipe=1 write tcp 51.210.214.106:30303->51.138.200.10:3137: write: broken pipe=1 write tcp 51.210.214.106:30303->142.215.209.8:31410: write: broken pipe=1 write tcp 51.210.214.106:30303->54.214.33.132:9407: write: broken pipe=1 write tcp 51.210.214.106:30303->122.202.14.37:4302: write: broken pipe=1 write tcp 51.210.214.106:30303->120.133.26.20:24570: write: broken pipe=1 write tcp 51.210.214.106:30303->144.91.87.17:49222: write: broken pipe=1 ecies: invalid message=55 write tcp 51.210.214.106:30303->194.163.163.11:55204: write: broken pipe=1 write tcp 51.210.214.106:30303->194.163.160.188:59010: write: broken pipe=1 write tcp 51.210.214.106:30303->31.220.99.133:56978: write: broken pipe=1 write tcp 51.210.214.106:30303->194.233.75.139:55926: write: broken pipe=1 write tcp 51.210.214.106:30303->72.251.237.13:42320: write: broken pipe=1 write tcp 51.210.214.106:30303->72.251.237.18:40936: write: broken pipe=1
[DBUG] [06-12|05:06:52.317] [p2p] Server                             protocol=67 peers=0 trusted=0 inbound=0 i/o timeout=1889 write tcp 51.210.214.106:30304->95.216.159.246:39678: write: broken pipe=1 write tcp 51.210.214.106:30304->194.163.160.191:43482: write: broken pipe=1 write tcp 51.210.214.106:30304->117.27.228.180:50038: write: broken pipe=1 closed by remote=23794 ecies: invalid message=46 write tcp 51.210.214.106:30304->138.201.249.97:52522: write: broken pipe=1 write tcp 51.210.214.106:30304->46.4.40.44:57286: write: broken pipe=1 write tcp 51.210.214.106:30304->62.171.163.167:41694: write: broken pipe=1 EOF=3509 write tcp 51.210.214.106:30304->65.21.244.80:32802: write: broken pipe=1 write tcp 51.210.214.106:30304->23.88.99.44:57748: write: broken pipe=1 write tcp 51.210.214.106:30304->38.242.234.2:53274: write: broken pipe=1 write tcp 51.210.214.106:30304->209.126.84.5:55992: write: broken pipe=1 write tcp 51.210.214.106:30304->65.109.224.35:42574: write: broken pipe=1 too many peers=385 already connected=31 write tcp 51.210.214.106:30304->173.212.236.16:60638: write: broken pipe=1 write tcp 51.210.214.106:30304->124.220.166.178:63268: write: broken pipe=1 write tcp 51.210.214.106:30304->207.246.127.166:38518: write: broken pipe=1
[DBUG] [06-12|05:06:52.234] [p2p] Dial scheduler                     protocol=68 peers=7/33 tried=29009 static=0 i/o timeout=844 connect: connection refused=83 connect: no route to host=7 connect: connection reset by peer=14
[DBUG] [06-12|05:06:53.776] [p2p] Dial scheduler                     protocol=67 peers=0/33 tried=31461 static=0 connect: connection refused=70 i/o timeout=338 connect: no route to host=3 connect: connection reset by peer=21
[INFO] [06-12|05:06:53.786] [p2p] GoodPeers                          eth68=7
[DBUG] [06-12|05:06:54.324] [p2p] Discovery table                    protocol=67 version=v4 len=197 live=181 unsol=258 ips=282 db=0 reval=515 invalid IP in response record: LAN address from WAN host=4 RPC timeout=31 unsolicited reply=25 unknown node=68 expired=136
[WARN] [06-12|05:07:12.086] [bor.heimdall] an error while fetching   path=/milestone/latest queryParams= attempt=3 err="Get \"https://heimdall-api-amoy.polygon.technology/milestone/latest\": context deadline exceeded"
AskAlexSharov commented 2 months ago
Screenshot 2024-06-13 at 17 10 48
taratorio commented 4 weeks ago

@AskAlexSharov the screenshot only shows 4 GB - I imagine it was not taken at the right time?

I added https://github.com/erigontech/erigon/pull/11549 for that. If you have some time can you restart your Erigon instance with latest code on main and with export SAVE_HEAP_PROFILE=true env var please? I'll run an Amoy instance now too and let it run in the hopes I can reproduce.

AskAlexSharov commented 1 week ago

Saw it on Amoy snapshotter today. here is what in logs if grep -v this warning:

[WARN] [08-24|01:07:09.774] Sanitizing invalid bor miner gas price   provided=1000000000 updated=25000000000
[WARN] [08-24|01:07:09.774] Sanitizing invalid bor min fee cap       provided=1 updated=25000000000.000
[WARN] [08-24|01:09:08.880] [7/9 Execution] Execution failed         block=10954598 txNum=36126721 hash=0xbec0aa1033826a45a1e5faba3319445f2ac85849d131f4c0a169c0dfa9baabb6 err="invalid block, txnIdx=1, receiptHash mismatch: 2e8eabdeaf20429d2175676ba2e6400f19a5701a0646c9b62422f4ebff09412b != 51478baf47ac21f86ec338e2e187b1b989787c19e0caea2469b964a54a48fbcf, headerNum=10954598, bec0aa1033826a45a1e5faba3319445f2ac85849d131f4c0a169c0dfa9baabb6"
[EROR] [08-24|01:09:08.908] Staged Sync                              err="bad block unwinding"

So, the reason is: unwind happened and after unwind - node can't function correctly. So, it bug of unwind.

Recent logs:

[WARN] [08-24|02:27:02.283] [downloader] InsertHeader: Rejected header marked as bad hash=0xbec0aa1033826a45a1e5faba3319445f2ac85849d131f4c0a169c0dfa9baabb6 height=10954598
[WARN] [08-24|02:27:02.428] [downloader] InsertHeader: Rejected header marked as bad hash=0xbec0aa1033826a45a1e5faba3319445f2ac85849d131f4c0a169c0dfa9baabb6 height=10954598
[INFO] [08-24|02:28:52.479] [p2p] GoodPeers                          eth68=7 eth66=1 eth67=4
[INFO] [08-24|02:32:47.710] [p2p] GoodPeers                          eth68=3 eth67=4 eth66=1
[INFO] [08-24|02:35:03.373] [p2p] GoodPeers                          eth68=2 eth67=2 eth66=1
[WARN] [08-24|02:37:14.898] [p2p] No GoodPeers
[INFO] [08-24|02:38:03.573] [2/9 Headers] No block headers to write in this log period number=10954597 blk/second=0.000 alloc=115.1GB sys=123.5GB invalidHeaders=0 rejectedBadHeaders=2546
[INFO] [08-24|02:38:49.178] [mem] memory stats                       Rss=124.0GB Size=0B Pss=124.0GB SharedClean=2.4MB SharedDirty=0B PrivateClean=24.0KB PrivateDirty=124.0GB Referenced=124.0GB Anonymous=124.0GB Swap=0B alloc=115.1GB sys=123.5GB
[INFO] [08-24|02:39:39.324] Req/resp stats                           req=5311 reqMin=10954406 reqMax=11094481 skel=25 skelMin=10954596 skelMax=10991652 resp=17034 respMin=10954407 respMax=11094490 dups=2210572 alloc=115.1GB sys=123.5GB
[WARN] [08-24|02:40:16.299] [p2p] No GoodPeers
[WARN] [08-24|02:43:14.321] [p2p] No GoodPeers
taratorio commented 1 week ago

@AskAlexSharov are you running the snapshotter with SAVE_HEAP_PROFILE=true? Do you see any logs [Experiment] heap profile threshold check and [Experiment] saving heap profile as near OOM?

AskAlexSharov commented 1 week ago

SAVE_HEAP_PROFILE=true - no. but i can add.

taratorio commented 1 week ago

yes please! it will be good to get the heap profile at the right time