erigontech / erigon

Ethereum implementation on the efficiency frontier https://erigon.gitbook.io
GNU Lesser General Public License v3.0
3.1k stars 1.09k forks source link

Polygon: OOM Rejected Headers #11387

Open MrFreezeDZ opened 1 month ago

MrFreezeDZ commented 1 month ago

System information

Erigon version from the logs:

{
git_branch: "heads/v2.60.1"
git_commit: "9471c44e04f43a735dc9f40b74f552d706d05f78"
git_tag: "v2.60.1-dirty"
lvl: "info"
msg: "Build info"
t: "2024-07-29T02:30:51.285312853Z"
}

OS & Version: Kubernetes, Image from here: https://hub.docker.com/r/thorax/erigon The Erigon container has resource limits of 48 CPUs and 208GiB Memory.

Erigon Command (with flags/config):

erigon --chain=bor-mainnet
--datadir=/data/
--log.json=true
--http.addr=0.0.0.0
--http.vhosts=*
--http
--http.api=eth,admin,debug,net,trace,web3,erigon,txpool
--ws
--bor.heimdall=http://heimdallrest
--authrpc.vhosts=*
--authrpc.jwtsecret=/secret/jwt.hex
--authrpc.addr=0.0.0.0
--db.size.limit=7TB
--db.pagesize=16k
--metrics
--metrics.addr=0.0.0.0
--prune.h.before=48896162
--prune.r.before=48896162
--prune.t.before=48896162
--prune.c.before=48896162
--maxpeers=500
--torrent.download.rate=300mb
--bootnodes=enode://bdcd4786a616a853b8a041f53496d853c68d99d54ff305615cd91c03cd56895e0a7f6e9f35dbf89131044e2114a9a782b792b5661e3aff07faf125a98606a071@43.200.206.40:30303,enode://209aaf7ed549cf4a5700fd833da25413f80a1248bd3aa7fe2a87203e3f7b236dd729579e5c8df61c97bf508281bae4969d6de76a7393bcbd04a0af70270333b3@54.216.248.9:30303

Consensus Layer: Heimdall 1.0.7

Consensus Layer Command (with flags/config): /usr/bin/heimdalld start --home=/heimdall-home

Chain/Network: bor-mainnet

Expected behaviour

When Erigon realizes that a header is rejected an the same header will be present again and again, it should unwind itself some blocks, I guess.

Actual behaviour

Erigon starts spamming log messages and will use more memory then the container has as memory limit. Then Kubernetes will OOMKill the container and the Container is restarted. After Erigon's restart the same messages will occur until the next OOMKill. Erigon logs lines similar to this with only a few milliseconds between each line until it is OOMKilled:

{hash: 0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e, height: 59891025, lvl: warn, msg: [downloader] Rejected header marked as bad, t: 2024-07-29T07:57:58.985590688Z}
{hash: 0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e, height: 59891025, lvl: warn, msg: [downloader] Rejected header marked as bad, t: 2024-07-29T07:57:58.99246926Z}
{hash: 0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e, height: 59891025, lvl: warn, msg: [downloader] Rejected header marked as bad, t: 2024-07-29T07:57:58.994019486Z}
{hash: 0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e, height: 59891025, lvl: warn, msg: [downloader] Rejected header marked as bad, t: 2024-07-29T07:57:58.99991692Z}
{hash: 0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e, height: 59891025, lvl: warn, msg: [downloader] Rejected header marked as bad, t: 2024-07-29T07:57:59.001144471Z}
{hash: 0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e, height: 59891025, lvl: warn, msg: [downloader] Rejected header marked as bad, t: 2024-07-29T07:57:59.002410234Z}
{hash: 0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e, height: 59891025, lvl: warn, msg: [downloader] Rejected header marked as bad, t: 2024-07-29T07:57:59.003713997Z}
{hash: 0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e, height: 59891025, lvl: warn, msg: [downloader] Rejected header marked as bad, t: 2024-07-29T07:57:59.004939701Z}
{hash: 0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e, height: 59891025, lvl: warn, msg: [downloader] Rejected header marked as bad, t: 2024-07-29T07:57:59.007611193Z}
{hash: 0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e, height: 59891025, lvl: warn, msg: [downloader] Rejected header marked as bad, t: 2024-07-29T07:57:59.025658567Z}
{hash: 0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e, height: 59891025, lvl: warn, msg: [downloader] Rejected header marked as bad, t: 2024-07-29T07:57:59.033892633Z}
{hash: 0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e, height: 59891025, lvl: warn, msg: [downloader] Rejected header marked as bad, t: 2024-07-29T07:57:59.034955129Z}

Steps to reproduce the behaviour

I do not know how to actively reproduce this behavior. It occurred on the last three weekends.

Backtrace

These are just the logs copied from Googles LogExplorer. Here one can see the restart and the beginning of the "rejected header" logs.

ERROR 2024-07-29T02:30:51.285192185Z {"file prefix":"erigon", "json":false, "log dir":"/data/logs", "log level":"info", "lvl":"info", "msg":"logging to file system", "t":"2024-07-29T02:30:51.284924089Z"}
ERROR 2024-07-29T02:30:51.285360559Z {"lvl":"info", "msg":"Enabling metrics export to prometheus", "path":"http://0.0.0.0:6060/debug/metrics/prometheus", "t":"2024-07-29T02:30:51.285281497Z"}
ERROR 2024-07-29T02:30:51.285380361Z {"git_branch":"heads/v2.60.1", "git_commit":"9471c44e04f43a735dc9f40b74f552d706d05f78", "git_tag":"v2.60.1-dirty", "lvl":"info", "msg":"Build info", "t":"2024-07-29T02:30:51.285312853Z"}
ERROR 2024-07-29T02:30:51.285388465Z {"lvl":"info", "msg":"Starting Erigon on Bor Mainnet...", "t":"2024-07-29T02:30:51.285341217Z"}
ERROR 2024-07-29T02:30:51.294152120Z {"ETH":500, "lvl":"info", "msg":"Maximum peer count", "t":"2024-07-29T02:30:51.294044065Z", "total":500}
ERROR 2024-07-29T02:30:51.294231859Z {"APIs":"eth,admin,debug,net,trace,web3,erigon,txpool", "lvl":"info", "msg":"starting HTTP APIs", "port":8545, "t":"2024-07-29T02:30:51.294176693Z"}
DEFAULT 2024-07-29T02:30:51.294631252Z {"level":"WRN", "lvl":"info", "msg":"torrent verbosity", "t":"2024-07-29T02:30:51.294577647Z"}
ERROR 2024-07-29T02:30:53.398830626Z {"cap":5.0E7, "lvl":"info", "msg":"Set global gas cap", "t":"2024-07-29T02:30:53.398566491Z"}
ERROR 2024-07-29T02:30:53.400168956Z {"download.rate":"300mb", "ipv4-enabled":true, "ipv6-enabled":true, "lvl":"info", "msg":"[Downloader] Running with", "t":"2024-07-29T02:30:53.40006929Z", "upload.rate":"4mb"}
ERROR 2024-07-29T02:30:53.402730340Z {"label":"chaindata", "lvl":"info", "msg":"Opening Database", "path":"/data/chaindata", "t":"2024-07-29T02:30:53.40262083Z"}
ERROR 2024-07-29T02:30:53.413283853Z {"label":"chaindata", "lvl":"info", "msg":"[db] open", "pageSize":16384, "sizeLimit":"7TB", "t":"2024-07-29T02:30:53.413165095Z"}
ERROR 2024-07-29T02:30:53.445312770Z {"lvl":"warn", "msg":"Sanitizing invalid bor miner gas price", "provided":"1000000000", "t":"2024-07-29T02:30:53.445187632Z", "updated":"30000000000"}
ERROR 2024-07-29T02:30:53.445325789Z {"lvl":"warn", "msg":"Sanitizing invalid bor min fee cap", "provided":1, "t":"2024-07-29T02:30:53.445247681Z", "updated":3.0E10}
ERROR 2024-07-29T02:30:53.445390269Z {"config":"{ChainID: 137, Homestead: 0, DAO: <nil>, Tangerine Whistle: 0, Spurious Dragon: 0, Byzantium: 0, Constantinople: 0, Petersburg: 0, Istanbul: 3395000, Muir Glacier: 3395000, Berlin: 14750000, London: 23850000, Arrow Glacier: <nil>, Gray Glacier: <nil>, Terminal Total Difficulty: <nil>, Merge Netsplit: <nil>, Shanghai: <nil>, Cancun: <nil>, Prague: <nil>, Osaka: <nil>, Engine: bor, NoPruneContracts: map[]}", "genesis":"0xa9c28ce2141b56c474f1dc504bee9b01eb1bd7d1a507580d5519d4437a97de1b", "lvl":"info", "msg":"Initialised chain configuration", "t":"2024-07-29T02:30:53.445300789Z"}
ERROR 2024-07-29T02:31:47.820764791Z {"lvl":"info", "msg":"Initialising Ethereum protocol", "network":137, "t":"2024-07-29T02:31:47.820580651Z"}
ERROR 2024-07-29T02:31:47.820787601Z {"label":"bor", "lvl":"info", "msg":"Opening Database", "path":"/data/bor", "t":"2024-07-29T02:31:47.820691593Z"}
ERROR 2024-07-29T02:31:55.694311894Z mdbx_setup_dxb:16114 opening after an unclean shutdown, but boot-id(847d4bd58b8c010e-ac661d66623646f0) is MATCH: rollback NOT needed, steady-sync NEEDED
ERROR 2024-07-29T02:31:55.697926161Z {"lvl":"info", "msg":"Starting private RPC server", "on":"127.0.0.1:9090", "t":"2024-07-29T02:31:55.6978114Z"}
ERROR 2024-07-29T02:31:55.698055581Z {"lvl":"info", "msg":"new subscription to logs established", "t":"2024-07-29T02:31:55.69792753Z"}
ERROR 2024-07-29T02:31:55.698176089Z {"lvl":"info", "msg":"rpc filters: subscribing to Erigon events", "t":"2024-07-29T02:31:55.698102688Z"}
ERROR 2024-07-29T02:31:55.698284592Z {"lvl":"info", "msg":"New txs subscriber joined", "t":"2024-07-29T02:31:55.698193201Z"}
ERROR 2024-07-29T02:31:55.698316791Z {"lvl":"info", "msg":"new subscription to newHeaders established", "t":"2024-07-29T02:31:55.698262819Z"}
ERROR 2024-07-29T02:31:55.699142531Z {"grpc":false, "http.url":"[::]:8545", "lvl":"info", "msg":"JsonRpc endpoint opened", "t":"2024-07-29T02:31:55.699027753Z", "websocket.url":"[::]:8546", "ws":true, "ws.compression":true}
ERROR 2024-07-29T02:31:55.704554504Z {"lvl":"info", "msg":"[txpool] Started", "t":"2024-07-29T02:31:55.704457876Z"}
ERROR 2024-07-29T02:31:55.707848100Z mdbx_setup_dxb:16114 opening after an unclean shutdown, but boot-id(847d4bd58b8c010e-ac661d66623646f0) is MATCH: rollback NOT needed, steady-sync NEEDED
ERROR 2024-07-29T02:31:55.713259023Z {"err":"no UPnP or NAT-PMP router discovered", "lvl":"warn", "msg":"NAT ExternalIP resolution has failed, try to pass a different --nat option", "t":"2024-07-29T02:31:55.713128377Z"}
ERROR 2024-07-29T02:31:55.714388499Z {"lvl":"info", "msg":"Started P2P networking", "name":"erigon/v2.60.1-9471c44e/linux-amd64/go1.21.5", "self":"enode://6b7469e82e1699591029807cdce3cd372098340f122de38c912cfeb8ff2a35839cb22d99f6593998fadf4fa733e4c12b104367894506bcf55bf9bb8cf815c469@127.0.0.1:30304", "t":"2024-07-29T02:31:55.714288741Z", "version":67}
ERROR 2024-07-29T02:31:55.715816455Z mdbx_setup_dxb:16114 opening after an unclean shutdown, but boot-id(847d4bd58b8c010e-ac661d66623646f0) is MATCH: rollback NOT needed, steady-sync NEEDED
ERROR 2024-07-29T02:31:55.721069959Z {"err":"no UPnP or NAT-PMP router discovered", "lvl":"warn", "msg":"NAT ExternalIP resolution has failed, try to pass a different --nat option", "t":"2024-07-29T02:31:55.720950266Z"}
ERROR 2024-07-29T02:31:55.722180434Z {"lvl":"info", "msg":"Started P2P networking", "name":"erigon/v2.60.1-9471c44e/linux-amd64/go1.21.5", "self":"enode://6b7469e82e1699591029807cdce3cd372098340f122de38c912cfeb8ff2a35839cb22d99f6593998fadf4fa733e4c12b104367894506bcf55bf9bb8cf815c469@127.0.0.1:30303", "t":"2024-07-29T02:31:55.722074372Z", "version":68}
ERROR 2024-07-29T02:31:55.760600630Z {"lvl":"info", "msg":"[1/15 Snapshots] Requesting downloads", "t":"2024-07-29T02:31:55.760444836Z"}
ERROR 2024-07-29T02:31:59.071241522Z {"alloc":"4.3GB", "blocks":"59800k", "indices":"59800k", "lvl":"info", "msg":"[snapshots:download] Stat", "sys":"6.9GB", "t":"2024-07-29T02:31:59.071057632Z"}
ERROR 2024-07-29T02:31:59.071829784Z {"from":5.9891024E7, "lvl":"info", "msg":"[2/15 Headers] Waiting for headers...", "t":"2024-07-29T02:31:59.071719647Z"}
ERROR 2024-07-29T02:32:15.544290838Z {"age":"8s", "blk/sec":2516, "headers":41460, "highest":5.9932484E7, "in":16.472392297, "lvl":"info", "msg":"[2/15 Headers] Processed", "t":"2024-07-29T02:32:15.544109115Z"}
ERROR 2024-07-29T02:32:15.545458573Z {"from":9359, "lvl":"info", "msg":"[3/15 BorHeimdall] Processing spans...", "t":"2024-07-29T02:32:15.54535721Z", "to":9365}
ERROR 2024-07-29T02:32:15.578701931Z {"from":5.9891025E7, "lvl":"info", "msg":"[3/15 BorHeimdall] Processing sync events...", "t":"2024-07-29T02:32:15.578537126Z", "to":5.9932484E7}
ERROR 2024-07-29T02:32:19.369843815Z {"fetch time":"1.475624082s", "lastCheckpointId":0, "lastMilestoneId":0, "lastSpanID":9365, "lastStateSyncEventID":2940431, "lvl":"info", "msg":"[3/15 BorHeimdall] Sync events processed", "process time":"3.825342255s", "progress":5.9932484E7, "snap time":"3.189122541s", "sync event time":"1.70059327s", "t":"2024-07-29T02:32:19.369585203Z", "total records":820, "waypoint time":"0s"}
ERROR 2024-07-29T02:32:49.414018597Z {"current_prefix":"937c8bf5", "into":"HeaderNumber", "lvl":"info", "msg":"[4/15 BlockHashes] ETL [2/2] Loading", "t":"2024-07-29T02:32:49.413873177Z"}
ERROR 2024-07-29T02:33:11.836453400Z {"from":5.9891024E7, "lvl":"info", "msg":"[5/15 Bodies] Processing bodies...", "t":"2024-07-29T02:33:11.836264264Z", "to":5.9932484E7}
ERROR 2024-07-29T02:33:35.732227721Z {"end":5.9930789E7, "lvl":"info", "msg":"[bor.heimdall] Got new checkpoint", "rootHash":"0x6a8f8c2e5c99586cbfc7d182d37ba49e16b47f3ef209301ffe1061f3eb7ec8a3", "start":5.9930022E7, "t":"2024-07-29T02:33:35.732089634Z"}
ERROR 2024-07-29T02:34:47.801928583Z {"eth67":"3", "eth68":"15", "lvl":"info", "msg":"[p2p] GoodPeers", "t":"2024-07-29T02:34:47.801694009Z"}
ERROR 2024-07-29T02:34:48.139835525Z {"Anonymous":"8.3GB", "PrivateClean":"4.0GB", "PrivateDirty":"8.3GB", "Pss":"12.3GB", "Referenced":"12.2GB", "Rss":"12.3GB", "SharedClean":"4.0KB", "SharedDirty":"0B", "Size":"0B", "Swap":"0B", "alloc":"4.9GB", "lvl":"info", "msg":"[mem] memory stats", "sys":"8.2GB", "t":"2024-07-29T02:34:48.139641879Z"}
ERROR 2024-07-29T02:34:55.699178871Z {"alloc":"5.4GB", "baseFee":0, "lvl":"info", "msg":"[txpool] stat", "pending":0, "queued":7670, "sys":"8.2GB", "t":"2024-07-29T02:34:55.698978804Z"}
ERROR 2024-07-29T02:35:15.720206133Z {"end":5.9930789E7, "lvl":"info", "msg":"[bor.heimdall] Got new checkpoint", "rootHash":"0x6a8f8c2e5c99586cbfc7d182d37ba49e16b47f3ef209301ffe1061f3eb7ec8a3", "start":5.9930022E7, "t":"2024-07-29T02:35:15.720051854Z"}
ERROR 2024-07-29T02:35:57.280200680Z {"alloc":"5.5GB", "block_num":5.9891024E7, "lvl":"info", "msg":"[5/15 Bodies] Writing block bodies", "remaining":41460, "sys":"8.3GB", "t":"2024-07-29T02:35:57.27997419Z"}
ERROR 2024-07-29T02:36:12.883640945Z {"alloc":"4.6GB", "block_num":5.9901335E7, "lvl":"info", "msg":"[5/15 Bodies] Writing block bodies", "remaining":31149, "sys":"8.6GB", "t":"2024-07-29T02:36:12.883419322Z"}
ERROR 2024-07-29T02:36:41.772258159Z {"blk/sec":197, "blocks":41460, "highest":5.9932484E7, "in":209.935968206, "lvl":"info", "msg":"[5/15 Bodies] Processed", "t":"2024-07-29T02:36:41.772054037Z"}
ERROR 2024-07-29T02:36:41.772295196Z {"in":"3m29.936104521s", "lvl":"info", "msg":"[5/15 Bodies] DONE", "t":"2024-07-29T02:36:41.772142299Z"}
ERROR 2024-07-29T02:36:41.772379103Z {"from":5.9891024E7, "lvl":"info", "msg":"[6/15 Senders] Started", "t":"2024-07-29T02:36:41.772295035Z", "to":5.9932484E7}
ERROR 2024-07-29T02:36:42.074922778Z {"from":5.9891024E7, "lvl":"info", "msg":"[7/15 Execution] Blocks execution", "t":"2024-07-29T02:36:42.074742392Z", "to":5.9932484E7}
ERROR 2024-07-29T02:36:44.079172106Z {"block":5.9891025E7, "err":"invalid block: mismatched receipt headers for block 59891025 (0xf273f745cdbb0d48e208d3864d838130531770319fb0160996a0159dbd17da81 != 0x42a0be310674e0f5dbdf6afa6726b8eda1a19bb97bb335e38089d01c2168d246)", "hash":"0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e", "lvl":"warn", "msg":"[7/15 Execution] Execution failed", "t":"2024-07-29T02:36:44.078968458Z"}
ERROR 2024-07-29T02:36:44.084391968Z {"block":5.9891024E7, "lvl":"info", "msg":"[7/15 Execution] Completed on", "t":"2024-07-29T02:36:44.084265281Z"}
ERROR 2024-07-29T02:36:49.174644419Z {"amount":1024, "from":5.9889967E7, "hash":"0xfc1ea0f5adbc25d9962e454091007cd845cf4c76a349b1479391acbb2624fec0", "header sending":"7.418631ms", "log sending":"287ns", "lvl":"info", "msg":"RPC Daemon notified of new headers", "t":"2024-07-29T02:36:49.174440316Z", "to":5.9890991E7}
ERROR 2024-07-29T02:36:49.175663131Z {"BlockHashes":"52.466s", "Bodies":"3m29.936s", "BorHeimdall":"3.825s", "Execution":"2.009s", "Headers":"16.472s", "Senders":"302ms", "Snapshots":"3.361s", "Unwind Headers":"5.061s", "lvl":"info", "msg":"Timings (slower than 50ms)", "t":"2024-07-29T02:36:49.175550806Z"}
ERROR 2024-07-29T02:36:49.198690629Z {"from":5.9891024E7, "lvl":"info", "msg":"[2/15 Headers] Waiting for headers...", "t":"2024-07-29T02:36:49.198541947Z"}
ERROR 2024-07-29T02:36:49.261542730Z {"hash":"0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e", "height":5.9891025E7, "lvl":"warn", "msg":"[downloader] Rejected header marked as bad", "t":"2024-07-29T02:36:49.261358825Z"}
ERROR 2024-07-29T02:36:49.261966448Z {"hash":"0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e", "height":5.9891025E7, "lvl":"warn", "msg":"[downloader] Rejected header marked as bad", "t":"2024-07-29T02:36:49.261890351Z"}
ERROR 2024-07-29T02:36:49.328662150Z {"hash":"0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e", "height":5.9891025E7, "lvl":"warn", "msg":"[downloader] Rejected header marked as bad", "t":"2024-07-29T02:36:49.32846549Z"}
ERROR 2024-07-29T02:36:49.386449225Z {"hash":"0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e", "height":5.9891025E7, "lvl":"warn", "msg":"[downloader] Rejected header marked as bad", "t":"2024-07-29T02:36:49.3862906Z"}
ERROR 2024-07-29T02:36:49.790972966Z {"hash":"0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e", "height":5.9891025E7, "lvl":"warn", "msg":"[downloader] Rejected header marked as bad", "t":"2024-07-29T02:36:49.790800634Z"}
ERROR 2024-07-29T02:36:49.911756960Z {"hash":"0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e", "height":5.9891025E7, "lvl":"warn", "msg":"[downloader] Rejected header marked as bad", "t":"2024-07-29T02:36:49.911611888Z"}
ERROR 2024-07-29T02:36:50.206767534Z {"hash":"0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e", "height":5.9891025E7, "lvl":"warn", "msg":"[downloader] Rejected header marked as bad", "t":"2024-07-29T02:36:50.206576862Z"}
ERROR 2024-07-29T02:36:50.206821975Z {"hash":"0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e", "height":5.9891025E7, "lvl":"warn", "msg":"[downloader] Rejected header marked as bad", "t":"2024-07-29T02:36:50.206764102Z"}
ERROR 2024-07-29T02:36:50.219485885Z {"hash":"0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e", "height":5.9891025E7, "lvl":"warn", "msg":"[downloader] Rejected header marked as bad", "t":"2024-07-29T02:36:50.219392892Z"}
ERROR 2024-07-29T02:36:50.230286687Z {"hash":"0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e", "height":5.9891025E7, "lvl":"warn", "msg":"[downloader] Rejected header marked as bad", "t":"2024-07-29T02:36:50.2301451Z"}
ERROR 2024-07-29T02:36:50.234570253Z {"hash":"0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e", "height":5.9891025E7, "lvl":"warn", "msg":"[downloader] Rejected header marked as bad", "t":"2024-07-29T02:36:50.234439624Z"}
ERROR 2024-07-29T02:36:50.563872297Z {"hash":"0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e", "height":5.9891025E7, "lvl":"warn", "msg":"[downloader] Rejected header marked as bad", "t":"2024-07-29T02:36:50.56371213Z"}
ERROR 2024-07-29T02:36:51.207357544Z {"hash":"0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e", "height":5.9891025E7, "lvl":"warn", "msg":"[downloader] Rejected header marked as bad", "t":"2024-07-29T02:36:51.207194009Z"}
ERROR 2024-07-29T02:36:51.207440633Z {"hash":"0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e", "height":5.9891025E7, "lvl":"warn", "msg":"[downloader] Rejected header marked as bad", "t":"2024-07-29T02:36:51.207378046Z"}
ERROR 2024-07-29T02:36:51.207823654Z {"hash":"0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e", "height":5.9891025E7, "lvl":"warn", "msg":"[downloader] Rejected header marked as bad", "t":"2024-07-29T02:36:51.207755826Z"}
ERROR 2024-07-29T02:36:51.209211246Z {"hash":"0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e", "height":5.9891025E7, "lvl":"warn", "msg":"[downloader] Rejected header marked as bad", "t":"2024-07-29T02:36:51.209125422Z"}
ERROR 2024-07-29T02:36:51.209293379Z {"hash":"0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e", "height":5.9891025E7, "lvl":"warn", "msg":"[downloader] Rejected header marked as bad", "t":"2024-07-29T02:36:51.209239268Z"}
ERROR 2024-07-29T02:36:51.303280502Z {"hash":"0x6035a4866ff16fd8079640701ea82230dcce59ccd7259f69981c26eb85ebe62e", "height":5.9891025E7, "lvl":"warn", "msg":"[downloader] Rejected header marked as bad", "t":"2024-07-29T02:36:51.30313245Z"}
MrFreezeDZ commented 1 month ago

This was our issue before with Erigon in version 2.59.3: https://github.com/erigontech/erigon/issues/10873

AskAlexSharov commented 1 month ago

similar to https://github.com/erigontech/erigon/issues/10734

taratorio commented 1 month ago

@MrFreezeDZ hey, to help us troubleshoot this can you please:

This branch will try to capture the heap profile when near OOM when we face the stage headers "Rejected header marked as bad" situation and will save it in $TMPDIR/erigon-mem.prof. You can do echo $TMPDIR to find your OS tmpdir.

When this issue re-occurs please send us that file. Or can send us a png after doing go tool pprof -png $TMPDIR/erigon-mem.prof > erigon-mem-prof.png

taratorio commented 3 weeks ago

@MrFreezeDZ this change is now in v2.60.6 so you can update to that and won't need to use the branch I mentioned in my previous message.

Were you able to run Erigon with export SAVE_HEAP_PROFILE=true && ./build/bin/ergion and reproduce the OOM again?

MrFreezeDZ commented 2 weeks ago

Hi, I just came back from holidays :) Right now we were not able to reproduce the behavior with the enabled flag. Just to be clear, we are not able to actively reproduce the behavior. We just have the problem from time to time. We had it four weeks in a row on the weekends. Then we had two weeks without any problem and on the last weekend it happened again. I will use the environment variable SAVE_HEAP_PROFILE=true in our next deployment with version 2.60.6, but this will take some time.

AskAlexSharov commented 2 weeks ago

Recently i saw:

[WARN] [08-30|02:08:46.608] [7/9 Execution] Execution failed         block=61101954 txNum=4495312603 hash=0x24079d49bc63a8dfed171ec5d26f381ad5e7d08849e0eed51eabcf6caa5d8413 err="invalid block, txnIdx=59, gas used by execution: 10771434, in header: 18160412, headerNum=61101954, 24079d49bc63a8dfed171ec5d26f381ad5e7d08849e0eed51eabcf6caa5d8413"
[INFO] [08-30|02:08:46.608] [7/9 Execution] Done                     blk=61101953 blks=1 blk/s=7.0 txs=104 tx/s=729 gas/s=123.37M buf=114.8KB/2.0GB stepsInDB=0.00 step=2877.0 alloc=22.3GB sys=34.5GB
[EROR] [08-30|02:08:46.640] Staged Sync                              err="bad block unwinding"
[INFO] [08-30|02:08:47.141] [2/9 Headers] Waiting for headers...     from=61101953 hash=0xab87f6a3a5a1c8c400a22c9fef0f0c6c54e99764f496b32768af78c7661f00bb
[WARN] [08-30|02:08:47.382] [downloader] InsertHeader: Rejected header marked as bad hash=0x24079d49bc63a8dfed171ec5d26f381ad5e7d08849e0eed51eabcf6caa5d8413 height=61101954
[WARN] [08-30|02:08:47.383] [downloader] InsertHeader: Rejected header marked as bad hash=0x24079d49bc63a8dfed171ec5d26f381ad5e7d08849e0eed51eabcf6caa5d8413 height=61101954
[WARN] [08-30|02:08:47.383] [downloader] InsertHeader: Rejected header marked as bad hash=0x24079d49bc63a8dfed171ec5d26f381ad5e7d08849e0eed51eabcf6caa5d8413 height=61101954
[WARN] [08-30|02:08:47.407] [downloader] InsertHeader: Rejected header marked as bad hash=0x24079d49bc63a8dfed171ec5d26f381ad5e7d08849e0eed51eabcf6caa5d8413 height=61101954
[WARN] [08-30|02:08:47.423] [downloader] InsertHeader: Rejected header marked as bad hash=0x24079d49bc63a8dfed171ec5d26f381ad5e7d08849e0eed51eabcf6caa5d8413 height=61101954
[WARN] [08-30|02:08:47.425] [downloader] InsertHeader: Rejected header marked as bad hash=0x24079d49bc63a8dfed171ec5d26f381ad5e7d08849e0eed51eabcf6caa5d8413 height=61101954
[WARN] [08-30|02:08:47.451] [downloader] InsertHeader: Rejected header marked as bad hash=0x24079d49bc63a8dfed171ec5d26f381ad5e7d08849e0eed51eabcf6caa5d8413 height=61101954
[WARN] [08-30|02:08:47.465] [downloader] InsertHeader: Rejected header marked as bad hash=0x24079d49bc63a8dfed171ec5d26f381ad5e7d08849e0eed51eabcf6caa5d8413 height=61101954
[WARN] [08-30|02:08:47.517] [downloader] InsertHeader: Rejected header marked as bad hash=0x24079d49bc63a8dfed171ec5d26f381ad5e7d08849e0eed51eabcf6caa5d8413 height=61101954
[WARN] [08-30|02:08:47.551] [downloader] InsertHeader: Rejected header marked as bad hash=0x24079d49bc63a8dfed171ec5d26f381ad5e7d08849e0eed51eabcf6caa5d8413 height=61101954
[WARN] [08-30|02:08:47.571] [downloader] InsertHeader: Rejected header marked as bad hash=0x24079d49bc63a8dfed171ec5d26f381ad5e7d08849e0eed51eabcf6caa5d8413 height=61101954
[WARN] [08-30|02:08:47.571] [downloader] InsertHeader: Rejected header marked as bad hash=0x24079d49bc63a8dfed171ec5d26f381ad5e7d08849e0eed51eabcf6caa5d8413 height=61101954
[WARN] [08-30|02:08:47.572] [downloader] InsertHeader: Rejected header marked as bad hash=0x24079d49bc63a8dfed171ec5d26f381ad5e7d08849e0eed51eabcf6caa5d8413 height=61101954

@taratorio so, to reproduce - you just need something like this: https://github.com/erigontech/erigon/pull/11799