erigontech / erigon

Ethereum implementation on the efficiency frontier https://erigon.gitbook.io
GNU Lesser General Public License v3.0

Polygon Erigon Out of Memory on Mainnet and Mumbai #7589

Closed yfl92 closed 1 year ago

yfl92 commented 1 year ago

System information

Erigon version: reproduced on both builds listed under "Commit hash" below

OS & Version: Ubuntu 20.04 LTS

Commit hash: d9c5a01103912a64b31c01afac05ab62dfa5c65d, or release https://github.com/maticnetwork/erigon/releases/tag/v0.0.6

Erigon Command (with flags/config):

Polygon Mainnet

erigon --datadir=/data \
  --chain=bor-mainnet \
  --bor.heimdall=some_heimdall_url \
  --snapshots=false \
  --graphql \
  --http.addr=0.0.0.0 \
  --http.api=eth,erigon,web3,net,debug,trace,txpool,admin,bor \
  --http.corsdomain= \
  --http.vhosts= \
  --rpc.batch.limit=0 \
  --rpc.returndata.limit=0 \
  --downloader.disable.ipv6=true \
  --log.console.verbosity=info

Polygon Mumbai

erigon --datadir=/data \
  --chain=mumbai \
  --bor.heimdall=some_heimdall_url \
  --snapshots=false \
  --graphql \
  --http.addr=0.0.0.0 \
  --http.api=eth,erigon,web3,net,debug,trace,txpool,admin,bor \
  --http.corsdomain= \
  --http.vhosts= \
  --rpc.batch.limit=0 \
  --rpc.returndata.limit=0 \
  --downloader.disable.ipv6=true \
  --log.console.verbosity=info

Consensus Layer: Heimdall

Consensus Layer Command (with flags/config): We use a public Heimdall

Chain/Network: Polygon/Mainnet & Mumbai

Expected behaviour

Erigon should sync blocks without running out of memory.

Actual behaviour

Erigon floods the log with the following warnings:

[WARN] [05-26|18:40:01.944] [downloader] Rejected header marked as bad hash=0x791fd299f316b2a982ba5974db459a29f82d12e2128548cd75655da82b0d75c1 height=36066954
[WARN] [05-26|18:40:01.945] [downloader] Rejected header marked as bad hash=0x791fd299f316b2a982ba5974db459a29f82d12e2128548cd75655da82b0d75c1 height=36066954
[WARN] [05-26|18:40:01.946] [downloader] Rejected header marked as bad hash=0x791fd299f316b2a982ba5974db459a29f82d12e2128548cd75655da82b0d75c1 height=36066954

As soon as the warning above starts flooding the log, memory usage climbs steeply until the process is OOM-killed, having consumed 128 GB of memory. While this is happening, Erigon stops syncing blocks.
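To quantify the flood, the warnings can be tallied per header. The following is a minimal sketch in Python, assuming the log line layout shown in the excerpt above. The function name `count_bad_header_warnings` is hypothetical and not part of Erigon.

```python
import re
from collections import Counter

# Matches the warning format from the log excerpt in this issue.
BAD_HEADER_RE = re.compile(
    r"Rejected header marked as bad\s+hash=(0x[0-9a-fA-F]+)\s+height=(\d+)"
)


def count_bad_header_warnings(lines):
    """Return a Counter mapping (hash, height) to the number of warnings seen.

    `lines` is any iterable of log lines, for example an open log file.
    Lines that do not match the warning format are ignored.
    """
    counts = Counter()
    for line in lines:
        match = BAD_HEADER_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

Passing an open log file to `count_bad_header_warnings` and reading `.most_common(10)` from the result would show whether a single header (as in the excerpt) or many different headers are being rejected.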

Steps to reproduce the behaviour

Download the Mumbai snapshot from https://snapshots.polygon.technology/, specifically https://snapshot-download.polygon.technology/erigon-mumbai-archive-2023-05-24.tar.zst

Then run the node using the arguments above. I recommend running several nodes at once: roughly 50% of our nodes ended up OOM-killed.


Backtrace

[backtrace]

AskAlexSharov commented 1 year ago

  1. Show the logs from when it gets killed.
  2. To understand where the RAM is going, add the --pprof flag, then run
     go tool pprof -inuse_space -png http://127.0.0.1:6060/debug/pprof/heap > mem.png
     and post the resulting file.
  3. This may be related to https://github.com/ledgerwatch/erigon/pull/7545

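Because the process is killed once memory runs out, it can help to record its resident memory alongside the logs, so the growth can be lined up with the warning flood. The following is a rough sketch in Python, assuming a Linux host (the report is from Ubuntu) where `/proc/<pid>/status` exposes a `VmRSS` line. The helper names `parse_vm_rss_kib` and `watch_rss` are hypothetical and not part of Erigon.

```python
import re
import time


def parse_vm_rss_kib(status_text):
    """Extract the resident set size in KiB from /proc/<pid>/status contents.

    Returns None when no VmRSS line is present.
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def watch_rss(pid, interval_s=10.0, samples=None):
    """Print timestamped RSS readings until the process exits.

    If `samples` is given, stop after that many readings instead.
    """
    taken = 0
    while samples is None or taken < samples:
        try:
            with open(f"/proc/{pid}/status") as status_file:
                rss_kib = parse_vm_rss_kib(status_file.read())
        except (FileNotFoundError, ProcessLookupError):
            print("process exited")
            return
        if rss_kib is None:
            print("no VmRSS line found")
            return
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        # /proc reports kB as KiB; 2**20 KiB make one GiB.
        print(f"{stamp} rss_gib={rss_kib / 2**20:.2f}")
        taken += 1
        time.sleep(interval_s)
```

Calling `watch_rss(<erigon pid>)` in a separate terminal and redirecting its output to a file would give a memory timeline to post next to the heap profile requested above.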
github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 40 days with no activity. Remove stale label or comment, or this will be closed in 7 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been stalled for 7 days with no activity.