Status: open. mdominoni opened this issue 10 months ago.
This mem.png shows that everything is fine: memory usage is at the expected 3 GB.
OK, but the OOM kill is still happening. Is there anything else I can do to stop it from happening all the time?
dmesg shows:
[210146.815414] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=eth1.service,mems_allowed=0,oom_memcg=/system.slice/eth1.service,task_memcg=/system.slice/eth1.service,task=erigon,pid=7926,uid=0
[210146.815570] Memory cgroup out of memory: Killed process 7926 (erigon) total-vm:5312414528kB, anon-rss:20685544kB, file-rss:2650224kB, shmem-rss:0kB, UID:0 pgtables:4081092kB oom_score_adj:-100
[210148.956419] oom_reaper: reaped process 7926 (erigon), now anon-rss:0kB, file-rss:1958520kB, shmem-rss:0kB
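To see how close the process was to the cgroup limit, the size fields in the kill message can be added up. The sketch below (Python, chosen only for illustration) parses the kB fields from that line and compares a rough footprint (anon-rss + file-rss + shmem-rss + pgtables) against MemoryLimit=24G from the Erigon service unit, which systemd reads as 24 × 1024³ bytes. Treat the total as an indicator, not the exact cgroup charge, since the kernel counts RSS and memcg usage differently. These counters also cover the whole erigon process, whereas a Go heap profile only shows memory allocated by the Go heap.

```python
import re

# The "Killed process" line from the dmesg output above, copied verbatim.
OOM_LINE = (
    "Memory cgroup out of memory: Killed process 7926 (erigon) "
    "total-vm:5312414528kB, anon-rss:20685544kB, file-rss:2650224kB, "
    "shmem-rss:0kB, UID:0 pgtables:4081092kB oom_score_adj:-100"
)

# systemd parses the "G" suffix in MemoryLimit=24G with base 1024.
MEMORY_LIMIT_KB = 24 * 1024 * 1024


def parse_oom_fields(line: str) -> dict[str, int]:
    """Return every `name:<number>kB` field of a kernel OOM-kill message."""
    return {name: int(value) for name, value in re.findall(r"([\w-]+):(\d+)kB", line)}


def resident_estimate_kb(fields: dict[str, int]) -> int:
    """Rough footprint: anonymous + file-backed + shared pages + page tables."""
    keys = ("anon-rss", "file-rss", "shmem-rss", "pgtables")
    return sum(fields.get(key, 0) for key in keys)


fields = parse_oom_fields(OOM_LINE)
total_kb = resident_estimate_kb(fields)
print(f"estimated footprint: {total_kb / 1024**2:.1f} GiB, "
      f"cgroup limit: {MEMORY_LIMIT_KB / 1024**2:.1f} GiB, "
      f"page tables: {fields['pgtables'] / 1024**2:.1f} GiB")
```

For this kill the estimate comes to about 26.1 GiB, above the 24 GiB limit, which fits with RSS and the cgroup charge being counted differently. The page tables alone account for roughly 3.9 GiB.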
And what does alloc show in the logs before the kill?
Try to capture a profile when alloc > 5 GB.
[txpool] stat pending=9964 baseFee=0 queued=5125 alloc=3.1GB sys=7.5GB
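Taking a profile once alloc goes above 5 GB is easier if the log is watched automatically. A minimal sketch, assuming Erigon prints alloc= with 1024-based KB/MB/GB suffixes as in the txpool stat line above (an assumption, not checked against Erigon's source); parse_alloc, should_profile and over_threshold are hypothetical helper names for this example:

```python
import re

# Assumed 1024-based units for the size suffixes in Erigon's log lines.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
ALLOC_RE = re.compile(r"\balloc=([\d.]+)([KMGT]?B)\b")
ALLOC_THRESHOLD = 5 * 1024**3  # the 5 GB trigger suggested in the thread


def parse_alloc(line):
    """Return the alloc= value of a log line in bytes, or None if absent."""
    match = ALLOC_RE.search(line)
    if match is None:
        return None
    value, unit = match.groups()
    return int(float(value) * UNITS[unit])


def should_profile(line, threshold=ALLOC_THRESHOLD):
    """True when the line reports an alloc value above the threshold."""
    alloc = parse_alloc(line)
    return alloc is not None and alloc > threshold


def over_threshold(lines, threshold=ALLOC_THRESHOLD):
    """Yield only the log lines whose alloc value exceeds the threshold."""
    for line in lines:
        if should_profile(line, threshold):
            yield line
```

A small wrapper script could call over_threshold(sys.stdin) and be fed from journalctl -u eth1.service -f (the unit name comes from the dmesg output); as soon as a line is yielded, take the heap profile.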
Unfortunately, this picture shows a healthy state.
Just to clarify: is it normal that 64 GB of RAM is not enough to run Erigon?
System information
Erigon version: 2.53.4
OS & Version: Linux / Ubuntu on AWS with 64 GB RAM
Commit hash: tag - v2.53.4
Erigon Service:

[Unit]
Description=Erigon Execution Layer Client service (Mainnet)
Wants=network-online.target
After=network-online.target

[Service]
Environment="GOGC=50 GOMEMLIMIT=24GiB GOMAXPROCS=2"
MemoryLimit=24G
OOMScoreAdjust=-100
Type=simple
User=root
Restart=always
RestartSec=5
KillSignal=SIGINT
TimeoutStopSec=300
ExecStart=/opt/erigon/build/bin/erigon \
  --datadir /opt/data/erigon \
  --chain mainnet \
  --port "30303" \
  --metrics \
  --pprof \
  --authrpc.jwtsecret "/opt/secrets/jwt.hex" \
  --http \
  --ws \
  --http.vhosts="" \
  --http.corsdomain="" \
  --http.addr="0.0.0.0" \
  --http.port "8545" \
  --http.api "eth,erigon,personal,db,admin,web3,net,trace,rpc,debug,txpool" \
  --txpool.api.addr "0.0.0.0:9094" \
  --private.api.addr "0.0.0.0:9090" \
  --batchSize=1G

[Install]
WantedBy=multi-user.target
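One detail in this unit may be worth checking. systemd treats a double-quoted string in Environment= as a single assignment, so Environment="GOGC=50 GOMEMLIMIT=24GiB GOMAXPROCS=2" should define only GOGC, with the value "50 GOMEMLIMIT=24GiB GOMAXPROCS=2", and the Go runtime would then not see GOMEMLIMIT or GOMAXPROCS at all. The sketch below approximates systemd's splitting with Python's shlex; that is an assumption, since shlex follows POSIX shell rules, which are close to but not identical to systemd's.

```python
import shlex


def parse_environment(value):
    """Split an Environment= value into variables (shlex approximates systemd quoting)."""
    env = {}
    for word in shlex.split(value):
        name, sep, val = word.partition("=")
        if sep:  # skip words that are not NAME=VALUE assignments
            env[name] = val
    return env


# As written in the unit: the quotes wrap all three assignments together.
quoted = parse_environment('"GOGC=50 GOMEMLIMIT=24GiB GOMAXPROCS=2"')
# Without the outer quotes, each NAME=VALUE becomes its own variable.
unquoted = parse_environment("GOGC=50 GOMEMLIMIT=24GiB GOMAXPROCS=2")
print(quoted)
print(unquoted)
```

What systemd actually parsed can be confirmed with systemctl show --property=Environment eth1.service (assuming this unit is the eth1.service named in the dmesg output). Separately, even once the variables are applied, GOMEMLIMIT=24GiB equals MemoryLimit=24G. GOMEMLIMIT is a soft limit on memory managed by the Go runtime, so this setting leaves no headroom for memory outside the Go heap, such as the file-backed pages and page tables that the kill message also reports.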
Consensus Layer: Lighthouse v4.5.0-441fc16
Consensus Service:
[Unit]
Description=Lighthouse Consensus Layer Client BN (Mainnet)
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=root
Restart=always
RestartSec=5
KillSignal=SIGINT
TimeoutStopSec=300
ExecStart=/usr/local/bin/lighthouse bn \
  --network mainnet \
  --datadir "/opt/data/lighthouse" \
  --execution-endpoint http://localhost:8551 \
  --execution-jwt "/opt/secrets/jwt.hex" \
  --checkpoint-sync-url https://mainnet.checkpoint.sigp.io \
  --disable-deposit-contract-sync \
  --reconstruct-historic-states \
  --metrics

[Install]
WantedBy=multi-user.target
Chain/Network: mainnet
Expected behaviour
The node stays properly synced after the version upgrade.
Actual behaviour
After a couple of hours in sync, Erigon gets killed by the OOM killer.
Steps to reproduce the behaviour
Fully sync on v2.51.0, then upgrade to v2.53.4.
Backtrace
N/A
Executed:
go tool pprof -inuse_space -png http://127.0.0.1:6060/debug/pprof/heap > mem.png
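Because a heap snapshot taken while memory is low looks healthy, it may help to save raw profiles (periodically, or when alloc crosses the 5 GB mark mentioned in the thread) and render the one closest to the kill afterwards. A minimal sketch, assuming the pprof endpoint is reachable at the same address as in the command above; profile_path and save_heap_profile are hypothetical helper names:

```python
import time
import urllib.request
from pathlib import Path

# Same endpoint as the go tool pprof command above (erigon started with --pprof).
HEAP_URL = "http://127.0.0.1:6060/debug/pprof/heap"


def profile_path(directory, now=None):
    """Build a UTC-timestamped file name such as heap-20240101-120000.pb.gz."""
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    return Path(directory) / f"heap-{stamp}.pb.gz"


def save_heap_profile(directory=".", url=HEAP_URL, timeout=30):
    """Download the current heap profile (gzipped protobuf) and return its path."""
    path = profile_path(directory)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        path.write_bytes(response.read())
    return path
```

A saved file can be rendered later in the same way as the live endpoint, for example go tool pprof -inuse_space -png heap-<timestamp>.pb.gz > mem.png.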