Open battlmonstr opened 6 months ago
@AskAlexSharov is there something like --batchSize for LogIndex?
One more crash around block 12.5M:
[INFO] [02-27|15:22:31.915] [10/12 LogIndex] Progress number=12568622 alloc=10.7GB sys=13.9GB
[INFO] [02-27|15:23:01.912] [10/12 LogIndex] Progress number=12577585 alloc=10.3GB sys=13.9GB
Heap dump before the crash:
This is a dump 5 minutes before the crash for comparison:
They look very similar. Maybe the problem is on the mdbx side, not in Go heap?
--internalcl
- I see in your picture: SpawnHistoryDownload. It seems to be running in the background and eating ~1G. I guess it could eat less, tighten its mem-limit, or adapt to the total RAM on the machine.
You can prove it by running stage_log_index without the other erigon parts: integration stage_log_index
@AskAlexSharov Yeah, at the time of the crash I saw something in the logs about the history downloading. I ran the integration stage offline successfully. After erigon restarted, it went to 12/12 Finish 🎉 .
@Giulio2002 hi, please take a look and, if possible, put a stricter RAM limit on the history download.
Also seeing an OOM kill during LogIndex stage with 16GB memory and GOMEMLIMIT = 13GiB
From journalctl:
Mar 18 03:50:01 ethnode kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/supervisor.service,task=erigon,pid=952888,uid=1001
Mar 18 03:50:01 ethnode kernel: Out of memory: Killed process 952888 (erigon) total-vm:17215695868kB, anon-rss:11155356kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:3872040kB oom_score_adj:0
Mar 18 03:50:01 ethnode systemd[1]: supervisor.service: A process of this unit has been killed by the OOM killer.
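For anyone comparing the kernel's numbers against GOMEMLIMIT, here is a minimal Go sketch that pulls the `anon-rss` and `pgtables` fields out of the "Out of memory" line above and converts them to GiB. The helper names `oomField` and `kbToGiB` are made up for this illustration and are not part of erigon. Keep in mind that GOMEMLIMIT only bounds memory managed by the Go runtime, so kernel page tables and memory-mapped database pages are not covered by it; whether that explains this particular kill is an assumption, not something the log proves.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// oomField is a hypothetical helper: it extracts a "<name>:<digits>kB"
// value from a kernel OOM-kill log line and returns it in kB.
// ok is false when the field is missing or not a valid number.
func oomField(line, name string) (kb uint64, ok bool) {
	re := regexp.MustCompile(regexp.QuoteMeta(name) + `:(\d+)kB`)
	m := re.FindStringSubmatch(line)
	if m == nil {
		return 0, false
	}
	v, err := strconv.ParseUint(m[1], 10, 64)
	if err != nil {
		return 0, false
	}
	return v, true
}

// kbToGiB converts kB (as printed by the kernel, 1 kB = 1024 bytes) to GiB.
func kbToGiB(kb uint64) float64 {
	return float64(kb) / (1024 * 1024)
}

func main() {
	// Line copied from the journalctl output in this thread.
	line := "Out of memory: Killed process 952888 (erigon) " +
		"total-vm:17215695868kB, anon-rss:11155356kB, file-rss:0kB, " +
		"shmem-rss:0kB, UID:1001 pgtables:3872040kB oom_score_adj:0"

	rss, okRSS := oomField(line, "anon-rss")
	pt, okPT := oomField(line, "pgtables")
	if !okRSS || !okPT {
		fmt.Println("could not parse OOM line")
		return
	}

	fmt.Printf("anon-rss: %.2f GiB\n", kbToGiB(rss))
	fmt.Printf("pgtables: %.2f GiB\n", kbToGiB(pt))
	fmt.Printf("sum:      %.2f GiB\n", kbToGiB(rss+pt))
}
```

For the line above, anon-rss works out to roughly 10.6 GiB, which is below the 13GiB GOMEMLIMIT, while page tables add roughly 3.7 GiB on top. Together that is a bit over 14 GiB on a 16GB machine, which would be consistent with the earlier guess that the pressure comes from outside the Go heap.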
@Giulio2002 hi, please take a look and, if possible, put a stricter RAM limit on the history download.
What is the command option for this? I couldn't find it in the manual.
this PR may help: https://github.com/ledgerwatch/erigon/pull/9814
System information
Erigon version:
./erigon --version
v2.57.1
OS & Version: Windows/Linux/OSX
Linux
Commit hash:
9f1cd651f0b1b443b4bd96eaed84502c149fdca2
Erigon Command (with flags/config):
Consensus Layer:
caplin
Consensus Layer Command (with flags/config):
--internalcl
Chain/Network:
mainnet
Expected behaviour
No crash.
Actual behaviour
Crash.
Steps to reproduce the behaviour
Sync from scratch until stage 10/12 LogIndex.
Backtrace
Latest DEBUG log lines before the crash: