erigontech / erigon

Ethereum implementation on the efficiency frontier https://erigon.gitbook.io
GNU Lesser General Public License v3.0

MDBX_MAP_FULL: Environment mapsize limit reached #8441

Closed: crypt0m1nd3r closed this issue 3 months ago

crypt0m1nd3r commented 12 months ago

System information

Erigon version: ./erigon --version

$ ./erigon --version
erigon version 2.52.0

OS & Version: Windows/Linux/OSX

Debian 11.7 on ARM64

Commit hash:

Erigon Command (with flags/config):

 --datadir=/home/erigon/.local/share/erigon \
 --chain=mainnet \
 --port=31303 \
 --p2p.allowed-ports=31303,31304 \
 --torrent.upload.rate=1mb \
 --downloader.disable.ipv6 \
 --metrics \
 --metrics.addr=0.0.0.0 \
 --metrics.port=6060 \
 --prune=htc \
 --prune.r.before=11052984 \
 --authrpc.jwtsecret=------- \
 --http \
 --http.api=engine,eth,erigon,web3,net,debug,trace,txpool,db \
 --ws \
 --private.api.addr= \
 --sentry.log-peer-info

Consensus Layer: Lighthouse

Chain/Network: main

Expected behaviour

No MDBX_MAP_FULL error and no SIGSEGV crash.

Actual behaviour

An MDBX_MAP_FULL error in the txpool flush, followed a few seconds later by a SIGSEGV (nil pointer dereference) panic.

Steps to reproduce the behaviour

Happens a few hours after upgrading from v2.48.2 to v2.52.0.

Backtrace

Oct 11 09:39:49 x-host-x erigon[31329]: [INFO] [10-11|09:39:49.900] [snapshots] Dumping txs                  block num=17257994 alloc=4.7GB sys=7.8GB
Oct 11 09:40:08 x-host-x erigon[31329]: [INFO] [10-11|09:40:08.744] [snapshots] Dumping txs                  block num=17258780 alloc=3.7GB sys=7.8GB
Oct 11 09:40:16 x-host-x erigon[31329]: [INFO] [10-11|09:40:16.479] [] ETL [2/2] Loading                     into=PlainState current_prefix=822cd0ce
Oct 11 09:40:23 x-host-x erigon[31329]: [EROR] [10-11|09:40:23.216] [txpool] flush is local history          err="mdbx_cursor_del: MDBX_MAP_FULL: Environment mapsize limit reached"
Oct 11 09:40:28 x-host-x erigon[31329]: [INFO] [10-11|09:40:28.751] [snapshots] Dumping txs                  block num=17259813 alloc=4.3GB sys=7.8GB
Oct 11 09:40:38 x-host-x erigon[31329]: panic: runtime error: invalid memory address or nil pointer dereference
Oct 11 09:40:38 x-host-x erigon[31329]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x15012c0]
Oct 11 09:40:38 x-host-x erigon[31329]: goroutine 2475 [running]:
Oct 11 09:40:38 x-host-x erigon[31329]: github.com/ledgerwatch/erigon-lib/txpool.(*TxPool).flushLocked(0x4000f7cd80, {0x2c9d5b8?, 0x405c0ee3c0})
Oct 11 09:40:38 x-host-x erigon[31329]:         github.com/ledgerwatch/erigon-lib@v1.0.0/txpool/pool.go:1769 +0x70
Oct 11 09:40:38 x-host-x erigon[31329]: github.com/ledgerwatch/erigon-lib/txpool.(*TxPool).flushNoFsync.func1({0x2c9d5b8?, 0x405c0ee3c0})
Oct 11 09:40:38 x-host-x erigon[31329]:         github.com/ledgerwatch/erigon-lib@v1.0.0/txpool/pool.go:1737 +0x48
Oct 11 09:40:38 x-host-x erigon[31329]: github.com/ledgerwatch/erigon-lib/kv/mdbx.(*MdbxKV).UpdateNosync(0x4083637e50?, {0x2c70538?, 0x4000fba0f0?}, 0x405a388580)
Oct 11 09:40:38 x-host-x erigon[31329]:         github.com/ledgerwatch/erigon-lib@v1.0.0/kv/mdbx/kv_mdbx.go:657 +0x88
Oct 11 09:40:38 x-host-x erigon[31329]: github.com/ledgerwatch/erigon-lib/txpool.(*TxPool).flushNoFsync(0x4000f7cd80, {0x2c70538, 0x4000fba0f0}, {0x2c82958, 0x4099e272b0})
Oct 11 09:40:38 x-host-x erigon[31329]:         github.com/ledgerwatch/erigon-lib@v1.0.0/txpool/pool.go:1736 +0x16c
Oct 11 09:40:38 x-host-x erigon[31329]: github.com/ledgerwatch/erigon-lib/txpool.(*TxPool).flush(0x4005fb9ed8?, {0x2c70538, 0x4000fba0f0}, {0x2c82958?, 0x4099e272b0?})
Oct 11 09:40:38 x-host-x erigon[31329]:         github.com/ledgerwatch/erigon-lib@v1.0.0/txpool/pool.go:1756 +0xac
Oct 11 09:40:38 x-host-x erigon[31329]: github.com/ledgerwatch/erigon-lib/txpool.MainLoop({0x2c70538?, 0x4000fba0f0}, {0x2c82958, 0x4099e272b0}, {0x0?, 0x493198?}, 0x4000f7cd80, 0x40436ea360, 0x408dc94d70, 0x4083674018, ...)
Oct 11 09:40:38 x-host-x erigon[31329]:         github.com/ledgerwatch/erigon-lib@v1.0.0/txpool/pool.go:1623 +0x24c
Oct 11 09:40:38 x-host-x erigon[31329]: created by github.com/ledgerwatch/erigon/eth.New
Oct 11 09:40:38 x-host-x erigon[31329]:         github.com/ledgerwatch/erigon/eth/backend.go:681 +0x3350
Oct 11 09:40:40 x-host-x systemd[1]: erigon.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Oct 11 09:40:40 x-host-x systemd[1]: erigon.service: Failed with result 'exit-code'.
Oct 11 09:40:40 x-host-x systemd[1]: erigon.service: Consumed 4h 58min 29.384s CPU time.
Oct 11 09:40:43 x-host-x systemd[1]: erigon.service: Scheduled restart job, restart counter is at 1.
Oct 11 09:40:43 x-host-x systemd[1]: Stopped Erigon Execution Layer Client service (Ethereum MainNet).
Oct 11 09:40:43 x-host-x systemd[1]: erigon.service: Consumed 4h 58min 29.384s CPU time.
Oct 11 09:40:43 x-host-x systemd[1]: Started Erigon Execution Layer Client service (Ethereum MainNet).

Extra Info

$ pwd && du -Sh .
/home/erigon/.local/share/erigon
1.2T    ./chaindata
7.3G    ./temp
7.5M    ./nodes/eth67
13M     ./nodes/eth68
4.0K    ./nodes
1.1G    ./txpool
4.0K    ./snapshots/domain
4.0K    ./snapshots/accessor
1008K   ./snapshots/db
4.0K    ./snapshots/history
4.0K    ./snapshots/warm
4.0K    ./snapshots/idx
339G    ./snapshots
14M     ./downloader
18M     ./logs
8.0K    .

sokolalec commented 12 months ago

The same error happened to me after I upgraded from 2.49.1 to 2.50.2.

AskAlexSharov commented 12 months ago

Fixed by https://github.com/ledgerwatch/erigon/pull/8434

yperbasis commented 6 months ago

Please re-open this issue if it still happens with v2.59.2 or later.

insider89 commented 4 months ago

I have the same issue with v2.60.0:

[INFO] [06-04|11:05:21.990] [txpool] stat pending=0 baseFee=0 queued=30000 alloc=558.7MB sys=3.8GB
page_alloc_slowpath:10501 unable alloc 42 pages, flags 0x3, errcode -30792
[WARN] [06-04|11:08:10.495] [7/15 Execution] Execution failed        block=57600865 hash=0x64a6f9efc98889a1210737be5972cdc9f9c23d3417a7a71fd15403453cf69a1f err="writing receipts for block 57600865: label: chaindata, bucket: TransactionLog, mdbx_cursor_put: MDBX_MAP_FULL: Environment mapsize limit reached"
[EROR] [06-04|11:08:10.553] Staged Sync err="[7/15 Execution] label: chaindata, table: SyncStage, err: mdbx_cursor_put: MDBX_BAD_TXN: Transaction is not valid for requested operation, e.g. had errored and be must aborted, has a child, or is invalid"

The flags passed to startup command:

Command:
erigon
Args:
--chain=bor-mainnet
--snapshots=true
--txpool.nolocals
--db.pagesize=16k
--private.api.addr=0.0.0.0:9090
--nat=extip:IP_ADDRESS
--authrpc.vhosts=*
--authrpc.jwtsecret=/home/erigon/.local/share/erigon/jwtsecret
--authrpc.addr=0.0.0.0
--datadir=/home/erigon/.local/share/erigon
--internalcl
--db.size.limit=12TB
--metrics
--port=30833
--p2p.allowed-ports=30833
--p2p.allowed-ports=30834
--p2p.allowed-ports=30835
--p2p.allowed-ports=30836
--metrics.addr=0.0.0.0
--metrics.port=6060
--torrent.download.rate=1000mb
--http.api=admin,net,eth,erigon,web3,net,debug,trace,txpool,engine,ots
--maxpeers=400
--http.addr=0.0.0.0
--http.vhosts=*
--http.corsdomain=*
--rpc.batch.limit=1000
--bodies.cache=5G
--ws
--db.read.concurrency=1024
--rpc.batch.concurrency=64
--txpool.pricelimit=30000000000
--bor.milestone=false
--bootnodes=enode://b8f1cc9c5d4403703fbf377116469667d2b1823c0daf16b7250aa576bacf399e42c3930ccfcb02c5df6879565a2b8931335565f0e8d3f8e72385ecf4a4bf160a@3.36.224.80:30303,enode://8729e0c825f3d9cad382555f3e4
6dcff21af323e89025a0e6312df541f4a9e73abfa562d64906f5e59c51fe6f0501b3e61b07979606c56329c020ed739910759@54.194.245.5:30303,enode://0cb82b395094ee4a2915e9714894627de9ed8498fb881cec6db7c65e8b9a5bd7f2f25cc84e71e89
d0947e51c76e85d0847de848c7782b13c0255247a6758178c@44.232.55.71:30303,enode://88116f4295f5a31538ae409e4d44ad40d22e44ee9342869e7d68bdec55b0f83c1530355ce8b41fbec0928a7d75a5745d528450d30aec92066ab6ba1ee351d710@15
9.203.9.164:30303,enode://a1d2af06659b080df1537490c04ef139f7cf71d3f1652011b722134b8f10361c69a445000809fadd6c1ad34f4a0ed58d72b5c1346d62ab536fae563f27fe2bba@142.132.136.31:30833,enode://ea3c4032b95d57b96dd482cf
4fa986f491cf587244e81ebd6bf37eda116ccaf37233414529a6a86115e42b24b69a07d98036e4f991de6df48e88bc86e86f9069@142.132.136.31:30843,enode://fd10175c237537b11b359bbcd06d93a8595c0e77de05019bd2dfe22999d3aba1383cd99d1e
be81a0cc17111b911a3639869d68407b105e806017c395c4e45125@157.90.90.89:30803,enode://bb9fb6a0da0dcf52af4d89046ba257c8bdce40ff792f1eed55b363f72f9ff12fefe04180608e19ed9c2f5ee5f5c3385eb37bb76d3831bf23302cf522ebed6c
92@168.119.70.250:30865,enode://667a3a764c33b7919b92fdb77db3a4736845d953b27c7384d15a60aeaa7b33b5d64ea4e17c38be62e4af52e82db43beffc9e8f2992085e673cb2cf2891c9964f@168.119.70.250:30875,enode://2e6fa77c5f66c0313a
62177e0077bf1a3178adb41e4fa60352ba295e8aa9e26cf0074ba2d55f17cca8e5c7abfad766b6fc9e1eeb6586a762f43cfe63d3d6ddf7@67.235.115.91:30885,enode://e1b0767d1756a950f5fdab659d1292cabd303c5c92e8cf8865937d42ef61a0b5df4df
88974db01ff21317bbeea88b9a3c299238e4e3ee6f42ed3fa3e730d9d79@65.108.228.152:30865
--staticpeers=enode://b8f1cc9c5d4403703fbf377116469667d2b1823c0daf16b7250aa576bacf399e42c3930ccfcb02c5df6879565a2b8931335565f0e8d3f8e72385ecf4a4bf160a@3.36.224.80:30303,enode://8729e0c825f3d9cad382555f3
e46dcff21af323e89025a0e6312df541f4a9e73abfa562d64906f5e59c51fe6f0501b3e61b07979606c56329c020ed739910759@54.194.245.5:30303,enode://0cb82b395094ee4a2915e9714894627de9ed8498fb881cec6db7c65e8b9a5bd7f2f25cc84e71e
89d0947e51c76e85d0847de848c7782b13c0255247a6758178c@44.232.55.71:30303,enode://88116f4295f5a31538ae409e4d44ad40d22e44ee9342869e7d68bdec55b0f83c1530355ce8b41fbec0928a7d75a5745d528450d30aec92066ab6ba1ee351d710@
159.203.9.164:30303,enode://a1d2af06659b080df1537490c04ef139f7cf71d3f1652011b722134b8f10361c69a445000809fadd6c1ad34f4a0ed58d72b5c1346d62ab536fae563f27fe2bba@142.132.136.31:30833,enode://ea3c4032b95d57b96dd482
cf4fa986f491cf587244e81ebd6bf37eda116ccaf37233414529a6a86115e42b24b69a07d98036e4f991de6df48e88bc86e86f9069@142.132.136.31:30843,enode://fd10175c237537b11b359bbcd06d93a8595c0e77de05019bd2dfe22999d3aba1383cd99d
1ebe81a0cc17111b911a3639869d68407b105e806017c395c4e45125@157.90.90.89:30803,enode://bb9fb6a0da0dcf52af4d89046ba257c8bdce40ff792f1eed55b363f72f9ff12fefe04180608e19ed9c2f5ee5f5c3385eb37bb76d3831bf23302cf522ebed
6c92@168.119.70.250:30865,enode://667a3a764c33b7919b92fdb77db3a4736845d953b27c7384d15a60aeaa7b33b5d64ea4e17c38be62e4af52e82db43beffc9e8f2992085e673cb2cf2891c9964f@168.119.70.250:30875,enode://2e6fa77c5f66c031
3a62177e0077bf1a3178adb41e4fa60352ba295e8aa9e26cf0074ba2d55f17cca8e5c7abfad766b6fc9e1eeb6586a762f43cfe63d3d6ddf7@67.235.115.91:30885,enode://e1b0767d1756a950f5fdab659d1292cabd303c5c92e8cf8865937d42ef61a0b5df4
df88974db01ff21317bbeea88b9a3c299238e4e3ee6f42ed3fa3e730d9d79@65.108.228.152:30865,enode://b8f1cc9c5d4403703fbf377116469667d2b1823c0daf16b7250aa576bacf399e42c3930ccfcb02c5df6879565a2b8931335565f0e8d3f8e72385e
cf4a4bf160a@3.36.224.80:30303,enode://8729e0c825f3d9cad382555f3e46dcff21af323e89025a0e6312df541f4a9e73abfa562d64906f5e59c51fe6f0501b3e61b07979606c56329c020ed739910759@54.194.245.5:30303,enode://0cb82b395094ee
4a2915e9714894627de9ed8498fb881cec6db7c65e8b9a5bd7f2f25cc84e71e89d0947e51c76e85d0847de848c7782b13c0255247a6758178c@44.232.55.71:30303,enode://88116f4295f5a31538ae409e4d44ad40d22e44ee9342869e7d68bdec55b0f83c15
30355ce8b41fbec0928a7d75a5745d528450d30aec92066ab6ba1ee351d710@159.203.9.164:30303,enode://a1d2af06659b080df1537490c04ef139f7cf71d3f1652011b722134b8f10361c69a445000809fadd6c1ad34f4a0ed58d72b5c1346d62ab536fae5
63f27fe2bba@142.132.136.31:30833,enode://ea3c4032b95d57b96dd482cf4fa986f491cf587244e81ebd6bf37eda116ccaf37233414529a6a86115e42b24b69a07d98036e4f991de6df48e88bc86e86f9069@142.132.136.31:30843,enode://fd10175c2
37537b11b359bbcd06d93a8595c0e77de05019bd2dfe22999d3aba1383cd99d1ebe81a0cc17111b911a3639869d68407b105e806017c395c4e45125@157.90.90.89:30803,enode://bb9fb6a0da0dcf52af4d89046ba257c8bdce40ff792f1eed55b363f72f9ff
12fefe04180608e19ed9c2f5ee5f5c3385eb37bb76d3831bf23302cf522ebed6c92@168.119.70.250:30865,enode://667a3a764c33b7919b92fdb77db3a4736845d953b27c7384d15a60aeaa7b33b5d64ea4e17c38be62e4af52e82db43beffc9e8f2992085e6
73cb2cf2891c9964f@168.119.70.250:30875,enode://2e6fa77c5f66c0313a62177e0077bf1a3178adb41e4fa60352ba295e8aa9e26cf0074ba2d55f17cca8e5c7abfad766b6fc9e1eeb6586a762f43cfe63d3d6ddf7@67.235.115.91:30885,enode://e1b0
767d1756a950f5fdab659d1292cabd303c5c92e8cf8865937d42ef61a0b5df4df88974db01ff21317bbeea88b9a3c299238e4e3ee6f42ed3fa3e730d9d79@65.108.228.152:30865
--bor.heimdall=http://polygon-heimdall-rpc:1317/
--log.dir.path=/home/erigon/.local/share/erigon/logs

Disk size consumed:

100M    bor
2.0K    caplin
5.2T    chaindata
14K     downloader
512     erigon-mainnet-failures.txt
8.5K    erigon-mainnet-parts.txt
512     jwt.hex
512     LOCK
91M     logs
512     nodekey
48M     nodes
15K     snapshots
8.5K    temp
726M    txpool
5.2T    total

mdbx_stat:

~/.local/share/erigon $ mdbx_stat -ef chaindata/
mdbx_stat v0.12.0-71-g1cac6536 (2022-07-28T09:57:31+07:00, T-9a6d7e5b917e5fbd14dc51835fa749d092aa1d72)
Running for chaindata/...
Environment Info
  Pagesize: 16384
  Dynamic datafile: 49152..13194139533312 bytes (+16777216/-33554432), 3..805306368 pages (+1024/-2048)
  Current mapsize: 13194139533312 bytes, 805306368 pages
  Current datafile: 13194139533312 bytes, 805306368 pages
  Last transaction ID: 23669991
  Latter reader transaction ID: 23669991 (0)
  Max readers: 32116
  Number of reader slots uses: 8
Garbage Collection
  Pagesize: 16384
  Tree depth: 3
  Branch pages: 3
  Leaf pages: 901
  Overflow pages: 14052
  Entries: 22173
Page Usage
  Total: 805306368 100%
  Backed: 805306368 100.0%
  Allocated: 805306341 100.0%
  Remained: 27 0.0%
  Used: 747984451 92.9%
  GC: 57321890 7.1%
  Retained: 8 0.0%
  Reclaimable: 57321882 7.1%
  Available: 57321909 7.1%
Status of Main DB
  Pagesize: 16384
  Tree depth: 1
  Branch pages: 0
  Leaf pages: 1
  Overflow pages: 0
  Entries: 153
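
As a rough cross-check of the mdbx_stat numbers above, the arithmetic can be sketched in a few lines of Python. The constants are copied from the output. The reading that `--db.size.limit=12TB` is taken in binary units (TiB) is an assumption, although the byte count is consistent with it.

```python
# Values copied from the mdbx_stat output above.
PAGE_SIZE = 16384              # "Pagesize"
TOTAL_PAGES = 805_306_368      # "Current mapsize ... pages"
ALLOCATED_PAGES = 805_306_341  # "Allocated"
GC_PAGES = 57_321_890          # "GC"
TIB = 2**40                    # assumption: the limit suffix is binary (TiB)

mapsize_bytes = PAGE_SIZE * TOTAL_PAGES
remained_pages = TOTAL_PAGES - ALLOCATED_PAGES
gc_tib = GC_PAGES * PAGE_SIZE / TIB

print(mapsize_bytes)        # 13194139533312, matches "Current mapsize"
print(mapsize_bytes / TIB)  # 12.0, i.e. the map has grown to the full 12 TiB limit
print(remained_pages)       # 27, matches "Remained"
print(round(gc_tib, 2))     # 0.85 TiB of pages sit in the GC (free) list
```

So the map has already reached its configured upper bound and only 27 never-allocated pages remain. Roughly 0.85 TiB of pages is listed under GC, yet the `unable alloc 42 pages` line was still logged; why those pages could not satisfy that request is not something these numbers alone explain.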

insider89 commented 4 months ago

After I set --db.size.limit=15TB, it started syncing.
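
To notice a shrinking margin before the limit is hit again, the "Page Usage" section of `mdbx_stat -ef` output can be parsed and checked. The following is a minimal sketch only: the function names and the 1% threshold are hypothetical, and the sample text is the section from the output posted earlier in this thread.

```python
import re

# "Page Usage" section as posted earlier in this thread.
SAMPLE = """\
Page Usage
  Total: 805306368 100%
  Backed: 805306368 100.0%
  Allocated: 805306341 100.0%
  Remained: 27 0.0%
  Used: 747984451 92.9%
  GC: 57321890 7.1%
  Retained: 8 0.0%
  Reclaimable: 57321882 7.1%
  Available: 57321909 7.1%
Status of Main DB
"""


def parse_page_usage(text: str) -> dict[str, int]:
    """Return {label: page count} for the 'Page Usage' section."""
    usage: dict[str, int] = {}
    in_section = False
    for line in text.splitlines():
        if line.strip() == "Page Usage":
            in_section = True
            continue
        if in_section:
            # Section rows are indented: "  Label: <pages> <percent>".
            m = re.match(r"\s+(\w+):\s+(\d+)\b", line)
            if not m:
                break  # first non-indented line ends the section
            usage[m.group(1)] = int(m.group(2))
    return usage


def remained_fraction(usage: dict[str, int]) -> float:
    """Fraction of the map that has never been allocated."""
    return usage["Remained"] / usage["Total"]


usage = parse_page_usage(SAMPLE)
if remained_fraction(usage) < 0.01:  # hypothetical 1% threshold
    print("warning: fewer than 1% of map pages remain unallocated")
```

With the numbers from this thread the warning fires, since only 27 of 805306368 pages remained. In practice the text would come from running `mdbx_stat -ef chaindata/` as shown above, rather than from a hard-coded string.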