Closed: du5 closed this issue 10 months ago.
If you capture any relevant metrics, please collect them and send them to the BSC team.
AFAIK, v1.3.7 does not include any memory-related changes, so I have no idea why v1.3.7 has the OOM issue while v1.3.6 does not.
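One low-effort way to collect memory data for such a report is to sample the resident set size (RSS) of the geth process at regular intervals. Below is a minimal Go sketch, assuming Linux (it reads the `VmRSS` line from `/proc/<pid>/status`); `parseVmRSSKiB` and `readRSSKiB` are hypothetical helper names, not part of bsc-geth.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseVmRSSKiB extracts the VmRSS value (resident set size, in KiB) from the
// contents of a Linux /proc/<pid>/status file.
func parseVmRSSKiB(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "VmRSS:") {
			continue
		}
		// Expected shape: "VmRSS:\t  123456 kB"
		fields := strings.Fields(strings.TrimPrefix(line, "VmRSS:"))
		if len(fields) < 1 {
			return 0, fmt.Errorf("malformed VmRSS line: %q", line)
		}
		return strconv.ParseUint(fields[0], 10, 64)
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("VmRSS not found")
}

// readRSSKiB reads the resident memory of a running process by PID (Linux only).
func readRSSKiB(pid int) (uint64, error) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/status", pid))
	if err != nil {
		return 0, err
	}
	return parseVmRSSKiB(string(data))
}

func main() {
	// Defaults to this process's own PID; pass geth's PID as the first argument
	// (hypothetical usage: run it every minute from cron and append to a log).
	pid := os.Getpid()
	if len(os.Args) > 1 {
		p, err := strconv.Atoi(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, "invalid pid:", err)
			return
		}
		pid = p
	}
	kib, err := readRSSKiB(pid)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	fmt.Printf("pid=%d rss=%.1f MiB\n", pid, float64(kib)/1024)
}
```

Logging this value periodically gives a memory trend leading up to the moment the process disappears, which is more useful than a single snapshot taken after the fact.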
According to a community user, his PBSS node hit an OOM after running for two days and could no longer be started. The same problem occurred on my node after I upgraded to 1.3.7, but after a different running time. There seems to be no specific pattern, so locating the problem may be difficult.
This is the stdout output when restarting after the OOM:
root@snap-helper /opt # ./pbss.sh
INFO [01-02|15:39:12.848] Starting Geth on BSC mainnet...
INFO [01-02|15:39:12.848] Bumping default cache on mainnet provided=1024 updated=4096
INFO [01-02|15:39:12.849] Maximum peer count ETH=256 LES=0 total=256
INFO [01-02|15:39:12.850] Using pebble as db engine
INFO [01-02|15:39:12.925] Using pebble as the backing database
INFO [01-02|15:39:12.925] Allocated cache and file handles database=/opt/geth.pbss/geth/chaindata cache=1.60GiB handles=524,288 "memory table"=409.50MiB
INFO [01-02|15:39:13.073] Found legacy ancient chain path location=/opt/geth.pbss/geth/chaindata/ancient
INFO [01-02|15:39:13.076] Opened ancient database database=/opt/geth.pbss/geth/chaindata/ancient readonly=false frozen=34,629,314
INFO [01-02|15:39:13.078] All are provided, state scheme set to already existing scheme=path
INFO [01-02|15:39:13.084] Set global gas cap cap=50,000,000
INFO [01-02|15:39:13.084] Initializing the KZG library backend=gokzg
INFO [01-02|15:39:13.141] Capped dirty cache size provided=1024.00MiB adjusted=256.00MiB
INFO [01-02|15:39:13.141] Clean cache size provided=614.00MiB
INFO [01-02|15:39:13.142] Allocated trie memory caches clean=614.00MiB dirty=256.00MiB
INFO [01-02|15:39:13.160] Using pebble as the backing database
INFO [01-02|15:39:13.160] Allocated cache and file handles database=/opt/geth.pbss/geth/chaindata cache=1.60GiB handles=524,288 "memory table"=409.50MiB
INFO [01-02|15:39:13.281] Found legacy ancient chain path location=/opt/geth.pbss/geth/chaindata/ancient
INFO [01-02|15:39:13.282] Read ancientdb item counts items=0
INFO [01-02|15:39:13.283] Opened ancientdb with nodata mode database=/opt/geth.pbss/geth/chaindata/ancient frozen=34,629,314
INFO [01-02|15:39:13.285] Parlia chainConfig="{ChainID: 56 Homestead: 0 DAO: <nil> DAOSupport: false EIP150: 0 EIP155: 0 EIP158: 0 Byzantium: 0 Constantinople: 0 Petersburg: 0 Istanbul: 0, Muir Glacier: 0, Ramanujan: 0, Niels: 0, MirrorSync: 5184000, Bruno: 13082000, Berlin: 31302048, YOLO v3: <nil>, CatalystBlock: <nil>, London: 31302048, ArrowGlacier: <nil>, MergeFork:<nil>, Euler: 18907621, Gibbs: 23846001, Nano: 21962149, Moran: 22107423, Planck: 27281024,Luban: 29020050, Plato: 30720096, Hertz: 31302048, Hertzfix: 34140700, ShanghaiTime: 1705996800, KeplerTime: 1705996800, Engine: parlia}"
INFO [01-02|15:39:13.481] Initialising Ethereum protocol network=56 dbversion=8
INFO [01-02|15:39:14.253] new async node buffer limit=256.00MiB layers=74
WARN [01-02|15:39:15.329] Path-based state scheme is an experimental feature sync=false
INFO [01-02|15:39:15.509] Initialised chain configuration config="{ChainID: 56 Homestead: 0 DAO: <nil> DAOSupport: false EIP150: 0 EIP155: 0 EIP158: 0 Byzantium: 0 Constantinople: 0 Petersburg: 0 Istanbul: 0, Muir Glacier: 0, Ramanujan: 0, Niels: 0, MirrorSync: 5184000, Bruno: 13082000, Berlin: 31302048, YOLO v3: <nil>, CatalystBlock: <nil>, London: 31302048, ArrowGlacier: <nil>, MergeFork:<nil>, Euler: 18907621, Gibbs: 23846001, Nano: 21962149, Moran: 22107423, Planck: 27281024,Luban: 29020050, Plato: 30720096, Hertz: 31302048, Hertzfix: 34140700, ShanghaiTime: 1705996800, KeplerTime: 1705996800, Engine: parlia}"
INFO [01-02|15:39:16.205] Loaded most recent local block number=34,719,314 hash=69a3b8..11f2bd root=5257e5..b31cd1 td=68,977,079 age=5d19h40m
INFO [01-02|15:39:16.283] Loaded most recent local finalized block number=34,719,312 hash=9eb38f..84e2a5 root=74bf91..38355e td=68,977,075 age=5d19h40m
INFO [01-02|15:39:16.363] Loaded last snap-sync pivot marker number=34,580,824
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0xcd121c]
goroutine 1 [running]:
github.com/ethereum/go-ethereum/core/rawdb.(*ResettableFreezer).AncientRange(0xcfee40?, {0x28e40c2?, 0xc03b2298d8?}, 0xc03b2299a8?, 0x248caa0?, 0xc03efd6ed0?)
/opt/bsc/core/rawdb/freezer_resettable.go:126 +0x5c
github.com/ethereum/go-ethereum/core/rawdb.ReadStateHistoryMetaList(...)
/opt/bsc/core/rawdb/accessors_state.go:180
github.com/ethereum/go-ethereum/trie/triedb/pathdb.checkHistories(0x0, 0x1b8872ccbeaa9682?, 0xc591320e457d591f?, 0xc03349d750)
/opt/bsc/trie/triedb/pathdb/history.go:548 +0x85
github.com/ethereum/go-ethereum/trie/triedb/pathdb.(*Database).Recoverable(0xc002910050, {0x3, 0x77, 0xc4, 0x5, 0xd2, 0xe5, 0x36, 0x52, 0x82, ...})
/opt/bsc/trie/triedb/pathdb/database.go:363 +0x205
github.com/ethereum/go-ethereum/trie.(*Database).Recoverable(0x7faf11c48890?, {0x3, 0x77, 0xc4, 0x5, 0xd2, 0xe5, 0x36, 0x52, 0x82, ...})
/opt/bsc/trie/database.go:320 +0x45
github.com/ethereum/go-ethereum/core.NewBlockChain({0x3376ad8?, 0xc000126600}, 0x0?, 0x7ffffffe805afca8?, 0x0?, {0x33653c0?, 0xc0012b1100?}, {{0x0, 0x0}, 0x0, ...}, ...)
/opt/bsc/core/blockchain.go:403 +0x14b0
github.com/ethereum/go-ethereum/eth.New(0xc0001aac40, 0xc00171f800)
/opt/bsc/eth/backend.go:252 +0x170f
github.com/ethereum/go-ethereum/cmd/utils.RegisterEthService(0x0?, 0xc00171f800)
/opt/bsc/cmd/utils/flags.go:2154 +0x167
main.makeFullNode(0xc00153fbf0?)
/opt/bsc/cmd/geth/config.go:181 +0x255
main.geth(0xc001729b80)
/opt/bsc/cmd/geth/main.go:341 +0xf3
github.com/urfave/cli/v2.(*Command).Run(0xc0017ffb80, 0xc001729b80, {0xc0001aa000, 0xe, 0xe})
/home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/command.go:274 +0x9eb
github.com/urfave/cli/v2.(*App).RunContext(0xc0006ab2c0, {0x334ef10?, 0xc0001ac000}, {0xc0001aa000, 0xe, 0xe})
/home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/app.go:332 +0x616
github.com/urfave/cli/v2.(*App).Run(...)
/home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/app.go:309
main.main()
/opt/bsc/cmd/geth/main.go:284 +0x47
root@snap-helper /opt # cat pbss.sh
geth --datadir=geth.pbss --history.transactions=0 --tries-verify-mode=local --db.engine=pebble --maxpeers=256 --syncmode=full --ipcpath=/opt/ipc.ipc --port=30311 --discovery.port=30311 --disablesnapprotocol=true --pruneancient=true --config=config.toml --state.scheme=path
root@snap-helper /opt # cat config.toml
[Eth]
NetworkId = 56
LightPeers = 100
TrieTimeout = 150000000000
StateScheme = "path"
[Eth.Miner]
GasCeil = 140000000
GasPrice = 3000000000
Recommit = 10000000000
[Eth.TxPool]
Locals = []
NoLocals = true
Journal = "transactions.rlp"
Rejournal = 3600000000000
PriceLimit = 3000000000
PriceBump = 10
AccountSlots = 200
GlobalSlots = 8000
AccountQueue = 200
GlobalQueue = 4000
[Eth.GPO]
Blocks = 20
Percentile = 60
OracleThreshold = 1000
[Node]
IPCPath = "geth.ipc"
HTTPHost = "localhost"
InsecureUnlockAllowed = false
HTTPPort = 8545
HTTPVirtualHosts = ["localhost"]
HTTPModules = ["eth", "net", "web3", "txpool", "parlia"]
WSPort = 8546
WSModules = ["net", "web3", "eth"]
[Node.P2P]
MaxPeers = 200
NoDiscovery = false
ListenAddr = ":30311"
EnableMsgEvents = false
Other users have reported that v1.3.6 has the same problem, so they may need to use v1.3.5.
@du5 which snapshot do you use? Do you do any operations before panic happens, such as restart geth?
I found the problem with the snapshot https://snapshots.48.club/geth.pbss.34712063.tar.zst. Geth terminated abruptly without any warning or panic error; based on these symptoms, I identified it as an OOM.
When the OOM occurs and you start geth again, it prints the panic log.
After that, I re-extracted the snapshot and synced it with v1.3.6, which worked for me; it has been running normally so far. However, I noticed that other users also reported this problem on v1.3.6, and it was resolved by using v1.3.5. The problem occurs with snapshots built by both the BSC team and 48Club, so the snapshot itself does not seem to be damaged.
By the way, the geth.pbss.34712063.tar.zst snapshot has been deleted, but the latest snapshot was produced by syncing from it.
I see this issue has been closed, so which version should we use with PBSS? The README still says we need to use v1.3.5.
The latest v1.3.7 is OK for running PBSS, but it may have issues with some snapshots provided by 48Club due to a compatibility issue with --pruneancient. Meanwhile, another release, v1.3.8, is coming, likely next week. You had better try v1.3.8 once it is ready.
Due to a series of problems caused by pruneancient, we have decided not to use this flag from now on; it also causes many problems with future versions of bsc-geth. Turning this flag on in v1.3.x does not prune the database, and the database size keeps growing.
I have multiple nodes with pruneancient turned on as well. The smallest database is 1.1 TB and the largest is 1.9 TB. I think the problem lies in the pruneancient logic, not in the snapshot.
Given the conflict between PBSS and pruneancient, I still recommend using v1.3.5.
I use bsc-geth v1.3.5 with this snapshot "https://snapshots.48.club/geth.pbss.35485953.tar.zst" and still hit an OOM.
When I restart the process, it reports "panic: runtime error: invalid memory address or nil pointer dereference".
So I think we should not use the snapshot with the PBSS flag.
@xux1217 Once the database is damaged, downgrading cannot repair it; you need to download the snapshot again. Downgrading must be done before the database is damaged.
I am sure I first downloaded the snapshot and then started it directly with bsc-geth v1.3.5; I did not downgrade.
My start command: ./geth --config ./config.toml --datadir /data/geth.full --syncmode=full --db.engine=pebble --cache 8000 --rpc.allow-unprotected-txs --history.transactions=0 --tries-verify-mode=local --diffblock=5000 --http --http.corsdomain=* --http.vhosts=* --pruneancient --state.scheme path
And the config.toml:
[Eth]
NetworkId = 56
LightPeers = 100
TrieTimeout = 150000000000
StateScheme = "path"
[Eth.Miner]
GasCeil = 140000000
GasPrice = 3000000000
Recommit = 10000000000
[Eth.TxPool]
Locals = []
NoLocals = true
Journal = "transactions.rlp"
Rejournal = 3600000000000
PriceLimit = 3000000000
PriceBump = 10
AccountSlots = 200
GlobalSlots = 8000
AccountQueue = 200
GlobalQueue = 4000
[Eth.GPO]
Blocks = 20
Percentile = 60
OracleThreshold = 1000
[Node]
IPCPath = "geth.ipc"
HTTPHost = "0.0.0.0"
InsecureUnlockAllowed = false
HTTPPort = 8545
HTTPVirtualHosts = ["*"]
HTTPModules = ["eth", "net", "web3", "txpool", "parlia","debug"]
WSPort = 8546
WSModules = ["net", "web3", "eth"]
[Node.P2P]
MaxPeers = 200
NoDiscovery = false
StaticNodes = []
ListenAddr = ":30311"
EnableMsgEvents = false
[Node.LogConfig]
FilePath = "bsc.log"
MaxBytesSize = 10485760
Level = "info"
FileRoot = ""
I have the same issue with a fresh download on v1.3.5:
Jan 25 20:43:18 orangepi5 bash[194799]: panic: runtime error: invalid memory address or nil pointer dereference
Jan 25 20:43:18 orangepi5 bash[194799]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xbc4664]
Jan 25 20:43:18 orangepi5 bash[194799]: goroutine 1 [running]:
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/core/rawdb.(*ResettableFreezer).AncientRange(0x400052c230?, {0x25a47b7?, 0x4011d54d00?}, 0x40001031e8?, 0x40001032e8?, 0x214da80?)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/core/rawdb/freezer_resettable.go:125 +0x34
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/core/rawdb.ReadStateHistoryMetaList(...)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/core/rawdb/accessors_state.go:180
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/trie/triedb/pathdb.checkHistories(0x0, 0xdfacbb468a60fb71?, 0xc9c7e760d93b58c2?, 0x40144b3728)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/trie/triedb/pathdb/history.go:548 +0x70
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/trie/triedb/pathdb.(*Database).Recoverable(0x4011976140, {0xf2, 0x63, 0x8c, 0x88, 0x1b, 0x4f, 0xe9, 0x8a, 0x71, ...})
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/trie/triedb/pathdb/database.go:363 +0x17c
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/trie.(*Database).Recoverable(0x7f57c525e8?, {0xf2, 0x63, 0x8c, 0x88, 0x1b, 0x4f, 0xe9, 0x8a, 0x71, ...})
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/trie/database.go:320 +0x44
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/core.NewBlockChain({0x3065038?, 0x40016fa7c8}, 0x4?, 0x0?, 0x0?, {0x3053c20?, 0x4000410700?}, {{0x0, 0x0}, 0x0, ...}, ...)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/core/blockchain.go:403 +0x1174
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/eth.New(0x400019e8c0, 0x40013d0000)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/eth/backend.go:252 +0x1234
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/cmd/utils.RegisterEthService(0x0?, 0x40013d0000)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/cmd/utils/flags.go:2156 +0x120
Jan 25 20:43:18 orangepi5 bash[194799]: main.makeFullNode(0x4001a3fbf8?)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/cmd/geth/config.go:175 +0x208
Jan 25 20:43:18 orangepi5 bash[194799]: main.geth(0x40016a35c0)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/cmd/geth/main.go:341 +0xbc
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/urfave/cli/v2.(*Command).Run(0x400077e160, 0x40016a35c0, {0x40001a6000, 0x15, 0x16})
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/command.go:274 +0x73c
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/urfave/cli/v2.(*App).RunContext(0x40013ca000, {0x303d7c0?, 0x40001a0020}, {0x40001a6000, 0x15, 0x16})
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/app.go:332 +0x568
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/urfave/cli/v2.(*App).Run(...)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/app.go:309
PBSS on v1.3.6/v1.3.7 runs into OOM; please stay on v1.3.5.
Due to the code changes in https://github.com/bnb-chain/bsc/pull/2155, v1.3.8 cannot start this snapshot; please wait for the new snapshot to be released.