48Club / bsc-snapshots

bsc daily snapshot
https://www.48.club

PBSS + v1.3.6/v1.3.7 OOM, please keep v1.3.5 version #132

Closed: du5 closed this issue 10 months ago

du5 commented 10 months ago

PBSS + v1.3.6/v1.3.7 runs out of memory (OOM); please keep using v1.3.5.

Due to the code changes in https://github.com/bnb-chain/bsc/pull/2155, v1.3.8 cannot start this snapshot. Please wait for the new snapshot to be released.

du5 commented 10 months ago

If you capture relevant metrics, please collect them and report them to the BSC team:

https://github.com/bnb-chain/bsc/issues/new
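One low-tech way to capture such metrics is to sample geth's resident memory over time and check whether it climbs steadily before the process disappears. The Go sketch below assumes a Linux host, where `/proc/<pid>/status` reports the resident set size on a `VmRSS:` line; the helper name `parseVmRSSKiB` is hypothetical, and running it periodically (for example from cron) is left to the operator.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseVmRSSKiB extracts the resident set size, in KiB, from the contents of a
// Linux /proc/<pid>/status file, where it appears as e.g. "VmRSS:\t 123456 kB".
func parseVmRSSKiB(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Expect exactly three fields: "VmRSS:", the number, and the "kB" unit.
		if len(fields) == 3 && fields[0] == "VmRSS:" && fields[2] == "kB" {
			return strconv.ParseUint(fields[1], 10, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("VmRSS line not found")
}

func main() {
	// Hypothetical usage: go run . <geth-pid>
	// Without an argument it samples this process itself via /proc/self.
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1]
	}
	if data, err := os.ReadFile("/proc/" + pid + "/status"); err != nil {
		fmt.Println("read failed:", err)
	} else if kib, err := parseVmRSSKiB(string(data)); err != nil {
		fmt.Println("parse failed:", err)
	} else {
		fmt.Printf("pid %s VmRSS: %.2f GiB\n", pid, float64(kib)/(1<<20))
	}
}
```

Logging these samples with timestamps next to geth's own log makes it easier to tell a steady leak from a sudden spike when reporting to the BSC team.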

zzzckck commented 10 months ago

AFAIK, v1.3.7 does not have any memory-related changes, so I have no idea why v1.3.7 has the OOM issue while v1.3.6 does not.

du5 commented 10 months ago

According to feedback from a community user, their PBSS node hit an OOM after running for two days and could no longer be started. The same problem occurred after I upgraded to v1.3.7, although after a different running time. There seems to be no specific pattern, so pinpointing the problem may be difficult.

du5 commented 10 months ago

This is the stdout output when restarting after the OOM:

root@snap-helper /opt # ./pbss.sh
INFO [01-02|15:39:12.848] Starting Geth on BSC mainnet...
INFO [01-02|15:39:12.848] Bumping default cache on mainnet         provided=1024 updated=4096
INFO [01-02|15:39:12.849] Maximum peer count                       ETH=256 LES=0 total=256
INFO [01-02|15:39:12.850] Using pebble as db engine
INFO [01-02|15:39:12.925] Using pebble as the backing database
INFO [01-02|15:39:12.925] Allocated cache and file handles         database=/opt/geth.pbss/geth/chaindata cache=1.60GiB handles=524,288 "memory table"=409.50MiB
INFO [01-02|15:39:13.073] Found legacy ancient chain path          location=/opt/geth.pbss/geth/chaindata/ancient
INFO [01-02|15:39:13.076] Opened ancient database                  database=/opt/geth.pbss/geth/chaindata/ancient readonly=false frozen=34,629,314
INFO [01-02|15:39:13.078] All are provided, state scheme set to already existing scheme=path
INFO [01-02|15:39:13.084] Set global gas cap                       cap=50,000,000
INFO [01-02|15:39:13.084] Initializing the KZG library             backend=gokzg
INFO [01-02|15:39:13.141] Capped dirty cache size                  provided=1024.00MiB adjusted=256.00MiB
INFO [01-02|15:39:13.141] Clean cache size                         provided=614.00MiB
INFO [01-02|15:39:13.142] Allocated trie memory caches             clean=614.00MiB dirty=256.00MiB
INFO [01-02|15:39:13.160] Using pebble as the backing database
INFO [01-02|15:39:13.160] Allocated cache and file handles         database=/opt/geth.pbss/geth/chaindata         cache=1.60GiB handles=524,288 "memory table"=409.50MiB
INFO [01-02|15:39:13.281] Found legacy ancient chain path          location=/opt/geth.pbss/geth/chaindata/ancient
INFO [01-02|15:39:13.282] Read ancientdb item counts               items=0
INFO [01-02|15:39:13.283] Opened ancientdb with nodata mode        database=/opt/geth.pbss/geth/chaindata/ancient frozen=34,629,314
INFO [01-02|15:39:13.285] Parlia                                   chainConfig="{ChainID: 56 Homestead: 0 DAO: <nil> DAOSupport: false EIP150: 0 EIP155: 0 EIP158: 0 Byzantium: 0 Constantinople: 0 Petersburg: 0 Istanbul: 0, Muir Glacier: 0, Ramanujan: 0, Niels: 0, MirrorSync: 5184000, Bruno: 13082000, Berlin: 31302048, YOLO v3: <nil>, CatalystBlock: <nil>, London: 31302048, ArrowGlacier: <nil>, MergeFork:<nil>, Euler: 18907621, Gibbs: 23846001, Nano: 21962149, Moran: 22107423, Planck: 27281024,Luban: 29020050, Plato: 30720096, Hertz: 31302048, Hertzfix: 34140700, ShanghaiTime: 1705996800, KeplerTime: 1705996800, Engine: parlia}"
INFO [01-02|15:39:13.481] Initialising Ethereum protocol           network=56 dbversion=8
INFO [01-02|15:39:14.253] new async node buffer                    limit=256.00MiB layers=74
WARN [01-02|15:39:15.329] Path-based state scheme is an experimental feature sync=false
INFO [01-02|15:39:15.509] Initialised chain configuration          config="{ChainID: 56 Homestead: 0 DAO: <nil> DAOSupport: false EIP150: 0 EIP155: 0 EIP158: 0 Byzantium: 0 Constantinople: 0 Petersburg: 0 Istanbul: 0, Muir Glacier: 0, Ramanujan: 0, Niels: 0, MirrorSync: 5184000, Bruno: 13082000, Berlin: 31302048, YOLO v3: <nil>, CatalystBlock: <nil>, London: 31302048, ArrowGlacier: <nil>, MergeFork:<nil>, Euler: 18907621, Gibbs: 23846001, Nano: 21962149, Moran: 22107423, Planck: 27281024,Luban: 29020050, Plato: 30720096, Hertz: 31302048, Hertzfix: 34140700, ShanghaiTime: 1705996800, KeplerTime: 1705996800, Engine: parlia}"
INFO [01-02|15:39:16.205] Loaded most recent local block           number=34,719,314 hash=69a3b8..11f2bd root=5257e5..b31cd1 td=68,977,079 age=5d19h40m
INFO [01-02|15:39:16.283] Loaded most recent local finalized block number=34,719,312 hash=9eb38f..84e2a5 root=74bf91..38355e td=68,977,075 age=5d19h40m
INFO [01-02|15:39:16.363] Loaded last snap-sync pivot marker       number=34,580,824
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0xcd121c]

goroutine 1 [running]:
github.com/ethereum/go-ethereum/core/rawdb.(*ResettableFreezer).AncientRange(0xcfee40?, {0x28e40c2?, 0xc03b2298d8?}, 0xc03b2299a8?, 0x248caa0?, 0xc03efd6ed0?)
        /opt/bsc/core/rawdb/freezer_resettable.go:126 +0x5c
github.com/ethereum/go-ethereum/core/rawdb.ReadStateHistoryMetaList(...)
        /opt/bsc/core/rawdb/accessors_state.go:180
github.com/ethereum/go-ethereum/trie/triedb/pathdb.checkHistories(0x0, 0x1b8872ccbeaa9682?, 0xc591320e457d591f?, 0xc03349d750)
        /opt/bsc/trie/triedb/pathdb/history.go:548 +0x85
github.com/ethereum/go-ethereum/trie/triedb/pathdb.(*Database).Recoverable(0xc002910050, {0x3, 0x77, 0xc4, 0x5, 0xd2, 0xe5, 0x36, 0x52, 0x82, ...})
        /opt/bsc/trie/triedb/pathdb/database.go:363 +0x205
github.com/ethereum/go-ethereum/trie.(*Database).Recoverable(0x7faf11c48890?, {0x3, 0x77, 0xc4, 0x5, 0xd2, 0xe5, 0x36, 0x52, 0x82, ...})
        /opt/bsc/trie/database.go:320 +0x45
github.com/ethereum/go-ethereum/core.NewBlockChain({0x3376ad8?, 0xc000126600}, 0x0?, 0x7ffffffe805afca8?, 0x0?, {0x33653c0?, 0xc0012b1100?}, {{0x0, 0x0}, 0x0, ...}, ...)
        /opt/bsc/core/blockchain.go:403 +0x14b0
github.com/ethereum/go-ethereum/eth.New(0xc0001aac40, 0xc00171f800)
        /opt/bsc/eth/backend.go:252 +0x170f
github.com/ethereum/go-ethereum/cmd/utils.RegisterEthService(0x0?, 0xc00171f800)
        /opt/bsc/cmd/utils/flags.go:2154 +0x167
main.makeFullNode(0xc00153fbf0?)
        /opt/bsc/cmd/geth/config.go:181 +0x255
main.geth(0xc001729b80)
        /opt/bsc/cmd/geth/main.go:341 +0xf3
github.com/urfave/cli/v2.(*Command).Run(0xc0017ffb80, 0xc001729b80, {0xc0001aa000, 0xe, 0xe})
        /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/command.go:274 +0x9eb
github.com/urfave/cli/v2.(*App).RunContext(0xc0006ab2c0, {0x334ef10?, 0xc0001ac000}, {0xc0001aa000, 0xe, 0xe})
        /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/app.go:332 +0x616
github.com/urfave/cli/v2.(*App).Run(...)
        /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/app.go:309
main.main()
        /opt/bsc/cmd/geth/main.go:284 +0x47

root@snap-helper /opt # cat pbss.sh 
geth --datadir=geth.pbss --history.transactions=0 --tries-verify-mode=local --db.engine=pebble --maxpeers=256 --syncmode=full --ipcpath=/opt/ipc.ipc --port=30311 --discovery.port=30311 --disablesnapprotocol=true --pruneancient=true --config=config.toml --state.scheme=path
root@snap-helper /opt # cat config.toml 
[Eth]
NetworkId = 56
LightPeers = 100
TrieTimeout = 150000000000
StateScheme = "path"

[Eth.Miner]
GasCeil = 140000000
GasPrice = 3000000000
Recommit = 10000000000

[Eth.TxPool]
Locals = []
NoLocals = true
Journal = "transactions.rlp"
Rejournal = 3600000000000
PriceLimit = 3000000000
PriceBump = 10
AccountSlots = 200
GlobalSlots = 8000
AccountQueue = 200
GlobalQueue = 4000

[Eth.GPO]
Blocks = 20
Percentile = 60
OracleThreshold = 1000

[Node]
IPCPath = "geth.ipc"
HTTPHost = "localhost"
InsecureUnlockAllowed = false
HTTPPort = 8545
HTTPVirtualHosts = ["localhost"]
HTTPModules = ["eth", "net", "web3", "txpool", "parlia"]
WSPort = 8546
WSModules = ["net", "web3", "eth"]

[Node.P2P]
MaxPeers = 200
NoDiscovery = false
ListenAddr = ":30311"
EnableMsgEvents = false

du5 commented 10 months ago

Other users have reported that v1.3.6 has the same problem, so they may need to use v1.3.5.

https://github.com/bnb-chain/bsc/issues/2141

sysvm commented 10 months ago

@du5 Which snapshot do you use? Did you perform any operations before the panic happened, such as restarting geth?

du5 commented 10 months ago


The snapshot I found the problem with is https://snapshots.48.club/geth.pbss.34712063.tar.zst. Geth exited abruptly, without any warning or panic message; based on these symptoms I identified it as an OOM.

When the OOM occurs and you start geth again, it prints a panic log.

After that, I re-extracted the snapshot and synced using v1.3.6, which worked for me, and it has been running normally since. However, other users have reported this problem on v1.3.6 as well, and switching to v1.3.5 solved it for them. The problem occurs with snapshots built by both the BSC team and 48Club, so the snapshot itself does not seem to be damaged.

BTW, the geth.pbss.34712063.tar.zst snapshot has been deleted, but the latest snapshot was produced by syncing on top of it.
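Because the kernel's OOM killer terminates a process with SIGKILL, geth gets no chance to log anything, which matches the abrupt exit described here. One way to confirm the diagnosis is to look for the OOM killer's message in the kernel log (for example via `dmesg` or `journalctl -k`). The Go sketch below assumes that message contains a fragment of the form `Killed process <pid> (<name>)`, which is common but varies between kernel versions; `findOOMKills` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"strconv"
	"strings"
)

// oomKill records one process terminated by the kernel OOM killer.
type oomKill struct {
	PID  int
	Name string
}

// killedRe matches the "Killed process <pid> (<name>)" fragment found in many
// Linux OOM-killer messages. The surrounding wording differs across kernel
// versions, so treat this pattern as an assumption rather than a stable format.
var killedRe = regexp.MustCompile(`Killed process (\d+) \(([^)]+)\)`)

// findOOMKills scans kernel log lines and returns the OOM-killed processes
// whose name equals want, or all of them when want is empty.
func findOOMKills(lines []string, want string) []oomKill {
	var kills []oomKill
	for _, line := range lines {
		m := killedRe.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		pid, err := strconv.Atoi(m[1])
		if err != nil {
			continue // number does not fit in an int; skip the line
		}
		if want == "" || m[2] == want {
			kills = append(kills, oomKill{PID: pid, Name: m[2]})
		}
	}
	return kills
}

func main() {
	// Hypothetical usage: dmesg > kern.log && go run . kern.log
	if len(os.Args) < 2 {
		fmt.Println("usage: oomcheck <kernel-log-file>")
	} else if data, err := os.ReadFile(os.Args[1]); err != nil {
		fmt.Println("read failed:", err)
	} else {
		for _, k := range findOOMKills(strings.Split(string(data), "\n"), "geth") {
			fmt.Printf("geth (pid %d) was killed by the OOM killer\n", k.PID)
		}
	}
}
```

If no such line appears, the exit may have another cause, and the OOM label should be revisited before reporting.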

xux1217 commented 9 months ago

I see this issue has been closed, so which version should we use with PBSS? The README still says we need to use v1.3.5.

zzzckck commented 9 months ago

The latest v1.3.7 is OK to run PBSS, but it may have issues with some snapshots provided by 48Club due to a --pruneancient compatibility issue. Meanwhile, there will be another release, v1.3.8, likely next week. You'd better try v1.3.8 once it is ready.

du5 commented 9 months ago


Due to a series of problems caused by --pruneancient, we have decided not to use this flag in the future. The flag has many problems in bsc-geth: turning it on in v1.3.x does not prune the database, and the database size keeps growing.

I have multiple nodes with --pruneancient turned on; their database sizes range from 1.1 TB to 1.9 TB. I think the problem lies in the pruneancient logic, not in the snapshot.

Regarding the conflict between PBSS and --pruneancient, I still recommend using v1.3.5.
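Operators who follow this advice may want their launch script to refuse the PBSS plus `--pruneancient` combination. The Go sketch below is a hypothetical pre-flight check: it only recognizes the flag spellings used in this thread (`--state.scheme=path`, `--state.scheme path`, `--pruneancient`, `--pruneancient=true`) and does not read `StateScheme` from `config.toml`, so it is a convenience rather than a complete parser of geth's command line.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// flagValue returns the last value given for flag name in args, accepting one
// or two leading dashes and the forms "--name=value", "--name value" and, for
// boolean flags, a bare "--name" (which means true).
func flagValue(args []string, name string, boolean bool) (value string, found bool) {
	for i, a := range args {
		trimmed := strings.TrimPrefix(strings.TrimPrefix(a, "-"), "-")
		if trimmed == a {
			continue // not a flag, e.g. the value of a previous flag
		}
		switch {
		case strings.HasPrefix(trimmed, name+"="):
			value, found = strings.TrimPrefix(trimmed, name+"="), true
		case trimmed == name && boolean:
			value, found = "true", true
		case trimmed == name && i+1 < len(args):
			value, found = args[i+1], true
		}
	}
	return value, found
}

// pbssWithPruneAncient reports whether args select the path-based state scheme
// while also enabling --pruneancient, the combination this thread warns about.
func pbssWithPruneAncient(args []string) bool {
	scheme, _ := flagValue(args, "state.scheme", false)
	prune, found := flagValue(args, "pruneancient", true)
	on, err := strconv.ParseBool(prune)
	return scheme == "path" && found && err == nil && on
}

func main() {
	// Hypothetical usage: go run . <the geth flags you intend to launch with>
	if pbssWithPruneAncient(os.Args[1:]) {
		fmt.Println("warning: --state.scheme=path is combined with --pruneancient")
	} else {
		fmt.Println("no PBSS + --pruneancient combination found")
	}
}
```

Feeding it the flags from the `pbss.sh` shown earlier would trigger the warning, since that script passes both `--pruneancient=true` and `--state.scheme=path`.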

xux1217 commented 9 months ago

I used bsc-geth v1.3.5 with the snapshot https://snapshots.48.club/geth.pbss.35485953.tar.zst and still hit an OOM.

Restarting the process then reports "panic: runtime error: invalid memory address or nil pointer dereference":


goroutine 1 [running]:
github.com/ethereum/go-ethereum/core/rawdb.(*ResettableFreezer).AncientRange(0xcfe340?, {0x28e1341?, 0xc017550b68?}, 0xc017550c68?, 0x248a500?, 0xc01872a420?)
        /home/runner/work/bsc/bsc/core/rawdb/freezer_resettable.go:125 +0x5c
github.com/ethereum/go-ethereum/core/rawdb.ReadStateHistoryMetaList(...)
        /home/runner/work/bsc/bsc/core/rawdb/accessors_state.go:180
github.com/ethereum/go-ethereum/trie/triedb/pathdb.checkHistories(0x0, 0x13206fae5cdc8042?, 0xbd42451522faaccd?, 0xc01349d750)
        /home/runner/work/bsc/bsc/trie/triedb/pathdb/history.go:548 +0x85
github.com/ethereum/go-ethereum/trie/triedb/pathdb.(*Database).Recoverable(0xc0113eb450, {0xa3, 0x1a, 0x76, 0xb8, 0x13, 0xe6, 0x1d, 0x22, 0x42, ...})
        /home/runner/work/bsc/bsc/trie/triedb/pathdb/database.go:363 +0x205
github.com/ethereum/go-ethereum/trie.(*Database).Recoverable(0x7faef645daa8?, {0xa3, 0x1a, 0x76, 0xb8, 0x13, 0xe6, 0x1d, 0x22, 0x42, ...})
        /home/runner/work/bsc/bsc/trie/database.go:320 +0x45
github.com/ethereum/go-ethereum/core.NewBlockChain({0x33a14d8?, 0xc0134763c0}, 0x0?, 0x0?, 0x0?, {0x338fdc0?, 0xc00127f100?}, {{0x0, 0x0}, 0x0, ...}, ...)
        /home/runner/work/bsc/bsc/core/blockchain.go:403 +0x14b0
github.com/ethereum/go-ethereum/eth.New(0xc0010520e0, 0xc0014b1000)
        /home/runner/work/bsc/bsc/eth/backend.go:252 +0x170f
github.com/ethereum/go-ethereum/cmd/utils.RegisterEthService(0x0?, 0xc0014b1000)
        /home/runner/work/bsc/bsc/cmd/utils/flags.go:2156 +0x167
main.makeFullNode(0xc001c3fbf0?)
        /home/runner/work/bsc/bsc/cmd/geth/config.go:175 +0x255
main.geth(0xc001a21340)
        /home/runner/work/bsc/bsc/cmd/geth/main.go:341 +0xf3
github.com/urfave/cli/v2.(*Command).Run(0xc001aac000, 0xc001a21340, {0xc000134000, 0x12, 0x12})
        /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/command.go:274 +0x9eb
github.com/urfave/cli/v2.(*App).RunContext(0xc0013a0f00, {0x3379910?, 0xc000130010}, {0xc000134000, 0x12, 0x12})
        /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/app.go:332 +0x616
github.com/urfave/cli/v2.(*App).Run(...)
        /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/app.go:309
main.main()
        /home/runner/work/bsc/bsc/cmd/geth/main.go:284 +0x47

So I think we should not use the snapshot with the PBSS flag.

du5 commented 9 months ago

@xux1217 Once the database is damaged, downgrading cannot repair it; you need to download the snapshot again. Downgrading only helps if it is done before the database is damaged.

xux1217 commented 9 months ago

I am sure that I downloaded the snapshot first and then started it directly with bsc-geth v1.3.5; there was no downgrade.

My start command: ./geth --config ./config.toml --datadir /data/geth.full --syncmode=full --db.engine=pebble --cache 8000 --rpc.allow-unprotected-txs --history.transactions=0 --tries-verify-mode=local --diffblock=5000 --http --http.corsdomain=* --http.vhosts=* --pruneancient --state.scheme path

And the config.toml:

[Eth]
NetworkId = 56
LightPeers = 100
TrieTimeout = 150000000000
StateScheme = "path"

[Eth.Miner]
GasCeil = 140000000
GasPrice = 3000000000
Recommit = 10000000000

[Eth.TxPool]
Locals = []
NoLocals = true
Journal = "transactions.rlp"
Rejournal = 3600000000000
PriceLimit = 3000000000
PriceBump = 10
AccountSlots = 200
GlobalSlots = 8000
AccountQueue = 200
GlobalQueue = 4000

[Eth.GPO]
Blocks = 20
Percentile = 60
OracleThreshold = 1000

[Node]
IPCPath = "geth.ipc"
HTTPHost = "0.0.0.0"
InsecureUnlockAllowed = false
HTTPPort = 8545
HTTPVirtualHosts = ["*"]
HTTPModules = ["eth", "net", "web3", "txpool", "parlia","debug"]
WSPort = 8546
WSModules = ["net", "web3", "eth"]

[Node.P2P]
MaxPeers = 200
NoDiscovery = false
StaticNodes = []
ListenAddr = ":30311"
EnableMsgEvents = false

[Node.LogConfig]
FilePath = "bsc.log"
MaxBytesSize = 10485760
Level = "info"
FileRoot = ""


SECTOR-1 commented 9 months ago

I have the same issue with a fresh download on v1.3.5:

Jan 25 20:43:18 orangepi5 bash[194799]: panic: runtime error: invalid memory address or nil pointer dereference
Jan 25 20:43:18 orangepi5 bash[194799]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xbc4664]
Jan 25 20:43:18 orangepi5 bash[194799]: goroutine 1 [running]:
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/core/rawdb.(*ResettableFreezer).AncientRange(0x400052c230?, {0x25a47b7?, 0x4011d54d00?}, 0x40001031e8?, 0x40001032e8?, 0x214da80?)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/core/rawdb/freezer_resettable.go:125 +0x34
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/core/rawdb.ReadStateHistoryMetaList(...)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/core/rawdb/accessors_state.go:180
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/trie/triedb/pathdb.checkHistories(0x0, 0xdfacbb468a60fb71?, 0xc9c7e760d93b58c2?, 0x40144b3728)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/trie/triedb/pathdb/history.go:548 +0x70
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/trie/triedb/pathdb.(*Database).Recoverable(0x4011976140, {0xf2, 0x63, 0x8c, 0x88, 0x1b, 0x4f, 0xe9, 0x8a, 0x71, ...})
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/trie/triedb/pathdb/database.go:363 +0x17c
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/trie.(*Database).Recoverable(0x7f57c525e8?, {0xf2, 0x63, 0x8c, 0x88, 0x1b, 0x4f, 0xe9, 0x8a, 0x71, ...})
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/trie/database.go:320 +0x44
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/core.NewBlockChain({0x3065038?, 0x40016fa7c8}, 0x4?, 0x0?, 0x0?, {0x3053c20?, 0x4000410700?}, {{0x0, 0x0}, 0x0, ...}, ...)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/core/blockchain.go:403 +0x1174
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/eth.New(0x400019e8c0, 0x40013d0000)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/eth/backend.go:252 +0x1234
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/ethereum/go-ethereum/cmd/utils.RegisterEthService(0x0?, 0x40013d0000)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/cmd/utils/flags.go:2156 +0x120
Jan 25 20:43:18 orangepi5 bash[194799]: main.makeFullNode(0x4001a3fbf8?)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/cmd/geth/config.go:175 +0x208
Jan 25 20:43:18 orangepi5 bash[194799]: main.geth(0x40016a35c0)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/work/bsc/bsc/cmd/geth/main.go:341 +0xbc
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/urfave/cli/v2.(*Command).Run(0x400077e160, 0x40016a35c0, {0x40001a6000, 0x15, 0x16})
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/command.go:274 +0x73c
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/urfave/cli/v2.(*App).RunContext(0x40013ca000, {0x303d7c0?, 0x40001a0020}, {0x40001a6000, 0x15, 0x16})
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/app.go:332 +0x568
Jan 25 20:43:18 orangepi5 bash[194799]: github.com/urfave/cli/v2.(*App).Run(...)
Jan 25 20:43:18 orangepi5 bash[194799]: /home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/app.go:309