xavier-romero closed this issue 1 month ago.
I also saw this behavior when repeating the same test. Again, the sequencer does not recover once the executor is back. Instead, restarting the sequencer (with the executor already running) makes it panic:
[cdk-erigon-sequencer-001] [INFO] [08-23|13:27:30.308] Build info git_branch= git_tag= git_commit=
[cdk-erigon-sequencer-001] [INFO] [08-23|13:27:30.308] Poseidon hashing Accelerated=true
[cdk-erigon-sequencer-001] [INFO] [08-23|13:27:30.308] Starting Erigon on dynamic chain chain=dynamic-kurtosis
[cdk-erigon-sequencer-001] [INFO] [08-23|13:27:30.309] Maximum peer count ETH=0 total=0
[cdk-erigon-sequencer-001] [INFO] [08-23|13:27:30.310] starting HTTP APIs APIs=eth,debug,net,trace,web3,erigon,zkevm,txpool
[cdk-erigon-sequencer-001] [INFO] [08-23|13:27:30.310] torrent verbosity level=WRN
[cdk-erigon-sequencer-001] [INFO] [08-23|13:27:32.412] Set global gas cap cap=50000000
[cdk-erigon-sequencer-001] [WARN] [08-23|13:27:32.415] NetworkID is not set for dynamic chain chain=dynamic-kurtosis networkID=1
[cdk-erigon-sequencer-001] [INFO] [08-23|13:27:32.415] [Downloader] Runnning with ipv6-enabled=true ipv4-enabled=true download.rate=16mb upload.rate=4mb
[cdk-erigon-sequencer-001] [INFO] [08-23|13:27:32.415] Opening Database label=chaindata path=/home/erigon/data/dynamic-kurtosis-sequencer/chaindata
[cdk-erigon-sequencer-001] [INFO] [08-23|13:27:32.420] Initialised chain configuration config="{ChainID: 10101, Homestead: 0, DAO: 0, Tangerine Whistle: 0, Spurious Dragon: 0, Byzantium: 0, Constantinople: 0, Petersburg: 0, Istanbul: 0, Muir Glacier: 0, Berlin: 0, London: 18446744073709551615, Arrow Glacier: 9999999999999999999999999999999999999999999999999, Gray Glacier: 9999999999999999999999999999999999999999999999999, Terminal Total Difficulty: 58750000000000000000000, Merge Netsplit: <nil>, Shanghai: 18446744073709551615, Cancun: 18446744073709551615, Prague: 18446744073709551615, Engine: ethash}" genesis=0x75aedf14f686d79c7bb46a03afc30ee9f02a00108f83751e7a43ecdee98f6756
[cdk-erigon-sequencer-001] [INFO] [08-23|13:27:32.420] Effective prune_flags= snapshot_flags= history.v3=false
[cdk-erigon-sequencer-001] [INFO] [08-23|13:27:32.425] Initialising Ethereum protocol network=1
[cdk-erigon-sequencer-001] [INFO] [08-23|13:27:32.425] Disk storage enabled for ethash DAGs dir=/home/erigon/data/dynamic-kurtosis-sequencer/ethash-dags count=2
[cdk-erigon-sequencer-001] [INFO] [08-23|13:27:32.436] Starting private RPC server on=localhost:9092
[cdk-erigon-sequencer-001] [INFO] [08-23|13:27:32.436] new subscription to logs established
[cdk-erigon-sequencer-001] [INFO] [08-23|13:27:32.538] [txpool] Started
[cdk-erigon-sequencer-001] panic: runtime error: index out of range [1] with length 0
[cdk-erigon-sequencer-001]
[cdk-erigon-sequencer-001] goroutine 1 [running]:
[cdk-erigon-sequencer-001] github.com/ledgerwatch/erigon/zk/txpool.BestQueue.Swap(...)
[cdk-erigon-sequencer-001] github.com/ledgerwatch/erigon/zk/txpool/pool.go:2401
[cdk-erigon-sequencer-001] container/heap.down({0x2f1aac0, 0xc0019a37c0}, 0x0, 0x947)
[cdk-erigon-sequencer-001] container/heap/heap.go:114 +0x4b
[cdk-erigon-sequencer-001] container/heap.Pop({0x2f1aac0, 0xc0019a37c0})
[cdk-erigon-sequencer-001] container/heap/heap.go:62 +0x5b
[cdk-erigon-sequencer-001] github.com/ledgerwatch/erigon/zk/txpool.(*SubPool).PopBest(0xc0019a37a0)
[cdk-erigon-sequencer-001] github.com/ledgerwatch/erigon/zk/txpool/pool.go:2255 +0x28
[cdk-erigon-sequencer-001] github.com/ledgerwatch/erigon/zk/txpool.promote(0xc0013da210, 0xc0019a3740, 0xc0019a37a0, 0x3b9aca00, 0xc003249380, 0xc000ee79f0)
[cdk-erigon-sequencer-001] github.com/ledgerwatch/erigon/zk/txpool/pool.go:1337 +0x7de
[cdk-erigon-sequencer-001] github.com/ledgerwatch/erigon/zk/txpool.(*TxPool).addTxs(0xc000ebd440, 0xd7, {0x2f084d0, 0xc0001273f8}, 0x104d908?, {{0xc003e1a000, 0x100a, 0x1400}, {0xc003f00000, 0x140c8, ...}, ...}, ...)
[cdk-erigon-sequencer-001] github.com/ledgerwatch/erigon/zk/txpool/pool.go:1052 +0x9ab
[cdk-erigon-sequencer-001] github.com/ledgerwatch/erigon/zk/txpool.(*TxPool).fromDB(0xc000ebd440, {0x2f159a0, 0xc000ee62d0}, {0x2f331a0, 0xc0013dcfc0}, {0x2f331a0?, 0xc0013dc5a0?})
[cdk-erigon-sequencer-001] github.com/ledgerwatch/erigon/zk/txpool/pool.go:1702 +0x93a
[cdk-erigon-sequencer-001] github.com/ledgerwatch/erigon/zk/txpool.(*TxPool).StartIfNotStarted(0xc000ebd440, {0x2f159a0, 0xc000ee62d0}, {0x2f1db68?, 0xc0011e62a0?}, {0x2f331a0, 0xc0013dc5a0})
[cdk-erigon-sequencer-001] github.com/ledgerwatch/erigon/zk/txpool/pool_zk.go:270 +0x128
[cdk-erigon-sequencer-001] github.com/ledgerwatch/erigon/eth.New(0xc000352d20, 0x41605c0)
[cdk-erigon-sequencer-001] github.com/ledgerwatch/erigon/eth/backend.go:871 +0x645d
[cdk-erigon-sequencer-001] github.com/ledgerwatch/erigon/turbo/node.New(0xc0000ba060?, 0x27092e5?)
[cdk-erigon-sequencer-001] github.com/ledgerwatch/erigon/turbo/node/node.go:114 +0x74
[cdk-erigon-sequencer-001] main.runErigon(0xc001093a20?)
[cdk-erigon-sequencer-001] github.com/ledgerwatch/erigon/cmd/cdk-erigon/main.go:65 +0x295
[cdk-erigon-sequencer-001] github.com/urfave/cli/v2.(*Command).Run(0xc001093a20, 0xc0006916c0, {0xc0000522a0, 0x6, 0x6})
[cdk-erigon-sequencer-001] github.com/urfave/cli/v2@v2.25.7/command.go:274 +0x9eb
[cdk-erigon-sequencer-001] github.com/urfave/cli/v2.(*App).RunContext(0xc00102a780, {0x2f159d8?, 0xc000056028}, {0xc0000522a0, 0x6, 0x6})
[cdk-erigon-sequencer-001] github.com/urfave/cli/v2@v2.25.7/app.go:332 +0x616
[cdk-erigon-sequencer-001] github.com/urfave/cli/v2.(*App).Run(...)
[cdk-erigon-sequencer-001] github.com/urfave/cli/v2@v2.25.7/app.go:309
[cdk-erigon-sequencer-001] main.main()
[cdk-erigon-sequencer-001] github.com/ledgerwatch/erigon/cmd/cdk-erigon/main.go:39 +0x85
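The report does not identify the root cause of this panic. As an illustration of the failure mode only, here is a minimal sketch with a hypothetical queue type (`brokenQueue`, not the real `BestQueue` from `zk/txpool/pool.go`). Its `Len()` disagrees with its backing slice, so `container/heap` indexes past the end of the slice from inside `Swap`. That produces the same class of `index out of range ... with length 0` runtime error shown in the trace above, although not necessarily from the same frame.

```go
package main

import (
	"container/heap"
	"fmt"
)

// brokenQueue is a hypothetical heap.Interface whose length is tracked in a
// separate counter instead of being derived from the backing slice.
type brokenQueue struct {
	items []int
	n     int // hypothetical length counter that can go stale
}

func (q *brokenQueue) Len() int           { return q.n }
func (q *brokenQueue) Less(i, j int) bool { return q.items[i] < q.items[j] }
func (q *brokenQueue) Swap(i, j int)      { q.items[i], q.items[j] = q.items[j], q.items[i] }
func (q *brokenQueue) Push(x any)         { q.items = append(q.items, x.(int)); q.n++ }

// Pop removes the last element; heap.Pop calls it after moving the minimum there.
func (q *brokenQueue) Pop() any {
	old := q.items
	last := old[len(old)-1]
	q.items = old[:len(old)-1]
	q.n--
	return last
}

// popRecovered calls heap.Pop and converts a runtime panic into an error.
func popRecovered(q *brokenQueue) (v any, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	return heap.Pop(q), nil
}

func main() {
	// Consistent queue: one element, counter matches the slice.
	v, err := popRecovered(&brokenQueue{items: []int{1}, n: 1})
	fmt.Println(v, err)

	// Inconsistent queue: the counter claims 3 items but the slice is empty,
	// so heap.Pop's Swap indexes an empty slice and the runtime panics.
	_, err = popRecovered(&brokenQueue{n: 3})
	fmt.Println(err)
}
```

The design point is that `container/heap` trusts `Len()` completely. Any path that lets the length and the underlying storage diverge, for example state reloaded from disk on restart, surfaces as an index panic inside `Swap` rather than as a clear error.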
These issues are fixed in beta18.x.
System information
Kurtosis + Erigon v2 beta15
Actual behaviour
Issues I've seen on Kurtosis with beta15 when stopping the executor:
Case 1:
First, zkProver was killed by the OOM killer due to high memory usage (that's how I found the issue). I then reproduced it by manually stopping the executor during tx processing. Result: an invalid batch on the sequencer, and the system does NOT recover, neither by restarting the executor nor by restarting the sequencer.
Case 2:
In this case, the system recovered after restarting the executor and the sequencer, and resumed processing.
Steps to reproduce the behaviour
1. Run the network with Kurtosis.
2. While sending txs, stop the executor for a while (1 minute seems enough).
3. Check the sequencer logs.

The final behavior seems to differ depending on the exact moment you stop the executor, the amount and size of the txs being sent, and/or some random or unknown factors.