Closed xavier-romero closed 3 days ago
@xavier-romero can you confirm which Fork this bug was observed on please?
@mandrigin will look into this, but it is not seen as a showstopper.
TODO: we should collect a pprof/goroutine dump when the instance is stuck.
Deja vu - we fixed this in cdk-erigon-lib but as part of the upstream merge of 2.60 the updates to MDBX weren't ported over.
The fix is in beta 10, pending validation.
With this fix (https://github.com/0xPolygonHermez/cdk-erigon/pull/1472), the X Layer erigon RPC is still stuck.
With https://github.com/0xPolygonHermez/cdk-erigon/releases/tag/v2.60.0-beta10 and the following configuration, the RPC gets stuck:
datadir: /data/erigon-data/xlayer-mainnet
chain: xlayer-mainnet
http: true
private.api.addr: localhost:18091
zkevm.l2-chain-id: 196
zkevm.l2-sequencer-rpc-url: https://rpc.xlayer.tech
zkevm.l2-datastreamer-url: stream.xlayer.tech:8800
zkevm.l1-chain-id: 1
zkevm.l1-rpc-url: https://rpc.ankr.com/eth/{replace with your eth rpc}
zkevm.address-sequencer: "0xAF9d27ffe4d51eD54AC8eEc78f2785D7E11E5ab1"
zkevm.address-zkevm: "0x2B0ee28D4D51bC9aDde5E58E295873F61F4a0507"
zkevm.address-rollup: "0x5132A183E9F3CB7C848b0AAC5Ae0c4f0491B7aB2"
zkevm.address-ger-manager: "0x580bda1e7A0CFAe92Fa7F6c20A3794F169CE3CFb"
zkevm.l1-rollup-id: 3
zkevm.l1-first-block: 19218658
zkevm.l1-block-range: 2000
zkevm.l1-query-delay: 1000
zkevm.datastream-version: 3
http.api: [eth, debug, net, trace, web3, erigon, zkevm]
http.addr: 0.0.0.0
http.port: 28544
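To tell a node that is lagging behind the tip from one that is truly stuck, it can help to compare the local RPC height with the sequencer's. The sketch below is only an illustration: it assumes the local node answers on the `http.port` from the config above (28544) and uses the `zkevm.l2-sequencer-rpc-url` value as the reference. The helper names (`block_number_request`, `parse_block_number`, `fetch_block_number`, `lag`, `main`) are made up for this example; only the standard `eth_blockNumber` JSON-RPC method is relied on.

```python
import json
import urllib.request

# Assumptions taken from the config above; adjust to your setup.
LOCAL_RPC = "http://127.0.0.1:28544"        # http.port
SEQUENCER_RPC = "https://rpc.xlayer.tech"   # zkevm.l2-sequencer-rpc-url


def block_number_request(request_id=1):
    """Build a standard JSON-RPC eth_blockNumber request body."""
    return {"jsonrpc": "2.0", "method": "eth_blockNumber", "params": [], "id": request_id}


def parse_block_number(response):
    """Convert the hex-encoded result of eth_blockNumber into an int."""
    return int(response["result"], 16)


def fetch_block_number(url, timeout=10):
    """POST eth_blockNumber to url and return the height as an int."""
    body = json.dumps(block_number_request()).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_block_number(json.load(resp))


def lag(local_height, remote_height):
    """Number of blocks the local node is behind (never negative)."""
    return max(0, remote_height - local_height)


def main():
    local = fetch_block_number(LOCAL_RPC)
    remote = fetch_block_number(SEQUENCER_RPC)
    print(f"local={local} sequencer={remote} lag={lag(local, remote)}")
```

Calling `main()` a few times, some minutes apart, gives a rough picture: a lag that shrinks suggests slow syncing, while a local height that never moves points at the stuck case discussed here.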
Is this a problem with syncing and holding the network tip @giskook ? I see you mentioned eth_getLogs crashing which is a different issue altogether.
Maybe it's a different issue, let's figure out the stuck one first.
curl "http://127.0.0.1:47050/debug/pprof/goroutine?debug=1" > goroutines.log
curl "http://127.0.0.1:47050/debug/pprof/profile?seconds=60" > pprof.bin
Attachments: goroutines.log, pprof.bin.log
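For repeated collection while the instance is stuck, the two curl commands above can be wrapped in a small script. This is a minimal sketch, assuming the pprof endpoint is reachable at the same address as in those commands (127.0.0.1:47050); the function names `pprof_url`, `download` and `collect` are hypothetical helpers, not part of cdk-erigon.

```python
import urllib.request
from urllib.parse import urlencode

# Assumption: same pprof address as in the curl commands above.
PPROF_BASE = "http://127.0.0.1:47050"


def pprof_url(base, profile, **params):
    """Build a /debug/pprof/<profile> URL with optional query parameters."""
    url = f"{base.rstrip('/')}/debug/pprof/{profile}"
    return f"{url}?{urlencode(params)}" if params else url


def download(url, path, timeout):
    """Fetch url and write the raw response body to path."""
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(path, "wb") as out:
        out.write(resp.read())


def collect(base=PPROF_BASE, seconds=60):
    """Save a goroutine dump and a CPU profile, mirroring the curl commands."""
    download(pprof_url(base, "goroutine", debug=1), "goroutines.log", timeout=30)
    # The CPU profile request blocks for `seconds`, so allow extra time.
    download(pprof_url(base, "profile", seconds=seconds), "pprof.bin", timeout=seconds + 30)
```

Calling `collect()` writes `goroutines.log` and `pprof.bin` to the current directory. For a suspected deadlock, the goroutine endpoint with `debug=2` prints a full stack trace per goroutine, which may be easier to read than the aggregated `debug=1` output.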
Igor noted this is specific to Xlayer
@hexoscott will close this issue as the RPC / sequencer stuck problem is resolved. Scott will open a new issue for a specific OKX issue.
Closing this down as the deadlock problem in the sequencer is now fixed. I have opened #1485 to tackle the RPC syncing issue as something separate.
A partner reported an issue with the sequencer stopping processing transactions under high load, and they identified it as related to the db.read.concurrency configuration. They report that the issue goes away when that number is increased. I did not reproduce it exactly, but I hit a very similar scenario: I set db.read.concurrency to 5 to reach the "high load" condition easily, then sent simple EOA transfers to the RPC, and it gets stuck after a few txs. Even after stopping the txs for hours, with no activity at all, the sequencer remains "stuck".
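One way to make "stuck" measurable during a reproduction like this is to sample the block height periodically and flag the node when the height has not changed for a given window. The sketch below is an assumption about how to detect the symptom, not a description of how the sequencer works: block height is used as a proxy for progress, and `is_stalled` is a hypothetical helper.

```python
def is_stalled(samples, window_seconds):
    """Return True if the latest height has been unchanged for at least window_seconds.

    samples: list of (timestamp_seconds, block_height) in chronological order.
    """
    if len(samples) < 2:
        return False  # not enough data to judge
    latest_time, latest_height = samples[-1]
    # Walk backwards to find when the current height was first observed.
    first_seen = latest_time
    for ts, height in reversed(samples):
        if height != latest_height:
            break
        first_seen = ts
    return latest_time - first_seen >= window_seconds


# Example: the height stops advancing at 101 from t=30 onwards.
samples = [(0, 100), (30, 101), (60, 101), (120, 101)]
print(is_stalled(samples, window_seconds=60))
```

The window has to be longer than the longest idle gap the chain can legitimately have, since a chain with no pending transactions may simply not produce new blocks. While load is still being sent, as in the scenario above, a flat height is a much stronger signal.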