bnb-chain / opbnb

Node cannot catch up #92

Closed zhangxinars closed 6 months ago

zhangxinars commented 7 months ago

System information

Network: mainnet

op-node version: ghcr.io/bnb-chain/op-node:0.2.2

OS & Version: Linux

Since December 7th, our node has been unable to catch up.

Here are our startup commands:

command: >
  op-node
  --l1.trustrpc
  --sequencer.l1-confs=15
  --verifier.l1-confs=0
  --l1.http-poll-interval=3s
  --l1.epoch-poll-interval=45s
  --l1.rpc-max-batch-size=20
  --rollup.config=/rollup.json
  --rpc.addr=0.0.0.0
  --rpc.port=8546
  --p2p.sync.req-resp
  --p2p.listen.ip=0.0.0.0
  --p2p.listen.tcp=9003
  --p2p.listen.udp=9003
  --snapshotlog.file=./snapshot.log
  --p2p.priv.raw=
  --l1={private node}
  --l2=http://l2:8551/
  --l2.jwt-secret=/config/jwt-secret.txt
  --rpc.enable-admin
  --rpc.addr=0.0.0.0
  --rpc.port=8545

This is the log:

t=2023-12-07T16:38:33+0000 lvl=warn msg="L1 head signal indicates a possible L1 re-org" old_l1_head=0x16ad12d7437bdc3c4d22d8e9bfb822eb11406b110c99367e5d40343fcf78952b:34150796 new_l1_head_parent=0x088a1ae5dd978c9cc8ee0bdf12995caaebff4cdb8576ab305491d2b01ea7a58b new_l1_head=0xfaf427fac3297dd248b86525c76188c95b30ab30bf341075a76c33c7adb4aea7:34150798
t=2023-12-07T16:38:33+0000 lvl=info msg="New L1 finalized block" l1_finalized=0x1347b16a9513b2d85e89c03aac26b1199c3984d92784c1cc30bf21682e3d76ef:34150822
t=2023-12-07T16:38:33+0000 lvl=info msg="received L1 finality signal, but missing data for immediate L2 finalization" prev_finalized_l1=0x1347b16a9513b2d85e89c03aac26b1199c3984d92784c1cc30bf21682e3d76ef:34150822 signaled_finalized_l1=0x1347b16a9513b2d85e89c03aac26b1199c3984d92784c1cc30bf21682e3d76ef:34150822
t=2023-12-07T16:38:33+0000 lvl=info msg="Advancing bq origin" origin=0x9d43e508d51fd31fc92307d66b783cb913dde5e55cbebae320caf6fd8a3c913c:34131415 originBehind=true
t=2023-12-07T16:38:41+0000 lvl=info msg="Advancing bq origin" origin=0xe2def21265b09fd38d744ec3e317140ddeaf0028e047a8e8033511686877b089:34131416 originBehind=true
t=2023-12-07T16:38:43+0000 lvl=warn msg="failed to notify engine driver of L1 head change" err="context deadline exceeded"
t=2023-12-07T16:38:50+0000 lvl=info msg="Advancing bq origin" origin=0xdaee59dc1d604d3c2a9e0d630f866f339ec09395b268ffa622384083d7797c1e:34131417 originBehind=true
t=2023-12-07T16:38:53+0000 lvl=warn msg="failed to notify engine driver of L1 head change" err="context deadline exceeded"
t=2023-12-07T16:39:03+0000 lvl=warn msg="failed to notify engine driver of L1 head change" err="context deadline exceeded"
t=2023-12-07T16:39:05+0000 lvl=info msg="New L1 safe block" l1_safe=0xe8ae4a5ad1abbd538893b96f30f4952f5c282e64ea85ca225f681d1b4b1cb316:34150830
t=2023-12-07T16:39:05+0000 lvl=info msg="Advancing bq origin" origin=0xdc5cf7d5f1cb993706dfbaf207da9b9a6c9210b2822c8f358b442eb9d7dda974:34131418 originBehind=true
t=2023-12-07T16:39:12+0000 lvl=info msg="New L1 finalized block" l1_finalized=0xe8ae4a5ad1abbd538893b96f30f4952f5c282e64ea85ca225f681d1b4b1cb316:34150830
t=2023-12-07T16:39:12+0000 lvl=info msg="received L1 finality signal, but missing data for immediate L2 finalization" prev_finalized_l1=0xe8ae4a5ad1abbd538893b96f30f4952f5c282e64ea85ca225f681d1b4b1cb316:34150830 signaled_finalized_l1=0xe8ae4a5ad1abbd538893b96f30f4952f5c282e64ea85ca225f681d1b4b1cb316:34150830
t=2023-12-07T16:39:12+0000 lvl=info msg="Advancing bq origin" origin=0xe89e4e2621d583023528cc9af3624f42b310850eef8aa2c78fad50ae81803be9:34131419 originBehind=true
t=2023-12-07T16:39:12+0000 lvl=info msg="Reading channel" channel=53bf067a06c701ca09a13d44b8526d36 frames=2
t=2023-12-07T16:39:12+0000 lvl=warn msg="L1 head signal indicates a possible L1 re-org" old_l1_head=0x9ee29df21e0c6e5bb2536e0ff8073c33e94838a05a23a39a25c8fd8831b04ffa:34150805 new_l1_head_parent=0xdd2b7e00b7b06c686d7e84a90de544dde6c2a4c6790879292b4032a5ff20e935 new_l1_head=0x6004023a6fd5841f936e910a8cbcd9657ccedd310f4a21c4c8b3e07438b8d4dd:34150811
t=2023-12-07T16:39:12+0000 lvl=warn msg="L1 head signal indicates a possible L1 re-org" old_l1_head=0x3ff6c05de5d2f7e752cc81817b8c90542c57a6ece914b89f90d9fdff7ac8ec9b:34150814 new_l1_head_parent=0x464202f4fd6480760689e18fc0a31ad9961e488511cab23476f7450e2297f3d0 new_l1_head=0xd579499c2ff4ec9cbfdb08c5d9d2ec4eb5e18192673e59db729b5c253c2e444c:34150818
t=2023-12-07T16:39:12+0000 lvl=warn msg="L1 head signal indicates a possible L1 re-org" old_l1_head=0xe738092f7e318d2d665d6c88ac6ce94557066f2d319bc857872adfcbee510ba4:34150829 new_l1_head_parent=0xe8ae4a5ad1abbd538893b96f30f4952f5c282e64ea85ca225f681d1b4b1cb316 new_l1_head=0xf032ba24400eb57e7a2fc686a2f519203bde43c13d4e053a182e116abe0d0637:34150831
t=2023-12-07T16:39:12+0000 lvl=warn msg="L1 head signal indicates a possible L1 re-org" old_l1_head=0xf032ba24400eb57e7a2fc686a2f519203bde43c13d4e053a182e116abe0d0637:34150831 new_l1_head_parent=0x8b14e56f63742dcf1e8ff3cb2f13287779175df68d64804e33f4be43922c96d3 new_l1_head=0x2500ab1a29abe6b14251aac497a78423a891ff9def0604673e16e5b0fa7332dc:34150834
t=2023-12-07T16:39:17+0000 lvl=info msg="Advancing bq origin" origin=0xfc106be75512417d8d2dd997aa2cd38a2d2b132b4fcb62b69faef9c38d4d14f7:34131420 originBehind=true
t=2023-12-07T16:39:23+0000 lvl=info msg="Advancing bq origin" origin=0x7e4627e8cc25d143f6d21c080f7d1a98f146581d8c8b70978480b5bef13609c0:34131421 originBehind=true
t=2023-12-07T16:39:28+0000 lvl=info msg="New L1 finalized block" l1_finalized=0x31123f2e659a685465cd682bef97c81141489e7eb4bb7b8cd27e3496834ee581:34150838
t=2023-12-07T16:39:28+0000 lvl=info msg="received L1 finality signal, but missing data for immediate L2 finalization" prev_finalized_l1=0x31123f2e659a685465cd682bef97c81141489e7eb4bb7b8cd27e3496834ee581:34150838 signaled_finalized_l1=0x31123f2e659a685465cd682bef97c81141489e7eb4bb7b8cd27e3496834ee581:34150838
t=2023-12-07T16:39:28+0000 lvl=warn msg="L1 head signal indicates a possible L1 re-org" old_l1_head=0x2500ab1a29abe6b14251aac497a78423a891ff9def0604673e16e5b0fa7332dc:34150834 new_l1_head_parent=0x26a6885c0070230cb6752f141d6b1b10137536fbfe703c590eb5bacb2b2fdee4 new_l1_head=0x31123f2e659a685465cd682bef97c81141489e7eb4bb7b8cd27e3496834ee581:34150838
t=2023-12-07T16:39:28+0000 lvl=info msg="New L1 safe block" l1_safe=0x31123f2e659a685465cd682bef97c81141489e7eb4bb7b8cd27e3496834ee581:34150838
t=2023-12-07T16:39:28+0000 lvl=info msg="Advancing bq origin" origin=0x6a72821ffb5730744b8e9dca2e0968634c33a8dc3cc67fe4bd2a5738a94283d7:34131422 originBehind=true
t=2023-12-07T16:39:35+0000 lvl=info msg="Advancing bq origin" origin=0x9b1a3d81119bede115a2f39ee1ba2646ed18805055c6be14f4955cf83420125e:34131423 originBehind=true
t=2023-12-07T16:39:43+0000 lvl=info msg="Advancing bq origin" origin=0xf1325acf5749fbdbfe83cca8169580b8734a51390c7a27501f1387c496382ec5:34131424 originBehind=true
t=2023-12-07T16:40:06+0000 lvl=warn msg="failed to notify engine driver of L1 head change" err="context deadline exceeded"
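
For anyone comparing their own node against this log, here is a rough sketch of how the gap between the derivation pipeline and the L1 head could be read out of an op-node log file. It assumes the log uses the same key=value layout as the excerpt above (`origin=<hash>:<number>` and `new_l1_head=<hash>:<number>`); the function name `derivation_lag` is made up for illustration and is not part of op-node.

```shell
#!/bin/sh
# Sketch only: estimate how many L1 blocks the op-node batch queue ("bq")
# origin trails behind the most recently reported L1 head.
# Assumes log fields shaped like origin=0x<hash>:<number> and
# new_l1_head=0x<hash>:<number>, as in the log excerpt in this issue.
derivation_lag() {
    log_file="$1"

    # Block number of the last "Advancing bq origin" entry.
    # The pattern requires "origin=0x", so "originBehind=true" is not matched.
    origin=$(grep -o 'origin=0x[0-9a-f]*:[0-9]*' "$log_file" | tail -n 1 | cut -d: -f2)

    # Block number of the last reported new L1 head.
    # "new_l1_head_parent=" is not matched because of the "=0x" right after "head".
    head=$(grep -o 'new_l1_head=0x[0-9a-f]*:[0-9]*' "$log_file" | tail -n 1 | cut -d: -f2)

    if [ -z "$origin" ] || [ -z "$head" ]; then
        echo "no origin or L1 head entries found in $log_file" >&2
        return 1
    fi

    echo "$((head - origin))"
}
```

Applied to the excerpt above, the last origin is 34131424 and the last reported L1 head is 34150838, so the node is about 19,414 L1 blocks behind. In the same excerpt the origin advances only one block every 5 to 8 seconds.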

STdevK commented 7 months ago

Can I check if -l1=${L1_RPC} is a private or public RPC Endpoint? Also what's the command for op-geth?

YuXiaoCoder commented 7 months ago

I was using the L1 endpoint https://bsc-dataseed.bnbchain.org:443 and ran into the same problem.

zhangxinars commented 7 months ago

> Can I check if -l1=${L1_RPC} is a private or public RPC Endpoint? Also what's the command for op-geth?

It is a private RPC.

The op-geth command:

YuXiaoCoder commented 7 months ago

I'm using a private L1 archive node and I'm getting the following error:

t=2023-12-08T11:07:17+0800 lvl=warn msg="failed to poll L1 block" label=safe err="failed to fetch head header: not found"
t=2023-12-08T11:09:54+0800 lvl=warn msg="failed to notify engine driver of L1 head change" err="context deadline exceeded"
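
One possible way to narrow down the `label=safe ... not found` warning is to ask the L1 endpoint directly whether it returns a block for the `safe` and `finalized` tags. The sketch below uses the standard `eth_getBlockByNumber` JSON-RPC method; the helper names and the `L1_RPC` variable are placeholders, and it is only an assumption that an unsupported tag is what produced the warning here.

```shell
#!/bin/sh
# Sketch only: probe an L1 JSON-RPC endpoint for a given block tag.
# Helper names are hypothetical; L1_RPC stands for your own endpoint URL.

# Build the JSON-RPC request body for eth_getBlockByNumber with a block tag
# such as latest, safe or finalized (header only, no full transactions).
l1_tag_payload() {
    printf '{"jsonrpc":"2.0","id":1,"method":"eth_getBlockByNumber","params":["%s",false]}' "$1"
}

# Send the request. A null "result" or an "error" object in the response
# suggests the endpoint does not serve that tag.
probe_l1_tag() {
    curl -s -X POST -H 'Content-Type: application/json' \
        --data "$(l1_tag_payload "$2")" "$1"
}

# Example usage (commented out so that nothing runs on load):
# for tag in latest safe finalized; do
#     probe_l1_tag "$L1_RPC" "$tag"; echo
# done
```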

zhangxinars commented 7 months ago

/entrypoint.sh

exec geth \
  --datadir="$GETH_DATA_DIR" \
  --verbosity="$VERBOSITY" \
  --http \
  --http.corsdomain="" \
  --http.vhosts="" \
  --http.addr=0.0.0.0 \
  --http.port="$RPC_PORT" \
  --http.api=web3,debug,eth,txpool,net,engine \
  --ws \
  --ws.addr=0.0.0.0 \
  --ws.port="$WS_PORT" \
  --ws.origins="" \
  --ws.api=debug,eth,txpool,net,engine \
  --syncmode=full \
  --nodiscover \
  --maxpeers=10 \
  --unlock=$BLOCK_SIGNER_ADDRESS \
  --password="$GETH_DATA_DIR"/password \
  --allow-insecure-unlock \
  --authrpc.addr="0.0.0.0" \
  --authrpc.port="8551" \
  --authrpc.vhosts="" \
  --authrpc.jwtsecret=/config/jwt-secret.txt \
  --gcmode=$GC_MODE \
  --cache 32000 \
  --cache.preimages \
  --metrics \
  --metrics.addr=0.0.0.0 \
  --metrics.port=6060 \
  "$@"

3eph1r0th commented 7 months ago

Same issue, even with our own dedicated full node as L1.

YuXiaoCoder commented 7 months ago

My nodes are back in sync.

zhangxinars commented 7 months ago

> My nodes are back in sync.

Have you caught up?

3eph1r0th commented 7 months ago

Also back in sync.

zhangxinars commented 7 months ago

Excuse me, have all the nodes recovered by themselves? Or what has been optimized? @YuXiaoCoder @3eph1r0th

3eph1r0th commented 7 months ago

> Excuse me, have all the nodes recovered by themselves? Or what has been optimized? @YuXiaoCoder @3eph1r0th

They recovered by themselves.

YuXiaoCoder commented 7 months ago

> Excuse me, have all the nodes recovered by themselves? Or what has been optimized? @YuXiaoCoder @3eph1r0th

Try adding BOOT_NODES to geth as well. https://docs.bnbchain.org/opbnb-docs/docs/tutorials/running-a-local-node#start-components

anyshy commented 7 months ago

My node hasn't caught up yet. It seems to be advancing the bq origin slowly.

krish-nr commented 7 months ago

@zhangxinars Hi. During that time, BSC was experiencing a period of very high TPS: the maximum number of transactions in a block reached about 8k, far exceeding the previous peak. This made it very challenging for L2 to fetch the L1 data and derive from it. The load on the L1 RPC nodes during such a period can be alleviated by improving the configuration of the L1 endpoint, but this is only a temporary solution. We are also looking for a more stable solution that is less sensitive to this kind of sudden increase in traffic. Sorry about this; we are on it.

3eph1r0th commented 7 months ago

Hi. The issue happened again on all our nodes.

krish-nr commented 7 months ago

We have optimized some time-consuming operations, and BSC now provides some specific interfaces (not supported on ETH) that help us retrieve L1 data more efficiently, especially when there is a sudden increase in L1 transaction volume. To enable this, add --l1.rpckind=bsc_fullnode to the op-node configuration options. In our own tests, this significantly alleviated the slow synchronization under extreme conditions.
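
As a rough illustration, the option could be appended to the startup command from the original report as sketched below. Only `--l1.rpckind=bsc_fullnode` comes from the comment above; the other flags are copied from the reporter's command (abbreviated), `$L1_RPC` is a placeholder for your own endpoint, and the thread does not say which op-node release first accepts this value, so check your version before relying on it.

```shell
# Sketch only: the reporter's op-node command (abbreviated) with the
# BSC-specific RPC kind added. $L1_RPC is a placeholder for your L1 endpoint.
op-node \
  --l1="$L1_RPC" \
  --l1.rpckind=bsc_fullnode \
  --l1.trustrpc \
  --l1.http-poll-interval=3s \
  --l1.epoch-poll-interval=45s \
  --l1.rpc-max-batch-size=20 \
  --l2=http://l2:8551/ \
  --l2.jwt-secret=/config/jwt-secret.txt \
  --rollup.config=/rollup.json
```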