bnb-chain / opbnb

MIT License
403 stars 162 forks source link

Node doesn't sync (mainnet, op-geth & opnode v0.4.2) #217

Closed HarukiMcCree closed 1 month ago

HarukiMcCree commented 1 month ago

System information

Element Value
Network mainnet
Op-node ghcr.io/bnb-chain/op-node:v0.4.2
Op-geth ghcr.io/bnb-chain/op-geth:v0.4.2
CPU Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
RAM DDR4 ECC 64GB
Kernel GNU/Linux 5.10.0-28-amd64
Disks NVME SSD, SAMSUNG MZWLJ7T6HALA-00007 7.68TB
DB snapshot bnb-chain/opbnb-snapshot

Expected behaviour

Op-geth importing blocks

Actual behaviour

Op-geth does not import blocks, even after a long time (about a week). Op-geth logs:

INFO [07-03|09:41:00.995] loop print db stats                      database=/data/geth/chaindata comp_time=0 write_delay_count=0 write_delay_time=0 non_level0_comp_count=0 level0_comp_count=0
INFO [07-03|09:41:03.996] loop print db stats                      database=/data/geth/chaindata comp_time=0 write_delay_count=0 write_delay_time=0 non_level0_comp_count=0 level0_comp_count=0
INFO [07-03|09:41:06.997] loop print db stats                      database=/data/geth/chaindata comp_time=0 write_delay_count=0 write_delay_time=0 non_level0_comp_count=0 level0_comp_count=0
INFO [07-03|09:41:09.997] loop print db stats                      database=/data/geth/chaindata comp_time=0 write_delay_count=0 write_delay_time=0 non_level0_comp_count=0 level0_comp_count=0

Op-node logs:

t=2024-07-03T09:39:23+0000 lvl=info msg="Walking back L1Block by hash" curr=0xd2ac81d52ef8f6deddc3ba86950f1468cc45174696578e4ec67d32ca314c4ccb:39979299 next=0xe92f50026d308480b96537e07bb0ad43ecfb00a61f03caa0da61651e8a8d86b6:39979298 l2block=0xb57da014933b0f8496c1f8ca6e1ef4775452671a17bf7dcce5b5cf191207fe1d:27742576
t=2024-07-03T09:39:23+0000 lvl=warn msg="failed to notify engine driver of L1 head change" err="context deadline exceeded"
t=2024-07-03T09:39:23+0000 lvl=info msg="Walking back L1Block by hash" curr=0xe92f50026d308480b96537e07bb0ad43ecfb00a61f03caa0da61651e8a8d86b6:39979298 next=0x052ea204656bd04010dc7e18c8734f950c95bc738956f36d3e83051d3f481dab:39979297 l2block=0xa70786711003cbda2e2ddb021efbb1afebdd8c0f8d126e2ee01fe283316433e0:27742575
t=2024-07-03T09:39:23+0000 lvl=debug msg="Revalidated node" p2p=discv5 b=14 id=15aad4b826b26547fc755e92f486b66e1ae6eecaf2ae41f963ff48e49f119aa5 checks=6
t=2024-07-03T09:39:23+0000 lvl=info msg="Walking back L1Block by hash" curr=0x052ea204656bd04010dc7e18c8734f950c95bc738956f36d3e83051d3f481dab:39979297 next=0xdb7f46625c038eaff3e16bd49ae5f9e856574413ac78d724fb11560a7b3dde0f:39979296 l2block=0xe43b91c739d36aafab10265d32727ed9bf1cd7784990afa9aec106782b9be6b9:27742570
t=2024-07-03T09:39:23+0000 lvl=info msg="Walking back L1Block by hash" curr=0xdb7f46625c038eaff3e16bd49ae5f9e856574413ac78d724fb11560a7b3dde0f:39979296 next=0x967d90984a10dbffd058d8c61d2e6df6210318a58455aa644f96468da2e23d74:39979295 l2block=0xa87eaad561a62c4af51a5296cedc3129b681655009baa3e8a6361b4e1e6b396e:27742569
t=2024-07-03T09:39:24+0000 lvl=debug msg="validated blockv3" topic=blocksV3 result=ACCEPT from=16Uiu2HAkyawqgokfp7MetgshLiTkoDwPta3v7QS3bEoYSiJBxWjG
t=2024-07-03T09:39:24+0000 lvl=info msg="Walking back L1Block by hash" curr=0x967d90984a10dbffd058d8c61d2e6df6210318a58455aa644f96468da2e23d74:39979295 next=0x8bd03fdeb80bf8fd4493a30702ead7b59bf1c9a98ba7e22155166dfa398afe71:39979294 l2block=0x7d85f6a2e9fc89ec70f7963ce57003176b2a134735551b946ffad74e67d82740:27742564
t=2024-07-03T09:39:24+0000 lvl=info msg="Walking back L1Block by hash" curr=0x8bd03fdeb80bf8fd4493a30702ead7b59bf1c9a98ba7e22155166dfa398afe71:39979294 next=0x9aa29fc2ffc24a6caaa62c1a98d1229845de20e4ce59aa41c1a053fde4a18520:39979293 l2block=0x3f935c1561fe77eae9f5ea58ff061ddc351f0aacf73a959534f66252284d3a1c:27742563
t=2024-07-03T09:39:24+0000 lvl=info msg="disconnected from peer" peer=16Uiu2HAm67BwBSLAzsU8E2ttzNYetdUjd86q5mFWjgJ1PEngoMoG addr=/ip4/54.81.73.138/tcp/9003 peerCount=39
t=2024-07-03T09:39:24+0000 lvl=debug msg="peer left topic" topic=blocksV2 peer=16Uiu2HAm67BwBSLAzsU8E2ttzNYetdUjd86q5mFWjgJ1PEngoMoG
t=2024-07-03T09:39:24+0000 lvl=debug msg="peer left topic" topic=blocksV3 peer=16Uiu2HAm67BwBSLAzsU8E2ttzNYetdUjd86q5mFWjgJ1PEngoMoG
t=2024-07-03T09:39:24+0000 lvl=debug msg="peer left topic" topic=blocksV1 peer=16Uiu2HAm67BwBSLAzsU8E2ttzNYetdUjd86q5mFWjgJ1PEngoMoG
t=2024-07-03T09:39:24+0000 lvl=debug msg="stopped syncing loop of peer" id=16Uiu2HAm67BwBSLAzsU8E2ttzNYetdUjd86q5mFWjgJ1PEngoMoG
t=2024-07-03T09:39:24+0000 lvl=info msg="Walking back L1Block by hash" curr=0x9aa29fc2ffc24a6caaa62c1a98d1229845de20e4ce59aa41c1a053fde4a18520:39979293 next=0xf6d2ddb0309750ebc5540f3d961e2d02272c4419c712e70ddc1a6f4d6231b8e8:39979292 l2block=0x8b32843ff239cfbee81bfbf2fd0e2973e22c43798c3d8dfb12d8ac66a07b064a:27742557
t=2024-07-03T09:39:25+0000 lvl=info msg="Walking back L1Block by hash" curr=0xf6d2ddb0309750ebc5540f3d961e2d02272c4419c712e70ddc1a6f4d6231b8e8:39979292 next=0x282ac8488916828a841dec1e2baefa12adbdfcabbab6d4781edab4b61d429df2:39979291 l2block=0x46fa0a21ded5c99ba1ebc97dd19c2edcbe4dc57a73c54edd8edbbf8370f66021:27742554
t=2024-07-03T09:39:25+0000 lvl=warn msg="failed to notify engine driver of new L2 payload" err="context deadline exceeded" id=0xb301338f684a69d788c3cb8b8fe643048fce9b22e344062086074a84490cbac5:28244852
t=2024-07-03T09:39:25+0000 lvl=info msg="Received signed execution payload from p2p" id=0x10db6a94730ff2edb38bc77780d1abdabfeade313d64d2c3d79a0c4d468aa1d7:28244882 peer=16Uiu2HAmHNbEuRS8iWsgWpmJDTzK58fAJHVFrP4w7HS5wfNM5LyG
t=2024-07-03T09:39:25+0000 lvl=info msg="Walking back L1Block by hash" curr=0x282ac8488916828a841dec1e2baefa12adbdfcabbab6d4781edab4b61d429df2:39979291 next=0x833edba8a0ca46a7b102905b093c5e8208768c032edd1ccfe97aa79455d8231f:39979290 l2block=0x5d7850bcfa5303f821762da1f60c0d80d777056199a58e3aade2b4ab7f6918df:27742551
t=2024-07-03T09:39:25+0000 lvl=debug msg="validated blockv3" topic=blocksV3 result=ACCEPT from=16Uiu2HAmNbo4i83FENbSRqQs3idEwwAjxEBiW9YVfhtqHDfvTnQ8
t=2024-07-03T09:39:25+0000 lvl=info msg="Walking back L1Block by hash" curr=0x833edba8a0ca46a7b102905b093c5e8208768c032edd1ccfe97aa79455d8231f:39979290 next=0x8642b778c33818a317dcba9c02c9b377e9a9ab573a40bc2c4b63f30cd17c59ba:39979289 l2block=0x80a7c0d067a6ec84e45e441a25fa2b21b32656d572e7ff3a6a4514bff00520c7:27742548
t=2024-07-03T09:39:25+0000 lvl=info msg="Walking back L1Block by hash" curr=0x8642b778c33818a317dcba9c02c9b377e9a9ab573a40bc2c4b63f30cd17c59ba:39979289 next=0x2420768da56041cc170cfe17b18905dc4db1a5fc092d4f93296eb6bd081d2858:39979288 l2block=0x5ddf448199e9cbd479ff3c6c63e8500bdf8dace6e48f086d25941bfe821a7985:27742545
t=2024-07-03T09:39:25+0000 lvl=debug msg="Revalidated node" p2p=discv5 b=16 id=bb986003fadb90f60351b13ffc5d40cdefeeb881b3e2a78c3199a475e6d104f6 checks=5
t=2024-07-03T09:39:26+0000 lvl=info msg="Walking back L1Block by hash" curr=0x2420768da56041cc170cfe17b18905dc4db1a5fc092d4f93296eb6bd081d2858:39979288 next=0x7f32ad615b11f802179886518a4cba90e7838c1041e7cb62ad3479dc5150c5e9:39979287 l2block=0xfa97b1c545d0e7f34286093188b58cfc48d2af9c91f3a193743da6043f00fa2a:27742542
t=2024-07-03T09:39:26+0000 lvl=info msg="Walking back L1Block by hash" curr=0x7f32ad615b11f802179886518a4cba90e7838c1041e7cb62ad3479dc5150c5e9:39979287 next=0x41990f14733f1e78f66da137beac7f5f4cc137ba01117246a82c2d24fe4411eb:39979286 l2block=0x304f336fc76b153ee0d2df8b65e8f3ebcb8c248896394d8d67064579b08ba9be:27742540
t=2024-07-03T09:39:26+0000 lvl=info msg="Walking back L1Block by hash" curr=0x41990f14733f1e78f66da137beac7f5f4cc137ba01117246a82c2d24fe4411eb:39979286 next=0x0af0c7ce8c76eb5e2637df57aa47ec4583d317e05211a0eabea78109059e5e31:39979285 l2block=0xb994c79d37b32a6cf42a0e4292e28e0c7bfa24380a51895e8d537a617c8e2cf8:27742539
t=2024-07-03T09:39:26+0000 lvl=debug msg="validated blockv3" topic=blocksV3 result=ACCEPT from=16Uiu2HAky51SJoZMJ4vqw9eZs7w3PTefzVLdegbDn6dG9m985pRq
t=2024-07-03T09:39:26+0000 lvl=info msg="Walking back L1Block by hash" curr=0x0af0c7ce8c76eb5e2637df57aa47ec4583d317e05211a0eabea78109059e5e31:39979285 next=0xdf1cd206ce1e7caa46f226288d58e0f88100cfb46cde5f89638caf5caa3295e8:39979284 l2block=0xda5ac9e7dc04f4c7ecf85afed4b71b60010763ef391c946cedf187bf29a817fe:27742534
t=2024-07-03T09:39:26+0000 lvl=info msg="Walking back L1Block by hash" curr=0xdf1cd206ce1e7caa46f226288d58e0f88100cfb46cde5f89638caf5caa3295e8:39979284 next=0x62fdd28a0090ad89cc0234cd5b5a270f228308847a6ca7cc38fd3c6c9e70f1db:39979283 l2block=0x10da28d4762e8c1b08ba9dcd8c59f4c4dacdc9b93a05ca832e2f947640fa54c8:27742533
t=2024-07-03T09:39:27+0000 lvl=debug msg="peer left topic" topic=blocksV3 peer=16Uiu2HAkwFV41AFwtAg3KWGymgp4a1oqqyKHiX9Au7he6paJX3Rt
t=2024-07-03T09:39:27+0000 lvl=debug msg="peer left topic" topic=blocksV1 peer=16Uiu2HAkwFV41AFwtAg3KWGymgp4a1oqqyKHiX9Au7he6paJX3Rt
t=2024-07-03T09:39:27+0000 lvl=debug msg="peer left topic" topic=blocksV2 peer=16Uiu2HAkwFV41AFwtAg3KWGymgp4a1oqqyKHiX9Au7he6paJX3Rt
t=2024-07-03T09:39:27+0000 lvl=info msg="disconnected from peer" peer=16Uiu2HAkwFV41AFwtAg3KWGymgp4a1oqqyKHiX9Au7he6paJX3Rt addr=/ip4/3.14.248.20/tcp/9003 peerCount=38
t=2024-07-03T09:39:27+0000 lvl=debug msg="stopped syncing loop of peer" id=16Uiu2HAkwFV41AFwtAg3KWGymgp4a1oqqyKHiX9Au7he6paJX3Rt
t=2024-07-03T09:39:27+0000 lvl=info msg="Walking back L1Block by hash" curr=0x62fdd28a0090ad89cc0234cd5b5a270f228308847a6ca7cc38fd3c6c9e70f1db:39979283 next=0x2cfb096d455c7c95ff9fdba5c6e80c3c2aec883b330ad55d7915fd3720838083:39979282 l2block=0x556cc285a1848889150d98ac73564af6aa6bd24acc759abd411128b3dfc0a260:27742530
t=2024-07-03T09:39:27+0000 lvl=debug msg="validated blockv3" topic=blocksV3 result=ACCEPT from=16Uiu2HAkyawqgokfp7MetgshLiTkoDwPta3v7QS3bEoYSiJBxWjG
t=2024-07-03T09:39:27+0000 lvl=info msg="Walking back L1Block by hash" curr=0x2cfb096d455c7c95ff9fdba5c6e80c3c2aec883b330ad55d7915fd3720838083:39979282 next=0xd1ef6c35ee41a7d7f881ac1871c8f604bbae3af4b1b458dc6f77a202507ce13b:39979281 l2block=0xa7826193eed4ff509764df8e3627c57822e0ec15a6d320e305688695788eafa1:27742524
t=2024-07-03T09:39:27+0000 lvl=info msg="Walking back L1Block by hash" curr=0xd1ef6c35ee41a7d7f881ac1871c8f604bbae3af4b1b458dc6f77a202507ce13b:39979281 next=0x6ace791b3cc4458ab762211f16a2939030635f5e0c190bc8866d02a573d57d20:39979280 l2block=0xdd3f2027bf15d15a86d76348f316c9fe638afa090bcc7f2cb9f2a62ef647d39f:27742522
t=2024-07-03T09:39:28+0000 lvl=info msg="Walking back L1Block by hash" curr=0x6ace791b3cc4458ab762211f16a2939030635f5e0c190bc8866d02a573d57d20:39979280 next=0x235fd01b0b6d15c40fc229cd8b1002f4a06ab5dae500136552533ab739fd96e0:39979279 l2block=0x58d040dbeb3f8fffdc660ffccd78276bbaff79126ba10240c52c89317dfab6d5:27742521

Op-geth logs after Op-node done catching up:

INFO [07-03|13:42:25.176] announced transaction is underpriced     hash=0x6d0670007bfda8af7a238a382ce33d50156faeb356b18404ccee7d828478da00
INFO [07-03|13:42:25.208] announced transaction is underpriced     hash=0x56511dc71bb6bfc11d73092b56b533aa96cb93bb065fa828b1343d150801c2ec
INFO [07-03|13:42:25.233] announced transaction is underpriced     hash=0xd60a574e5712c203452c09231e63773ce451b30938ef8e2cabedc67f393b260f
INFO [07-03|13:42:25.262] announced transaction is underpriced     hash=0x571f9770334896456d0056e4770d6baae7f1cacbfd180b3317dcbba211093f56
INFO [07-03|13:42:25.267] announced transaction is underpriced     hash=0x76a7707ff72e6634bc3d11fa44298b51f1bdafcac11090570463e3ed45d5f7fb
INFO [07-03|13:42:25.268] announced transaction is underpriced     hash=0x571f9770334896456d0056e4770d6baae7f1cacbfd180b3317dcbba211093f56
INFO [07-03|13:42:25.292] announced transaction is underpriced     hash=0x76a7707ff72e6634bc3d11fa44298b51f1bdafcac11090570463e3ed45d5f7fb
INFO [07-03|13:42:25.292] loop print db stats                      database=/data/geth/chaindata comp_time=1,929,328,345,853 write_delay_count=0 write_delay_time=3,821,325,143 non_level0_comp_count=10588 level0_comp_count=1325
INFO [07-03|13:42:25.404] announced transaction is underpriced     hash=0x76a7707ff72e6634bc3d11fa44298b51f1bdafcac11090570463e3ed45d5f7fb
INFO [07-03|13:42:25.405] announced transaction is underpriced     hash=0x571f9770334896456d0056e4770d6baae7f1cacbfd180b3317dcbba211093f56
INFO [07-03|13:42:25.496] announced transaction is underpriced     hash=0x8ae34aa8c9a357b55d830e0ad824e82bb0f2566ffe3225ca0f6d823484be2a5e

Yeah, logs are taken at different time periods, but it doesn't matter IMHO Also we tried launching op-geth with different arguments and different snapshots - and it didn't resolve the issue.

Steps to reproduce the behaviour

  1. Download and unpack snapshot archive into datadir

  2. Start op-geth with args:

    --opBNBMainnet
    --datadir=/data
    --verbosity=3
    --http
    --http.corsdomain=*
    --http.vhosts=*
    --http.addr=0.0.0.0
    --http.port=8545
    --http.api=net,eth,engine
    --ws
    --ws.addr=0.0.0.0
    --ws.port=8546
    --ws.origins=*
    --ws.api=ethengine
    --syncmode=full
    --allow-insecure-unlock
    --authrpc.addr=0.0.0.0
    --authrpc.port=8551
    --authrpc.vhosts=*
    --gcmode=full
    --metrics
    --metrics.port=6060
    --metrics.addr=0.0.0.0
    --authrpc.jwtsecret=/local/jwtsecret
  3. Start op-node with args:

    --l1.trustrpc
    --sequencer.l1-confs=15
    --verifier.l1-confs=15
    --l1.http-poll-interval=3s
    --l1.epoch-poll-interval=45s
    --l1.rpc-max-batch-size=20
    --rpc.addr=0.0.0.0
    --rpc.port=8545
    --p2p.sync.req-resp
    --p2p.listen.ip=0.0.0.0
    --p2p.listen.tcp=9003
    --p2p.listen.udp=9003
    --snapshotlog.file=./snapshot.log
    --metrics.enabled
    --metrics.addr=0.0.0.0
    --metrics.port=7300
    --pprof.enabled
    --rpc.enable-admin
    --log.level=debug
    --l2=http://127.0.0.1:8551
    --l2.jwt-secret=/local/jwtsecret
    --l1.rpckind=standard
    --network=opBNBMainnet
    --l1=http://our-l1-bsc:8545
STdevK commented 1 month ago

We suspect it is caused by the large amount of data, which leads to degraded performance of geth. If this user encounters the same issue again, we suggest that you save all the op-geth and op-node logs during the stuck period. Most likely, the error "Database compacting, degraded performance database=/data/geth/chaindata" can also be seen in the op-geth logs.

You can find more information "Data compacting" here. https://docs.bnbchain.org/opbnb-docs/docs/tutorials/running-a-local-node/

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 1 month ago

The issue is closed due to long inactivity. You can reopen it for further questions or comments.