ethereum-optimism / op-geth

GNU Lesser General Public License v3.0

Mainnet Node syncing very slow and often just stuck in an idle state #241

Closed valamidev closed 4 months ago

valamidev commented 7 months ago

System information

Geth version: op-geth:v1.101305.0, op-geth:v1.101305.3

We currently run two op-geth Mainnet nodes (Node 1 and Node 2) on separate machines. The following issues occurred on Node 1:

During the whole process the op-node was healthy and at the latest block height, as were the Ethereum nodes.

Issues in chronological order:

This has been going on for more than a day.

Overall the sync is slow too; it can take 11 seconds to sync a single block:

INFO [02-07|16:12:51.000] Imported new potential chain segment     number=115,839,990 hash=1bbf70..64cef6 blocks=1 txs=1 mgas=0.047 elapsed=1.572ms     mgasps=29.829  age=12h6m54s  snapdiffs=320.15KiB triedirty=1.25MiB
INFO [02-07|16:12:51.001] Chain head was updated                   number=115,839,990 hash=1bbf70..64cef6 root=337418..0632d0 elapsed="235.384µs" age=12h6m54s
INFO [02-07|16:13:02.467] Starting work on payload                 id=0x80d3075c3fea7640
INFO [02-07|16:13:02.474] Imported new potential chain segment     number=115,839,991 hash=59a4bf..85684e blocks=1 txs=1 mgas=0.050 elapsed=1.852ms     mgasps=27.252  age=12h7m3s   snapdiffs=320.46KiB triedirty=1.25MiB
INFO [02-07|16:13:02.476] Chain head was updated                   number=115,839,991 hash=59a4bf..85684e root=862b49..5fb80b elapsed="250.603µs" age=12h7m3s
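To put a number on the gap between imports, the timestamps of consecutive "Imported new potential chain segment" lines can be compared. The following is a minimal sketch, not an official tool: the function name `import_gaps` is made up for illustration, and the `year` parameter is an assumption, since the op-geth log lines carry no year.

```python
import re
from datetime import datetime

# Matches op-geth import lines such as:
# INFO [02-07|16:12:51.000] Imported new potential chain segment     number=115,839,990 ...
IMPORT_RE = re.compile(
    r"\[(\d{2}-\d{2}\|\d{2}:\d{2}:\d{2}\.\d{3})\] "
    r"Imported new potential chain segment\s+number=([\d,]+)"
)


def import_gaps(lines, year=2024):
    """Return (block_number, seconds_since_previous_import) pairs.

    `year` is an assumption: the log timestamp has no year, so one is
    supplied only to make the timestamps parseable.
    """
    gaps = []
    previous = None
    for line in lines:
        match = IMPORT_RE.search(line)
        if match is None:
            continue  # not an import line (e.g. "Chain head was updated")
        stamp = datetime.strptime(f"{year}-{match.group(1)}", "%Y-%m-%d|%H:%M:%S.%f")
        number = int(match.group(2).replace(",", ""))
        if previous is not None:
            gaps.append((number, (stamp - previous).total_seconds()))
        previous = stamp
    return gaps


if __name__ == "__main__":
    sample = [
        "INFO [02-07|16:12:51.000] Imported new potential chain segment     number=115,839,990 hash=1bbf70..64cef6 blocks=1",
        "INFO [02-07|16:13:02.467] Starting work on payload                 id=0x80d3075c3fea7640",
        "INFO [02-07|16:13:02.474] Imported new potential chain segment     number=115,839,991 hash=59a4bf..85684e blocks=1",
    ]
    for number, seconds in import_gaps(sample):
        print(f"block {number}: {seconds:.3f}s after the previous import")
```

Run against a longer log excerpt, this shows whether the roughly 11-second spacing is constant or only occasional. The import itself takes under 2 ms (`elapsed=1.572ms`), so the wait is between payloads, not in block execution.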

Hardware: 2 Gbps uplink, 0.5 TB RAM, NVMe SSD RAID, EPYC CPU

Has anyone experienced anything like this, or does anyone have an idea why this can happen?

opfocus commented 7 months ago

Have you observed any anomalies with L1RPC?

valamidev commented 7 months ago

No, we are using Geth; it is healthy and synced to the current block height.

valamidev commented 7 months ago

What makes me very curious is why this 12h 6m delay is happening:

INFO [02-12|14:15:13.993] Chain head was updated                   number=116,052,473 hash=cddcb0..833f29 root=859c8b..ff58fe elapsed="207.15µs"  age=12h6m30s
INFO [02-12|14:15:14.073] Starting work on payload                 id=0x49439aae3eda3936
INFO [02-12|14:15:14.077] Imported new potential chain segment     number=116,052,474 hash=530161..bdd5c2 blocks=1 txs=1 mgas=0.047 elapsed=1.687ms     mgasps=27.802  age=12h6m29s  snapdiffs=1.41MiB    triedirty=3.14MiB
INFO [02-12|14:15:14.079] Chain head was updated                   number=116,052,474 hash=530161..bdd5c2 root=544bcf..80c0f0 elapsed="415.963µs" age=12h6m29s

More than 2 weeks have passed and the node always keeps this "distance" from the current block height, sometimes more but never less than 12h 6m.
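For a sense of scale, the `age` field can be converted into a block count. This is a rough sketch under one assumption: a 2-second L2 block time, which is what OP Mainnet uses. The helper names are made up for illustration, and only the d/h/m/s units are handled; anything else raises an error instead of being guessed at.

```python
import re

L2_BLOCK_TIME_SECONDS = 2  # assumption: OP Mainnet's 2-second block time

_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def age_to_seconds(age):
    """Convert an age string such as '12h6m30s' to seconds.

    Only d/h/m/s units are supported; other strings raise ValueError.
    """
    if not re.fullmatch(r"(?:\d+[dhms])+", age):
        raise ValueError(f"unsupported age format: {age!r}")
    parts = re.findall(r"(\d+)([dhms])", age)
    return sum(int(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def blocks_behind(age, block_time=L2_BLOCK_TIME_SECONDS):
    """Approximate number of L2 blocks the local head trails by."""
    return age_to_seconds(age) // block_time


if __name__ == "__main__":
    for age in ("12h6m30s", "12h6m29s", "12h6m54s"):
        print(age, "->", blocks_behind(age), "blocks behind")
```

With a 2-second block time, an age of 12h6m30s works out to 43,590 seconds, or about 21,795 blocks. A lag that stays this steady for weeks suggests the node is following the chain at a fixed offset, not failing to keep up with it.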

opfocus commented 7 months ago

Can you provide information about the sync progress entry in the op-node log and whether it indicates that batches have been dropped?

valamidev commented 7 months ago

It seems nominal:

t=2024-02-13T09:25:48+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0x7731b5df70446de2dd1d4d59e403020cd8c2d4d27485e0ab48a8fa7ddb263a92:116108785
t=2024-02-13T09:25:48+0000 lvl=info msg="Reading channel"                        channel=353a41d213a6c21f1195d40dc405aad6 frames=1
t=2024-02-13T09:25:48+0000 lvl=info msg="Advancing bq origin"                    origin=0x6aa89646e9615ae08d2f397c02a33bd47839d22a59133f88c24f5a7a2c3d896d:19215990 originBehind=false
t=2024-02-13T09:25:49+0000 lvl=info msg="Advancing bq origin"                    origin=0xb6a6eb14ef77d88c702355adc32066acb5da335e3d7f50e0bfbd9d94ff089f60:19215991 originBehind=false
t=2024-02-13T09:25:49+0000 lvl=info msg="Advancing bq origin"                    origin=0xe08f2450624a5efd8e4303535af191661a8fe1abc7c74bb2f3bb45225db25324:19215992 originBehind=false
t=2024-02-13T09:25:49+0000 lvl=info msg="Received signed execution payload from p2p" id=0x8b588f1a01a4081fb8e10860f1066eb927a41d0d87a5b69262d9f0c3d1e4c1db:116108786 peer=16Uiu2HAmJMyUxRaVq689PEAcyWQFS5LYyZUA4zZmboUtnsBFsX3i
t=2024-02-13T09:25:50+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0x8b588f1a01a4081fb8e10860f1066eb927a41d0d87a5b69262d9f0c3d1e4c1db:116108786
t=2024-02-13T09:25:50+0000 lvl=info msg="Advancing bq origin"                    origin=0x7cb7d061bd76c75e36b74b5e59ba836a1653cf69877834d83c190bd3fb2da7f7:19215993 originBehind=false
t=2024-02-13T09:25:50+0000 lvl=info msg="Advancing bq origin"                    origin=0xcddd5c64e83fa83111f2829d5f352cc049f29d8021a723d0d03e93c8104416a1:19215994 originBehind=false
t=2024-02-13T09:25:51+0000 lvl=info msg="Received signed execution payload from p2p" id=0x34d4a344476fcdc1f407d81ab8f05fdfa11d37aff4afa213895f5f20708628ac:116108787 peer=16Uiu2HAm423f1zgf1HqGbL9x5zYJmkzgK1wjgqC23AiGRHEh5ynA
t=2024-02-13T09:25:51+0000 lvl=warn msg="failed to serve p2p sync request"       serve=payloads_by_number peer=16Uiu2HAmS-51339 remote=/ip4/54.160.33.54/tcp/9003     req=116,108,778 err="peer requested unknown block by number: not found"
t=2024-02-13T09:25:51+0000 lvl=info msg="Advancing bq origin"                    origin=0xf450f5bedf3d5d7a0f553c1ad84af30ada8ac4429150befeb9b1896fd8d809f8:19215995 originBehind=false
t=2024-02-13T09:25:51+0000 lvl=info msg="created new channel"                    origin=0xf450f5bedf3d5d7a0f553c1ad84af30ada8ac4429150befeb9b1896fd8d809f8:19215995 channel=1313f9c6c7e5b229abd0675b6eab3a37 length=118,456 frame_number=0 is_last=true
t=2024-02-13T09:25:51+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0x34d4a344476fcdc1f407d81ab8f05fdfa11d37aff4afa213895f5f20708628ac:116108787
t=2024-02-13T09:25:51+0000 lvl=info msg="Reading channel"                        channel=1313f9c6c7e5b229abd0675b6eab3a37 frames=1
t=2024-02-13T09:25:52+0000 lvl=info msg="Advancing bq origin"                    origin=0x726ba79c5c7c5a76a11673bda2df9d81dd62e8e58bb15b9b65bb9bcea684a043:19215996 originBehind=false
t=2024-02-13T09:25:52+0000 lvl=info msg="Advancing bq origin"                    origin=0x5070f5560f49c3587a8233d432a70597e39b96e924e2bf6265e14a67a341e55b:19215997 originBehind=false
t=2024-02-13T09:25:53+0000 lvl=info msg="Advancing bq origin"                    origin=0x52bd4507e8a34135731e439e6b7373fbf07829eaa4305788b8a21c4f7c0f031a:19215998 originBehind=false
t=2024-02-13T09:25:53+0000 lvl=info msg="Received signed execution payload from p2p" id=0xe80138762b60536b3000636a72f545909e8680e9051ea4ae4ef48d5c9337ea1d:116108788 peer=16Uiu2HAm8Xug1ovyDgEya1JMJU5o1MsJ5qD1cA2pVePoB1NwafJv
t=2024-02-13T09:25:53+0000 lvl=info msg="Advancing bq origin"                    origin=0x4578ccbb5800c90d3231d300c897c3ddd6ae84a8baf1550f0ad2158cb9e97ea4:19215999 originBehind=false
t=2024-02-13T09:25:53+0000 lvl=info msg="Advancing bq origin"                    origin=0x2676a1507e780a6f309638e5564c5a74cf6b79ebf55f215fc0dac574a5b41308:19216000 originBehind=false
t=2024-02-13T09:25:53+0000 lvl=info msg="created new channel"                    origin=0x2676a1507e780a6f309638e5564c5a74cf6b79ebf55f215fc0dac574a5b41308:19216000 channel=d9f7bae7bca84f59ab4b16dab8274c92 length=119,244 frame_number=0 is_last=true
t=2024-02-13T09:25:53+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0xe80138762b60536b3000636a72f545909e8680e9051ea4ae4ef48d5c9337ea1d:116108788
t=2024-02-13T09:25:53+0000 lvl=info msg="Reading channel"                        channel=d9f7bae7bca84f59ab4b16dab8274c92 frames=1
t=2024-02-13T09:25:54+0000 lvl=info msg="Advancing bq origin"                    origin=0xcffb438205989e7b2efe00b9fc137c336133211af8d95b2805fe936a3089445f:19216001 originBehind=false
t=2024-02-13T09:25:54+0000 lvl=info msg="Advancing bq origin"                    origin=0x9801571404954fb1807766e9ebc9bf862fc84e93172f52bbf2c591f36e37facd:19216002 originBehind=false

valamidev commented 4 months ago

Problem found: there was an issue with finding P2P peers.

opfocus commented 4 months ago

How did you solve it?

valamidev commented 4 months ago

We had an issue with port forwarding and nodiscovery.

zeGzD commented 3 months ago

I am facing the same issue. Could you please explain a bit more? Thank you.

opfocus commented 3 months ago

I think he means removing the --nodiscover configuration from op-geth.
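For anyone hitting the same symptom, one quick check is whether the execution client has any peers at all. Below is a minimal sketch using the standard `net_peerCount` JSON-RPC method. The endpoint URL `http://localhost:8545` is an assumption, as is the `net` namespace being enabled on the HTTP API; adjust both to your setup. The function names are made up for illustration.

```python
import json
import urllib.request


def peer_count_request(request_id=1):
    """Build the JSON-RPC body for the standard net_peerCount call."""
    return {"jsonrpc": "2.0", "method": "net_peerCount", "params": [], "id": request_id}


def parse_peer_count(response):
    """Extract the peer count (returned as a hex string) from a JSON-RPC response."""
    if "error" in response:
        raise RuntimeError(f"RPC error: {response['error']}")
    return int(response["result"], 16)


def fetch_peer_count(url="http://localhost:8545"):
    """Query a node for its peer count. The default URL is an assumption."""
    body = json.dumps(peer_count_request()).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=5) as reply:
        return parse_peer_count(json.load(reply))


if __name__ == "__main__":
    try:
        print("peers:", fetch_peer_count())
    except OSError as exc:
        print("could not reach the node:", exc)
```

A count that stays at 0 would be consistent with discovery being disabled (for example via --nodiscover) or with the P2P port not being reachable from outside, which are the two causes mentioned above. It does not by itself tell you which one applies.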