erigontech / erigon

Ethereum implementation on the efficiency frontier https://erigon.gitbook.io
GNU Lesser General Public License v3.0
3.12k stars 1.11k forks source link

Syncing always lagging a few blocks back when running ethereum archive node #9658

Open jasonyic opened 7 months ago

jasonyic commented 7 months ago

System information

Erigon version: v2.58.1

OS & Version: Linux

Commit hash: 2.58.1-f12e451c

Erigon Command (with flags/config):

erigon --datadir=/data --chain=mainnet --http --http.port=8545 --http.addr=0.0.0.0 --http.api=admin,eth,web3,net,debug,trace --http.corsdomain=* --http.vhosts=* --nat=extip:$(NAT_IP) --port=$(NAT_PORT) --authrpc.addr=0.0.0.0 --authrpc.vhosts=* --authrpc.port=8550 --authrpc.jwtsecret=/data/jwtsecret.hex --metrics --metrics.addr=0.0.0.0 --metrics.port=9545 --config=/data/config.toml --maxpeers=50 --rpc.returndata.limit=1000000 --snapshots=false --db.size.limit=3TB

Consensus Layer: TEKU

Consensus Layer Command (with flags/config):

teku --data-path=/data --network=mainnet --rest-api-enabled=true --rest-api-host-allowlist=* --rest-api-interface=0.0.0.0 --rest-api-port=5051 --ee-endpoint=http://localhost:8550 --p2p-advertised-ip=$(NAT_IP) --p2p-port=$(NAT_PORT) --ee-jwt-secret-file=/data/jwtsecret.hex --validators-proposer-default-fee-recipient=$(PLATFORM_FEE_RECIPIENT_ADDRESS) --initial-state=http://teku-beacon-archive.mycompany-io/eth/v2/debug/beacon/states/finalized --metrics-enabled=true --metrics-port=8008 --metrics-interface=0.0.0.0 --metrics-host-allowlist=* --data-storage-mode=archive --log-destination=CONSOLE --config-file=/config/client --metrics-block-timing-tracking-enabled

Chain/Network: ethereum mainnet

Expected behaviour

node should keep in-sync with block head with 1-2 blocks lagging behind.

Actual behaviour

node was lagging behind as much as 20 blocks sometimes. we observed that [snapshots] Prune Blocks and blocks verification successful can take quite long, e.g. 5-15 sec to complete each block.

erigon has good number of peers {"inbound":1,"outbound":31,"peerCount":32} teku: {"inbound":0,"outbound":40,"peerCount":40}

[INFO] [03-10|03:36:29.118] [4/12 Execution] Completed on            block=19402053
[INFO] [03-10|03:36:52.073] RPC Daemon notified of new headers       from=19402045 to=19402053 hash=0xeb0d897e216158d50a7633bd20c7916ec238c6c3b2e46bea866871d5e8594926 header sending=22.909µs log sending=218ns
[INFO] [03-10|03:36:52.073] head updated                             hash=0xeb0d897e216158d50a7633bd20c7916ec238c6c3b2e46bea866871d5e8594926 number=19402053
[INFO] [03-10|03:36:55.798] [snapshots] Prune Blocks                 to=19312000 limit=100
[INFO] [03-10|03:37:00.858] [NewPayload] Handling new payload        height=19402056 hash=0x2cfc42f6706321ff1050875f369490a0071a202c5975d14d5d6197299731a8b8
[INFO] [03-10|03:37:00.858] [EngineBlockDownloader] Downloading PoS headers... hash=0x243a476d10a48767056c6cc9c33047ca1955c2bc5780f448f2efa27112d06143 requestId=0
[INFO] [03-10|03:37:01.302] [EngineBlockDownloader] Processed        highest=19402055
[INFO] [03-10|03:37:01.302] Beginning downloaded blocks insertion 
[INFO] [03-10|03:37:01.321] [EngineBlockDownloader] Finished downloading blocks from=19402053 to=19402055
[INFO] [03-10|03:37:11.153] [EngineBlockDownloader] blocks verification successful 
[INFO] [03-10|03:37:11.912] [NewPayload] Handling new payload        height=19402057 hash=0xd0519982b243c84f2f331e10ed691fa01362c58549ab3492c4e9269f7dd288fd
[INFO] [03-10|03:37:16.166] [4/12 Execution] Completed on            block=19402057
[INFO] [03-10|03:37:28.437] RPC Daemon notified of new headers       from=19402053 to=19402057 hash=0xd0519982b243c84f2f331e10ed691fa01362c58549ab3492c4e9269f7dd288fd header sending=18.741µs log sending=240ns
[INFO] [03-10|03:37:28.437] head updated                             hash=0xd0519982b243c84f2f331e10ed691fa01362c58549ab3492c4e9269f7dd288fd number=19402057
[INFO] [03-10|03:37:33.444] [snapshots] Prune Blocks                 to=19312000 limit=100
[INFO] [03-10|03:37:36.603] [NewPayload] Handling new payload        height=19402059 hash=0xaab016c48310f39a0c128c29c1a4a6a4ece261daa611f3d30b4ba4f7d6e4b57d
[INFO] [03-10|03:37:36.603] [EngineBlockDownloader] Downloading PoS headers... hash=0x1f5e1a9d578565391085561b20724de9f75b8acecbc1969eb17486537f03c063 requestId=0
[INFO] [03-10|03:37:36.647] [EngineBlockDownloader] Processed        highest=19402058
[INFO] [03-10|03:37:36.647] Beginning downloaded blocks insertion 
[INFO] [03-10|03:37:36.780] [EngineBlockDownloader] Finished downloading blocks from=19402057 to=19402058
[INFO] [03-10|03:37:45.156] [EngineBlockDownloader] blocks verification successful 
[INFO] [03-10|03:37:48.319] [NewPayload] Handling new payload        height=19402060 hash=0x661b891de38ec85644f81fea21d63fc084fd845aa12eea37624202808442ec34
[INFO] [03-10|03:37:51.279] [4/12 Execution] Completed on            block=19402060

Steps to reproduce the behaviour

N/A

Backtrace

N/A

darklow commented 7 months ago

Seeing same problem, but not just few blocks, constantly few to many minutes behind to the moment where node becomes unusable to delays up to 10minutes:

version:

git_branch=heads/v2.58.2 git_tag=v2.58.2-dirty git_commit=125509e483326d77eb810fdc33a64d30ed98e18f

cmd:

erigon --http.addr=0.0.0.0 --http.vhosts=* --http.corsdomain=* \
 --http.api=eth,erigon,engine,net,web3,debug,txpool,trace --txpool.api.addr=erigon:9090 \
 --authrpc.addr=0.0.0.0 --authrpc.vhosts=* --authrpc.port=8551 --internalcl \
 --prune=htc --prune.r.older=2607195

log (7min no sync between 13:01-13:08)

erigon  | [INFO] [03-14|13:00:53.590] RPC Daemon notified of new headers       from=19433357 to=19433406 hash=0x088f6d3f450fb27f3267ae3d20a9fe34c100320eac4ad229e6e9ac17f395cd14 header sending=63.9µs log sending=186ns
erigon  | [INFO] [03-14|13:00:53.590] head updated                             hash=0x088f6d3f450fb27f3267ae3d20a9fe34c100320eac4ad229e6e9ac17f395cd14 number=19433406
erigon  | [INFO] [03-14|13:00:55.989] [snapshots] Prune Blocks                 to=19342000 limit=100
erigon  | [INFO] [03-14|13:00:58.195] [4/12 Execution] Completed on            block=19433408
erigon  | [INFO] [03-14|13:00:58.912] RPC Daemon notified of new headers       from=19433406 to=19433408 hash=0xcdf198bd1ef031646927cdd79ca2655a1ec3248c84090264d01c17269111b897 header sending=13.991µs log sending=179ns
erigon  | [INFO] [03-14|13:00:58.912] head updated                             hash=0xcdf198bd1ef031646927cdd79ca2655a1ec3248c84090264d01c17269111b897 number=19433408
erigon  | [INFO] [03-14|13:00:59.363] [snapshots] Prune Blocks                 to=19342000 limit=100
erigon  | [INFO] [03-14|13:01:17.337] [4/12 Execution] Completed on            block=19433410
erigon  | [INFO] [03-14|13:01:18.437] RPC Daemon notified of new headers       from=19433408 to=19433410 hash=0x428458a7075f332ba09059a5c67d8c6f4bbb0b187b2fee35e0200f76eff77207 header sending=16.124µs log sending=240ns
erigon  | [INFO] [03-14|13:01:18.437] head updated                             hash=0x428458a7075f332ba09059a5c67d8c6f4bbb0b187b2fee35e0200f76eff77207 number=19433410
erigon  | [INFO] [03-14|13:01:18.878] [snapshots] Prune Blocks                 to=19342000 limit=100
erigon  | [INFO] [03-14|13:01:36.034] P2P                                      app=caplin peers=54
erigon  | [INFO] [03-14|13:02:12.085] [p2p] GoodPeers                          eth68=99 eth67=16
erigon  | [INFO] [03-14|13:02:12.404] [txpool] stat                            pending=858 baseFee=1582 queued=19806 alloc=12.9GB sys=18.0GB
erigon  | [INFO] [03-14|13:02:32.009] [snapshots:retire] Blocks Stat           blocks=19343k indices=19343k alloc=7.6GB sys=18.0GB
erigon  | [INFO] [03-14|13:02:36.033] P2P                                      app=caplin peers=62
erigon  | [INFO] [03-14|13:03:36.033] P2P                                      app=caplin peers=63
erigon  | [INFO] [03-14|13:04:36.033] P2P                                      app=caplin peers=51
erigon  | [INFO] [03-14|13:05:12.081] [p2p] GoodPeers                          eth68=99 eth67=17
erigon  | [INFO] [03-14|13:05:12.405] [txpool] stat                            pending=2685 baseFee=3008 queued=19934 alloc=5.6GB sys=18.0GB
erigon  | [INFO] [03-14|13:05:36.033] P2P                                      app=caplin peers=45
erigon  | [INFO] [03-14|13:06:36.033] P2P                                      app=caplin peers=57
erigon  | [INFO] [03-14|13:07:36.033] P2P                                      app=caplin peers=53
erigon  | [INFO] [03-14|13:08:07.612] [3/12 Senders] Started                   from=19433410 to=19433443
erigon  | [INFO] [03-14|13:08:07.738] [4/12 Execution] Blocks execution        from=19433410 to=19433443
erigon  | [INFO] [03-14|13:08:10.243] [4/12 Execution] Completed on            block=19433443
erigon  | [INFO] [03-14|13:08:10.243] [5/12 HashState] Promoting plain state   from=19433410 to=19433443
erigon  | [INFO] [03-14|13:08:10.243] [5/12 HashState] Incremental promotion   from=19433410 to=19433443 codes=true csbucket=AccountChangeSet
erigon  | [INFO] [03-14|13:08:10.335] [5/12 HashState] Incremental promotion   from=19433410 to=19433443 codes=false csbucket=AccountChangeSet
erigon  | [INFO] [03-14|13:08:10.512] [5/12 HashState] Incremental promotion   from=19433410 to=19433443 codes=false csbucket=StorageChangeSet
erigon  | [INFO] [03-14|13:08:10.829] [6/12 IntermediateHashes] Generating intermediate hashes from=19433410 to=19433443
erigon  | [INFO] [03-14|13:08:12.084] [p2p] GoodPeers                          eth68=99 eth67=17
erigon  | [INFO] [03-14|13:08:12.403] [txpool] stat                            pending=4189 baseFee=3787 queued=20033 alloc=7.4GB sys=18.0GB
erigon  | [INFO] [03-14|13:08:12.823] [7/12 CallTraces] Pruned call trace intermediate table from=19343410 to=19343442
erigon  | [INFO] [03-14|13:08:20.992] RPC Daemon notified of new headers       from=19433410 to=19433443 hash=0xa7c6125d5b08383c85e4b3644cfb48622fd104d09015a25c1a4f46aa8d447f2f header sending=63.662µs log sending=192ns
erigon  | [INFO] [03-14|13:08:20.992] head updated                             hash=0xa7c6125d5b08383c85e4b3644cfb48622fd104d09015a25c1a4f46aa8d447f2f number=19433443
erigon  | [INFO] [03-14|13:08:22.334] [snapshots] Prune Blocks                 to=19343000 limit=100
erigon  | [INFO] [03-14|13:08:33.092] [4/12 Execution] Completed on            block=19433445
erigon  | [INFO] [03-14|13:08:34.198] RPC Daemon notified of new headers       from=19433443 to=19433445 hash=0x722b756ef485425c7ca12c84ca7a37f0a028072792c6f4561d487704a52dc1da header sending=13.913µs log sending=203ns
erigon  | [INFO] [03-14|13:08:34.198] head updated                             hash=0x722b756ef485425c7ca12c84ca7a37f0a028072792c6f4561d487704a52dc1da number=19433445
erigon  | [INFO] [03-14|13:08:34.493] [snapshots] Prune Blocks                 to=19343000 limit=100
erigon  | [INFO] [03-14|13:08:36.033] P2P                                      app=caplin peers=61
erigon  | [INFO] [03-14|13:08:55.611] [4/12 Execution] Completed on            block=19433447
erigon  | [INFO] [03-14|13:08:56.234] RPC Daemon notified of new headers       from=19433445 to=19433447 hash=0x92c39f321188c0ab08980715ae25cf0605d76a349cd889d58ad32fa1f074a0c8 header sending=14.188µs log sending=180ns
erigon  | [INFO] [03-14|13:08:56.234] head updated                             hash=0x92c39f321188c0ab08980715ae25cf0605d76a349cd889d58ad32fa1f074a0c8 number=19433447
erigon  | [INFO] [03-14|13:08:56.668] [snapshots] Prune Blocks                 to=19343000 limit=100
toolchainx commented 7 months ago

seen the same problem with v2.58.1, node becomes unusable to delays up to 10minutes. I use prysm as the CL client and it reports the following error occationally:

image

start up cmd:

          command:
            - erigon
          args: [ "--fakepow",
            "--rpc.gascap=80000000",
            "--private.api.addr=localhost:9090",
            "--txpool.globalslots=400000",
            "--txpool.globalqueue=400000",
            "--txpool.globalbasefeeslots=400000",
            "--txpool.accountqueue=16",
            "--db.size.limit=8TB",
            "--metrics",
            "--metrics.addr=0.0.0.0",
            "--authrpc.addr=0.0.0.0",
            "--authrpc.vhosts=*",
            "--authrpc.jwtsecret=/opt/secrets/jwt.hex",
            "--http.addr=0.0.0.0",
            "--http.corsdomain=*",
            "--http.vhosts=*",
            "--http.port=9000",
            "--http.api=erigon,ots,engine,eth,trace,net,txpool,debug,web3,admin",
radoslav-dyulgerov commented 7 months ago

Same issue v2.58.2

[Mar 18 05:39:58 eth01 erigon[7922]: [INFO] [03-18|05:39:58.669] [txpool] stat                            pending=7072 baseFee=28469 queued=29989 alloc=13.4GB sys=32.8GB
Mar 18 05:40:30 eth01 erigon[7922]: [INFO] [03-18|05:40:30.628] P2P                                      app=caplin peers=42
Mar 18 05:40:45 eth01 erigon[7922]: [INFO] [03-18|05:40:45.540] [Caplin] Forward Sync                    app=caplin stage=ForwardSync progress=8659647 distance-from-chain-tip=9m24s estimated-time-remaining=24s
Mar 18 05:41:26 eth01 erigon[7922]: [INFO] [03-18|05:41:26.494] [Caplin] Forward Sync                    app=caplin stage=ForwardSync progress=8659702 distance-from-chain-tip=-1m36s estimated-time-remaining=0s
Mar 18 05:41:30 eth01 erigon[7922]: [INFO] [03-18|05:41:30.628] P2P                                      app=caplin peers=39
Mar 18 05:42:30 eth01 erigon[7922]: [INFO] [03-18|05:42:30.628] P2P                                      app=caplin peers=42
Mar 18 05:42:31 eth01 erigon[7922]: [INFO] [03-18|05:42:31.985] [3/12 Senders] Started                   from=19459567 to=19459671
Mar 18 05:42:32 eth01 erigon[7922]: [INFO] [03-18|05:42:32.109] [4/12 Execution] Blocks execution        from=19459567 to=19459671
Mar 18 05:42:50 eth01 erigon[7922]: [INFO] [03-18|05:42:50.572] [p2p] GoodPeers                          eth68=33 eth67=28 eth66=1
Mar 18 05:42:58 eth01 erigon[7922]: [INFO] [03-18|05:42:58.676] [txpool] stat                            pending=8021 baseFee=28612 queued=30000 alloc=19.6GB sys=32.8GB
Mar 18 05:43:02 eth01 erigon[7922]: [INFO] [03-18|05:43:02.451] [4/12 Execution] Executed blocks         number=19459629 blk/s=2.0 tx/s=329.7 Mgas/s=31.5 gasState=0.00 batch=1.7MB alloc=21.6GB sys=32.8GB
Mar 18 05:43:30 eth01 erigon[7922]: [INFO] [03-18|05:43:30.628] P2P                                      app=caplin peers=35
Mar 18 05:43:33 eth01 erigon[7922]: [INFO] [03-18|05:43:33.817] [4/12 Execution] Executed blocks         number=19459663 blk/s=1.1 tx/s=150.6 Mgas/s=16.6 gasState=0.00 batch=2.5MB alloc=15.2GB sys=32.8GB
Mar 18 05:43:49 eth01 erigon[7922]: [INFO] [03-18|05:43:49.748] [4/12 Execution] Completed on            block=19459671
Mar 18 05:43:49 eth01 erigon[7922]: [INFO] [03-18|05:43:49.749] [4/12 Execution] DONE                    in=1m17.639230308s
Mar 18 05:43:49 eth01 erigon[7922]: [INFO] [03-18|05:43:49.749] [5/12 HashState] Promoting plain state   from=19459567 to=19459671
Mar 18 05:43:49 eth01 erigon[7922]: [INFO] [03-18|05:43:49.749] [5/12 HashState] Incremental promotion   from=19459567 to=19459671 codes=true csbucket=AccountChangeSet
Mar 18 05:43:58 eth01 erigon[7922]: [INFO] [03-18|05:43:58.037] [5/12 HashState] Incremental promotion   from=19459567 to=19459671 codes=false csbucket=AccountChangeSet
Mar 18 05:44:20 eth01 erigon[7922]: [INFO] [03-18|05:44:20.664] [5/12 HashState] Incremental promotion   from=19459567 to=19459671 codes=false csbucket=StorageChangeSet
Mar 18 05:44:30 eth01 erigon[7922]: [INFO] [03-18|05:44:30.628] P2P                                      app=caplin peers=32
Mar 18 05:44:52 eth01 erigon[7922]: [INFO] [03-18|05:44:52.761] [5/12 HashState] ETL [2/2] Loading       into=HashedStorage current_prefix=8521aab8
Mar 18 05:45:13 eth01 erigon[7922]: [INFO] [03-18|05:45:13.194] [5/12 HashState] DONE                    in=1m23.445616353s
Mar 18 05:45:13 eth01 erigon[7922]: [INFO] [03-18|05:45:13.194] [6/12 IntermediateHashes] Generating intermediate hashes from=19459567 to=19459671
Mar 18 05:45:30 eth01 erigon[7922]: [INFO] [03-18|05:45:30.628] P2P                                      app=caplin peers=25
Mar 18 05:45:43 eth01 erigon[7922]: [INFO] [03-18|05:45:43.761] [6/12 IntermediateHashes] Calculating Merkle root current key=33a67469
Mar 18 05:45:50 eth01 erigon[7922]: [INFO] [03-18|05:45:50.572] [p2p] GoodPeers                          eth68=33 eth66=1 eth67=28
Mar 18 05:45:58 eth01 erigon[7922]: [INFO] [03-18|05:45:58.675] [txpool] stat                            pending=9181 baseFee=28732 queued=29990 alloc=12.6GB sys=32.8GB
Mar 18 05:46:14 eth01 erigon[7922]: [INFO] [03-18|05:46:14.306] [6/12 IntermediateHashes] Calculating Merkle root current key=8679e8ed
Mar 18 05:46:30 eth01 erigon[7922]: [INFO] [03-18|05:46:30.628] P2P                                      app=caplin peers=31
Mar 18 05:46:43 eth01 erigon[7922]: [INFO] [03-18|05:46:43.762] [6/12 IntermediateHashes] Calculating Merkle root current key=d8e32155
Mar 18 05:47:07 eth01 erigon[7922]: [INFO] [03-18|05:47:07.768] [6/12 IntermediateHashes] DONE           in=1m54.574163429s
Mar 18 05:47:30 eth01 erigon[7922]: [INFO] [03-18|05:47:30.628] P2P                                      app=caplin peers=25
Mar 18 05:48:30 eth01 erigon[7922]: [INFO] [03-18|05:48:30.628] P2P                                      app=caplin peers=28
Mar 18 05:48:32 eth01 erigon[7922]: [INFO] [03-18|05:48:32.163] [9/12 StorageHistoryIndex] ETL [2/2] Loading into=StorageHistory current_prefix=a735a3af
Mar 18 05:48:50 eth01 erigon[7922]: [INFO] [03-18|05:48:50.571] [p2p] GoodPeers                          eth68=33 eth67=28 eth66=1
Mar 18 05:48:52 eth01 erigon[7922]: [INFO] [03-18|05:48:52.678] [10/12 LogIndex] processing              from=19459568 to=19459671
Mar 18 05:48:58 eth01 erigon[7922]: [INFO] [03-18|05:48:58.670] [txpool] stat                            pending=10000 baseFee=28921 queued=30000 alloc=13.5GB sys=32.8GB
Mar 18 05:49:22 eth01 erigon[7922]: [INFO] [03-18|05:49:22.995] [10/12 LogIndex] ETL [2/2] Loading       into=LogTopicIndex current_prefix=000000000000000000000000c4710d008d27dd8cdeb4955900e5d744372ae588ffffffff
Mar 18 05:49:30 eth01 erigon[7922]: [INFO] [03-18|05:49:30.628] P2P                                      app=caplin peers=25
Mar 18 05:50:02 eth01 erigon[7922]: [INFO] [03-18|05:50:02.002] RPC Daemon notified of new headers       from=19459567 to=19459671 hash=0x9938749488f26c686f2764e3475d1d3105172a0f0e8a8e229199653528c0b8e9 header sending=253.51µs log sending=683ns
Mar 18 05:50:02 eth01 erigon[7922]: [INFO] [03-18|05:50:02.002] head updated                             hash=0x9938749488f26c686f2764e3475d1d3105172a0f0e8a8e229199653528c0b8e9 number=19459671