ethereum-optimism / op-geth

GNU Lesser General Public License v3.0
254 stars 652 forks source link

Snap Sync - "Syncing: state healing in progress" never ends #328

Closed andreclaro closed 3 weeks ago

andreclaro commented 3 weeks ago

System information

op_geth: "v1.101315.0" op_node: "op-node/v1.7.5"

OS & Version: Linux (Ubuntu:

Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:    22.04
Codename:   jammy

Expected behaviour

I was expected snap sync to take a few days not weeks or months.

Actual behaviour

"Syncing: state healing in progress" is taking a lot to finish.... Op-geth is running for 7h 34min and it only download 331,784 accounts:

INFO [06-03|19:18:08.821] Syncing: state healing in progress       accounts=331,784@24.57MiB slots=839,671@59.93MiB codes=198@1.28MiB    nodes=19,919,444@6.16GiB pending=14350

Steps to reproduce the behaviour

Start nodes with snap sync enabled

Example of the command for base mainnet... for optimism it was used a similar command:

Op-geth cli:

/usr/local/bin/geth \
    --op-network=base-mainnet \
    --datadir=/var/lib/base/data \
    --syncmode=snap \
    --gcmode=archive \
    --networkid="8453" \
    --http \
    --http.addr=0.0.0.0 \
    --http.port=8545 \
    --http.vhosts="*" \
    --http.corsdomain="*" \
    --http.api=web3,debug,eth,net,engine,geth \
    --ws \
    --ws.port=8546 \
    --ws.addr=0.0.0.0 \
    --ws.origins="*" \
    --ws.api=debug,eth,net,engine \
    --authrpc.addr=127.0.0.1 \
    --authrpc.port=8551 \
    --authrpc.jwtsecret=/var/lib/base/jwt.txt \
    --authrpc.vhosts="*" \
    --metrics \
    --metrics.addr=0.0.0.0 \
    --metrics.port=6060 \
    --port=30303 \
    --discovery.port=30303 \
    --db.engine=pebble \
    --state.scheme=hash \
    --rollup.disabletxpoolgossip=true \
    --rollup.sequencerhttp=https://mainnet-sequencer.base.org \
    --rollup.superchain-upgrades \
    --rollup.halt=major \
    --verbosity=3

op-node cli:

/usr/local/bin/op-node \
    --syncmode=execution-layer \
    --l1.trustrpc \
    --l1.rpckind=basic \
    --l1=http://L1_ADDRESS:8545 \
    --l1.beacon=http://L1_ADDRESS:5052 \
    --l2=http://127.0.0.1:8551 \
    --rpc.addr=127.0.0.1 \
    --rpc.port=9545 \
    --l2.jwt-secret=/var/lib/base/jwt.txt \
    --network=base-mainnet \
    --p2p.peerstore.path=/var/lib/base/data/opnode_peerstore_db \
    --p2p.priv.path=/var/lib/base/data/opnode_p2p_priv.txt \
    --p2p.discovery.path=/var/lib/base/data/opnode_discovery_db \
    --rollup.load-protocol-versions=true \
    --rollup.halt=major

Logs

op-geth in optimism mainnet:

Jun 03 19:14:29 m-optimism-02 geth[2372795]: WARN [06-03|19:14:29.265] Unexpected trienode heal packet          peer=c240164a reqid=7,463,511,122,718,607,702
Jun 03 19:14:29 m-optimism-02 geth[2372795]: WARN [06-03|19:14:29.266] Unexpected trienode heal packet          peer=a17f4298 reqid=7,825,150,680,345,553,163
Jun 03 19:14:29 m-optimism-02 geth[2372795]: WARN [06-03|19:14:29.266] Unexpected trienode heal packet          peer=b930c42d reqid=6,783,511,754,014,021,662
Jun 03 19:14:29 m-optimism-02 geth[2372795]: WARN [06-03|19:14:29.301] Unexpected trienode heal packet          peer=6a241836 reqid=4,357,611,404,741,055,740
Jun 03 19:14:29 m-optimism-02 geth[2372795]: WARN [06-03|19:14:29.301] Unexpected trienode heal packet          peer=c3d317d3 reqid=3,000,766,357,146,577,673
Jun 03 19:14:29 m-optimism-02 geth[2372795]: WARN [06-03|19:14:29.301] Unexpected trienode heal packet          peer=dc230607 reqid=952,936,897,351,638,830
Jun 03 19:14:29 m-optimism-02 geth[2372795]: WARN [06-03|19:14:29.326] Unexpected trienode heal packet          peer=84525520 reqid=1,840,405,469,761,331,709
Jun 03 19:14:29 m-optimism-02 geth[2372795]: WARN [06-03|19:14:29.353] Unexpected trienode heal packet          peer=570640a3 reqid=8,746,312,551,116,258,011
Jun 03 19:14:29 m-optimism-02 geth[2372795]: WARN [06-03|19:14:29.380] Unexpected trienode heal packet          peer=3e724112 reqid=6,294,787,873,954,858,255
Jun 03 19:14:30 m-optimism-02 geth[2372795]: INFO [06-03|19:14:30.244] Syncing: chain download in progress      synced=100.00% chain=216.31GiB headers=120,921,645@47.88GiB bodies=120,921,580@74.58GiB receipts=120,921,580@93.85GiB eta=2m10.660s
Jun 03 19:14:31 m-optimism-02 geth[2372795]: INFO [06-03|19:14:31.122] Forkchoice requested sync to new head    number=120,921,647 hash=0a92e3..603727
Jun 03 19:14:33 m-optimism-02 geth[2372795]: INFO [06-03|19:14:33.131] Forkchoice requested sync to new head    number=120,921,648 hash=548d82..73986a
Jun 03 19:14:35 m-optimism-02 geth[2372795]: INFO [06-03|19:14:35.128] Forkchoice requested sync to new head    number=120,921,649 hash=748a58..1edb9e
Jun 03 19:14:37 m-optimism-02 geth[2372795]: INFO [06-03|19:14:37.123] Forkchoice requested sync to new head    number=120,921,650 hash=e8dc71..40386b
Jun 03 19:14:37 m-optimism-02 geth[2372795]: INFO [06-03|19:14:37.356] Syncing: state healing in progress       accounts=329,607@24.40MiB slots=838,910@59.87MiB codes=197@1.28MiB    nodes=19,901,084@6.15GiB pending=5601
Jun 03 19:14:38 m-optimism-02 geth[2372795]: INFO [06-03|19:14:38.261] Syncing: chain download in progress      synced=100.00% chain=216.31GiB headers=120,921,650@47.88GiB bodies=120,921,580@74.58GiB receipts=120,921,580@93.85GiB eta=2m18.609s
Jun 03 19:14:39 m-optimism-02 geth[2372795]: INFO [06-03|19:14:39.128] Forkchoice requested sync to new head    number=120,921,651 hash=371f33..dfa856
Jun 03 19:14:41 m-optimism-02 geth[2372795]: INFO [06-03|19:14:41.121] Forkchoice requested sync to new head    number=120,921,652 hash=6bbf3d..4c4b1b
Jun 03 19:14:43 m-optimism-02 geth[2372795]: INFO [06-03|19:14:43.123] Forkchoice requested sync to new head    number=120,921,653 hash=562e10..eb932e
Jun 03 19:14:45 m-optimism-02 geth[2372795]: INFO [06-03|19:14:45.123] Forkchoice requested sync to new head    number=120,921,654 hash=46816d..eac17d
Jun 03 19:14:45 m-optimism-02 geth[2372795]: INFO [06-03|19:14:45.360] Syncing: state healing in progress       accounts=329,607@24.40MiB slots=838,910@59.87MiB codes=197@1.28MiB    nodes=19,901,643@6.15GiB pending=7497
Jun 03 19:14:46 m-optimism-02 geth[2372795]: INFO [06-03|19:14:46.272] Syncing: chain download in progress      synced=100.00% chain=216.31GiB headers=120,921,654@47.88GiB bodies=120,921,580@74.58GiB receipts=120,921,580@93.85GiB eta=2m26.562s
Jun 03 19:14:47 m-optimism-02 geth[2372795]: INFO [06-03|19:14:47.214] Forkchoice requested sync to new head    number=120,921,655 hash=0e2739..3c3448
Jun 03 19:14:49 m-optimism-02 geth[2372795]: INFO [06-03|19:14:49.121] Forkchoice requested sync to new head    number=120,921,656 hash=0fc373..5a45b7
Jun 03 19:14:51 m-optimism-02 geth[2372795]: INFO [06-03|19:14:51.128] Forkchoice requested sync to new head    number=120,921,657 hash=dda588..46a80c
Jun 03 19:14:53 m-optimism-02 geth[2372795]: INFO [06-03|19:14:53.124] Forkchoice requested sync to new head    number=120,921,658 hash=4288b2..6c469c

op-geth in base mainnet:

Jun 03 19:14:51 m-base-02 geth[3802696]: INFO [06-03|19:14:51.441] Forkchoice requested sync to new head    number=15,326,372 hash=ed05cf..6d9cfe
Jun 03 19:14:53 m-base-02 geth[3802696]: INFO [06-03|19:14:53.005] Syncing: chain download in progress      synced=100.00% chain=127.43GiB headers=15,326,371@6.60GiB bodies=15,326,261@62.24GiB receipts=15,326,261@58.60GiB eta=3m42.068s
Jun 03 19:14:53 m-base-02 geth[3802696]: INFO [06-03|19:14:53.524] Forkchoice requested sync to new head    number=15,326,373 hash=e7e087..0347e3
Jun 03 19:14:55 m-base-02 geth[3802696]: INFO [06-03|19:14:55.506] Forkchoice requested sync to new head    number=15,326,374 hash=b8bf26..619e4a
Jun 03 19:14:57 m-base-02 geth[3802696]: INFO [06-03|19:14:57.732] Forkchoice requested sync to new head    number=15,326,375 hash=f62c14..2dfb20
Jun 03 19:14:58 m-base-02 geth[3802696]: INFO [06-03|19:14:58.569] Syncing: state healing in progress       accounts=906,680@56.11MiB slots=2,981,977@219.66MiB codes=6891@41.57MiB nodes=51,574,871@13.36GiB pending=14363
Jun 03 19:14:59 m-base-02 geth[3802696]: INFO [06-03|19:14:59.353] Forkchoice requested sync to new head    number=15,326,376 hash=a1bbd4..249ad5
Jun 03 19:15:01 m-base-02 geth[3802696]: INFO [06-03|19:15:01.009] Syncing: chain download in progress      synced=100.00% chain=127.43GiB headers=15,326,376@6.60GiB bodies=15,326,261@62.24GiB receipts=15,326,261@58.60GiB eta=3m50.120s
Jun 03 19:15:01 m-base-02 geth[3802696]: INFO [06-03|19:15:01.702] Forkchoice requested sync to new head    number=15,326,377 hash=0ec5ad..d1106f
Jun 03 19:15:03 m-base-02 geth[3802696]: INFO [06-03|19:15:03.313] Forkchoice requested sync to new head    number=15,326,378 hash=d0034f..0e2825
Jun 03 19:15:05 m-base-02 geth[3802696]: INFO [06-03|19:15:05.521] Forkchoice requested sync to new head    number=15,326,379 hash=fa788a..d09e62
Jun 03 19:15:06 m-base-02 geth[3802696]: INFO [06-03|19:15:06.586] Syncing: state healing in progress       accounts=907,055@56.14MiB slots=2,982,463@219.70MiB codes=6891@41.57MiB nodes=51,577,805@13.36GiB pending=13974
Jun 03 19:15:07 m-base-02 geth[3802696]: INFO [06-03|19:15:07.309] Forkchoice requested sync to new head    number=15,326,380 hash=2b05ce..540f18
Jun 03 19:15:09 m-base-02 geth[3802696]: INFO [06-03|19:15:09.013] Syncing: chain download in progress      synced=100.00% chain=127.43GiB headers=15,326,380@6.60GiB bodies=15,326,261@62.24GiB receipts=15,326,261@58.60GiB eta=3m58.176s
Jun 03 19:15:09 m-base-02 geth[3802696]: INFO [06-03|19:15:09.439] Forkchoice requested sync to new head    number=15,326,381 hash=c86691..3bc202
Jun 03 19:15:11 m-base-02 geth[3802696]: INFO [06-03|19:15:11.433] Forkchoice requested sync to new head    number=15,326,382 hash=c4bc96..4b3011
Jun 03 19:15:13 m-base-02 geth[3802696]: INFO [06-03|19:15:13.361] Forkchoice requested sync to new head    number=15,326,383 hash=71e329..1c1abe
Jun 03 19:15:14 m-base-02 geth[3802696]: INFO [06-03|19:15:14.696] Syncing: state healing in progress       accounts=907,337@56.15MiB slots=2,983,027@219.74MiB codes=6894@41.58MiB nodes=51,581,046@13.36GiB pending=14710
Jun 03 19:15:14 m-base-02 geth[3802696]: WARN [06-03|19:15:14.894] Pivot seemingly stale, moving            old=15,326,262 new=15,326,319
Jun 03 19:15:15 m-base-02 geth[3802696]: INFO [06-03|19:15:15.223] Syncing: state healing in progress       accounts=907,343@56.16MiB slots=2,983,059@219.74MiB codes=6894@41.58MiB nodes=51,581,251@13.36GiB pending=14828
Jun 03 19:15:15 m-base-02 geth[3802696]: WARN [06-03|19:15:15.238] Unexpected trienode heal packet          peer=d2308a34 reqid=1,285,784,775,404,610,008
Jun 03 19:15:15 m-base-02 geth[3802696]: WARN [06-03|19:15:15.242] Unexpected trienode heal packet          peer=b5ae8a65 reqid=2,893,195,299,212,542,727
Jun 03 19:15:15 m-base-02 geth[3802696]: WARN [06-03|19:15:15.248] Unexpected trienode heal packet          peer=f2be8fb2 reqid=8,598,187,639,251,733,829
Jun 03 19:15:15 m-base-02 geth[3802696]: WARN [06-03|19:15:15.251] Unexpected trienode heal packet          peer=1c176021 reqid=336,205,833,007,489,592
Jun 03 19:15:15 m-base-02 geth[3802696]: WARN [06-03|19:15:15.256] Unexpected trienode heal packet          peer=25109c23 reqid=4,523,216,003,359,259,572
Jun 03 19:15:15 m-base-02 geth[3802696]: WARN [06-03|19:15:15.260] Unexpected trienode heal packet          peer=6e1963b4 reqid=7,993,713,479,880,516,988
Jun 03 19:15:15 m-base-02 geth[3802696]: WARN [06-03|19:15:15.266] Unexpected trienode heal packet          peer=03d4fc22 reqid=5,832,566,645,314,874,925
Jun 03 19:15:15 m-base-02 geth[3802696]: WARN [06-03|19:15:15.266] Unexpected trienode heal packet          peer=6ab3c255 reqid=6,332,110,799,182,848,106
Jun 03 19:15:15 m-base-02 geth[3802696]: WARN [06-03|19:15:15.267] Unexpected trienode heal packet          peer=73384605 reqid=68,074,781,591,380,592
andreclaro commented 3 weeks ago

I also noticed the warning Pivot seemingly stale, moving message several times in the logs:

geth[3802696]: WARN [06-03|19:09:32.831] Pivot seemingly stale, moving            old=15,326,091 new=15,326,148
andreclaro commented 3 weeks ago

I have just tried using io2 with max iops and I don't see improvements...

andreclaro commented 3 weeks ago

I was able to spin-up the node using snap sync after cleaning the db and using an instance type with more cpu and memory.... Example for base mainnet:

op.geth:

/usr/local/bin/geth \
    --op-network=base-mainnet \
    --datadir=/var/lib/base/data \
    --syncmode=snap \
    --gcmode=archive \
    --networkid="8453" \
    --http \
    --http.addr=0.0.0.0 \
    --http.port=8545 \
    --http.vhosts="*" \
    --http.corsdomain="*" \
    --http.api=web3,debug,eth,net,engine,geth \
    --ws \
    --ws.port=8546 \
    --ws.addr=0.0.0.0 \
    --ws.origins="*" \
    --ws.api=debug,eth,net,engine \
    --authrpc.addr=127.0.0.1 \
    --authrpc.port=8551 \
    --authrpc.jwtsecret=/var/lib/base/jwt.txt \
    --authrpc.vhosts="*" \
    --metrics \
    --metrics.addr=0.0.0.0 \
    --metrics.port=6060 \
    --maxpeers=500 \
    --port=30303 \
    --discovery.port=30303 \
    --rollup.disabletxpoolgossip=true \
    --rollup.sequencerhttp=https://mainnet-sequencer.base.org \
    --rollup.superchain-upgrades \
    --rollup.halt=major \
    --verbosity=3

op-node:

/usr/local/bin/op-node \
    --syncmode=execution-layer \
    --l1.trustrpc \
    --l1.rpckind=basic \
    --l1=http://ETH_URL:8545 \
    --l1.beacon=http://ETH_URL:5052 \
    --l2=http://127.0.0.1:8551 \
    --rpc.addr=127.0.0.1 \
    --rpc.port=9545 \
    --l2.jwt-secret=/var/lib/base/jwt.txt \
    --network=base-mainnet \
    --p2p.peerstore.path=/var/lib/base/data/opnode_peerstore_db \
    --p2p.priv.path=/var/lib/base/data/opnode_p2p_priv.txt \
    --p2p.discovery.path=/var/lib/base/data/opnode_discovery_db \
    --rollup.load-protocol-versions=true \
    --rollup.halt=major
Type=simple