erigontech / erigon

Ethereum implementation on the efficiency frontier https://erigon.gitbook.io

Erigon client stops syncing intermittently until it's restarted #7077

Closed Adewagold closed 1 year ago

Adewagold commented 1 year ago

Hello, we are experiencing some issues with our Erigon node on Goerli, where the node stops syncing and only continues after a restart. We are wondering if this is a general issue, or if there's a fix or flag for this that we are not setting. Below are the logs from the time it got stuck.

[WARN] [03-09|07:30:10.875] no log dir set, console logging only
[INFO] [03-09|07:30:10.875] Starting metrics server                  addr=http://0.0.0.0:6060/debug/metrics/prometheus
[INFO] [03-09|07:30:10.875] Starting pprof server                    cpu="go tool pprof -lines -http=: http://0.0.0.0:6061/debug/pprof/profile?seconds=20" heap="go tool pprof -lines -http=: http://0.0.0.0:6061/debug/pprof/heap"
[WARN] [03-09|07:30:10.892] no log dir set, console logging only
[INFO] [03-09|07:30:10.893] Build info                               git_branch=heads/v2.39.0 git_tag=v2.39.0 git_commit=b5891e2c8a91a99511223fe9370a951106aa208e
[INFO] [03-09|07:30:10.893] Starting Erigon on Görli testnet...
[INFO] [03-09|07:30:10.896] Maximum peer count                       ETH=100 total=100
[INFO] [03-09|07:30:10.897] starting HTTP APIs                       APIs=eth,erigon,engine
[INFO] [03-09|07:30:10.897] torrent verbosity                        level=WRN
[INFO] [03-09|07:30:12.999] Set global gas cap                       cap=50000000
[INFO] [03-09|07:30:13.001] [Downloader] Runnning with               ipv6-enabled=true ipv4-enabled=true download.rate=16mb upload.rate=4mb
[INFO] [03-09|07:30:13.002] Opening Database                         label=chaindata path=/home/adewale/.local/share/erigon/chaindata
[INFO] [03-09|07:30:13.017] Initialised chain configuration          config="{ChainID: 5, Homestead: 0, DAO: <nil>, DAO Support: true, Tangerine Whistle: 0, Spurious Dragon: 0, Byzantium: 0, Constantinople: 0, Petersburg: 0, Istanbul: 1561651, Muir Glacier: <nil>, Berlin: 4460644, London: 5062605, Arrow Glacier: <nil>, Gray Glacier: <nil>, Terminal Total Difficulty: 10790000, Merge Netsplit: <nil>, Shanghai: <nil>, Cancun: <nil>, Engine: clique}" genesis=0xbf7e331f7f7c1dd2e05159666b3bf8bc7a8a3a9eb1d518969eab529dd9b88c1a
[INFO] [03-09|07:30:13.017] Effective                                prune_flags= snapshot_flags= history.v3=false
[INFO] [03-09|07:30:13.017] Initialising Ethereum protocol           network=5
[INFO] [03-09|07:30:13.041] Starting private RPC server              on=0.0.0.0:9090
[INFO] [03-09|07:30:13.041] new subscription to logs established
[INFO] [03-09|07:30:13.041] rpc filters: subscribing to Erigon events
[INFO] [03-09|07:30:13.041] New txs subscriber joined
[INFO] [03-09|07:30:13.042] new subscription to newHeaders established
[INFO] [03-09|07:30:13.042] Reading JWT secret                       path=/tmp/.jwt/jwt.hex
[INFO] [03-09|07:30:13.042] HTTP endpoint opened for Engine API      url=0.0.0.0:8551 ws=true ws.compression=true
[INFO] [03-09|07:30:13.043] HTTP endpoint opened                     url=localhost:8545 ws=false ws.compression=true grpc=false
[INFO] [03-09|07:30:13.052] [2/15 Headers] Waiting for Consensus Layer...
[INFO] [03-09|07:30:13.053] [txpool] Started
[INFO] [03-09|07:30:13.077] Started P2P networking                   version=67 self=enode://635862a06b72f6543a11e89fe5cb89f9029f223ffe37f3388bb7b1e97495bca21d2b25471ed421d50b1416c8614b5bdfb3e8cd7cb25114a92046c4d6a5a90607@127.0.0.1:30303 name=erigon/v2.39.0-stable-b5891e2c/linux-amd64/go1.19.6
[INFO] [03-09|07:30:13.086] Started P2P networking                   version=68 self=enode://635862a06b72f6543a11e89fe5cb89f9029f223ffe37f3388bb7b1e97495bca21d2b25471ed421d50b1416c8614b5bdfb3e8cd7cb25114a92046c4d6a5a90607@127.0.0.1:30304 name=erigon/v2.39.0-stable-b5891e2c/linux-amd64/go1.19.6
[WARN] [03-09|07:30:20.845] no log dir set, console logging only
[WARN] [03-09|07:30:20.845] no log dir set, console logging only
[INFO] [03-09|07:30:20.847] DB schemas compatible                    reader=6.0.0 database=6.0.0
[INFO] [03-09|07:30:20.848] Use --snapshots=false
[INFO] [03-09|07:30:20.865] rpc filters: subscribing to Erigon events
[INFO] [03-09|07:30:20.866] HTTP endpoint opened                     url=0.0.0.0:8845 ws=true ws.compression=false grpc=false
[INFO] [03-09|07:30:20.868] New txs subscriber joined
[INFO] [03-09|07:30:20.869] interfaces compatible                    remote_db= client=6.2.0 server=6.2.0
[INFO] [03-09|07:30:20.870] interfaces compatible                    remote_service=eth_backend client=3.2.0 server=3.2.0
[INFO] [03-09|07:30:20.870] interfaces compatible                    remote_service=mining client=1.0.0 server=1.0.0
[INFO] [03-09|07:30:20.871] interfaces compatible                    remote_service=tx_pool client=1.0.0 server=1.0.0
[INFO] [03-09|07:30:20.871] new subscription to newHeaders established
[INFO] [03-09|07:31:13.042] [txpool] stat                            block=0 pending=0 baseFee=0 queued=4 alloc=121.3MB sys=150.3MB

[INFO] [03-09|07:32:13.018] [p2p] GoodPeers                          eth67=16 eth66=1 eth68=1
[INFO] [03-09|07:32:13.043] [txpool] stat                            block=0 pending=0 baseFee=0 queued=4 alloc=122.2MB sys=194.6MB
MrLouzao commented 1 year ago

Hello. I have the same error. I started a new client on the Goerli network, and even though it is connected to a Lighthouse client that is in sync, the block download seems to be stuck at block 0.

I am deploying the node on Kubernetes with the following settings:

Erigon version: v2.40.1
Docker image: FROM docker.io/library/alpine:3.16

I build the image and everything seems to be OK. When I start the container the sync process should start, and I can see the same traces, but the txpool remains at block 0 and no syncing has started. I do have peers.

This is the command that I use to start the Erigon process:

"erigon --chain goerli --datadir='/home/node/.local/share/erigon' --snapshots=false \
        --private.api.addr=0.0.0.0:9090 \
        --metrics --metrics.addr=0.0.0.0 \
        --pprof --pprof.addr=0.0.0.0 --pprof.port=6061 \
        --authrpc.addr=0.0.0.0 --authrpc.port=8551 --authrpc.vhosts=* --authrpc.jwtsecret='/tmp/.jwt/jwt.hex' \
        --externalcl & \
        sleep 10 && \
        rpcdaemon --datadir='/home/node/.local/share/erigon' --snapshot=false \
        --private.api.addr=0.0.0.0:9090 \
        --http.addr=0.0.0.0 --http.port=8845 --http.vhosts=* --http.corsdomain=* --http.api=eth,erigon,web3,net,debug,trace,txpool,engine \
        --ws"

Is the txpool process making the node stop? Should I remove it from the API list?

AskAlexSharov commented 1 year ago

not enough logs

MrLouzao commented 1 year ago

not enough logs

What kind of logs are useful? These ones? https://github.com/ledgerwatch/erigon#logging

AskAlexSharov commented 1 year ago

Erigon has only one log now, just the usual output. The first message has some: https://github.com/ledgerwatch/erigon/issues/7077#issue-1619047935, but it's not enough.

MrLouzao commented 1 year ago

Erigon has only one log now, just the usual output. The first message has some: #7077 (comment), but it's not enough.

I have traces similar to @Adewagold's. The process always shows that the txpool has no data and is not syncing, but I can't find any error in the logs.

[WARN] [03-14|06:48:37.556] no log dir set, console logging only
[INFO] [03-14|06:48:37.557] Starting metrics server                  addr=http://0.0.0.0:6060/debug/metrics/prometheus
[INFO] [03-14|06:48:37.557] Starting pprof server                    cpu="go tool pprof -lines -http=: http://0.0.0.0:6061/debug/pprof/profile?seconds=20" heap="go tool pprof -lines -http=: http://0.0.0.0:6061/debug/pprof/heap"
[WARN] [03-14|06:48:37.570] no log dir set, console logging only
[INFO] [03-14|06:48:37.570] Build info                               git_branch=heads/v2.40.1 git_tag=v2.40.1 git_commit=94b4cee5a898a47b9241ce824622af6350e5dc5b
[INFO] [03-14|06:48:37.570] Starting Erigon on Görli testnet...
[INFO] [03-14|06:48:37.575] Maximum peer count                       ETH=100 total=100
[INFO] [03-14|06:48:37.575] starting HTTP APIs                       APIs=eth,erigon,engine
[INFO] [03-14|06:48:37.575] torrent verbosity                        level=WRN
[INFO] [03-14|06:48:39.678] Set global gas cap                       cap=50000000
[INFO] [03-14|06:48:39.680] [Downloader] Runnning with               ipv6-enabled=true ipv4-enabled=true download.rate=16mb upload.rate=4mb
[INFO] [03-14|06:48:39.680] Opening Database                         label=chaindata path=/home/node/.local/share/erigon/chaindata
[INFO] [03-14|06:48:39.697] Initialised chain configuration          config="{ChainID: 5, Homestead: 0, DAO: <nil>, DAO Support: true, Tangerine Whistle: 0, Spurious Dragon: 0, Byzantium: 0, Constantinople: 0, Petersburg: 0, Istanbul: 1561651, Muir Glacier: <nil>, Berlin: 4460644, London: 5062605, Arrow Glacier: <nil>, Gray Glacier: <nil>, Terminal Total Difficulty: 10790000, Merge Netsplit: <nil>, Shanghai: 1678832736, Cancun: <nil>, Sharding: <nil>, Prague: <nil>, Engine: clique}" genesis=0xbf7e331f7f7c1dd2e05159666b3bf8bc7a8a3a9eb1d518969eab529dd9b88c1a
[INFO] [03-14|06:48:39.697] Effective                                prune_flags= snapshot_flags= history.v3=false
[INFO] [03-14|06:48:39.710] Initialising Ethereum protocol           network=5
[INFO] [03-14|06:48:39.733] Starting private RPC server              on=0.0.0.0:9090
[INFO] [03-14|06:48:39.733] new subscription to logs established
[INFO] [03-14|06:48:39.734] rpc filters: subscribing to Erigon events
[INFO] [03-14|06:48:39.734] new subscription to newHeaders established
[INFO] [03-14|06:48:39.734] New txs subscriber joined
[INFO] [03-14|06:48:39.735] Reading JWT secret                       path=/tmp/.jwt/jwt.hex
[INFO] [03-14|06:48:39.735] HTTP endpoint opened for Engine API      url=[::]:8551 ws=true ws.compression=true
[INFO] [03-14|06:48:39.735] HTTP endpoint opened                     url=127.0.0.1:8545 ws=false ws.compression=true grpc=false
[INFO] [03-14|06:48:39.746] [txpool] Started
[INFO] [03-14|06:48:39.774] Started P2P networking                   version=68 self=enode://635862a06b72f6543a11e89fe5cb89f9029f223ffe37f3388bb7b1e97495bca21d2b25471ed421d50b1416c8614b5bdfb3e8cd7cb25114a92046c4d6a5a90607@127.0.0.1:30304 name=erigon/v2.40.1-stable-94b4cee5/linux-amd64/go1.19.7
[INFO] [03-14|06:48:39.783] Started P2P networking                   version=67 self=enode://635862a06b72f6543a11e89fe5cb89f9029f223ffe37f3388bb7b1e97495bca21d2b25471ed421d50b1416c8614b5bdfb3e8cd7cb25114a92046c4d6a5a90607@127.0.0.1:30303 name=erigon/v2.40.1-stable-94b4cee5/linux-amd64/go1.19.7
[INFO] [03-14|06:48:39.784] [2/15 Headers] Waiting for Consensus Layer...
[WARN] [03-14|06:48:47.493] no log dir set, console logging only
[WARN] [03-14|06:48:47.493] no log dir set, console logging only
[INFO] [03-14|06:48:47.495] DB schemas compatible                    reader=6.0.0 database=6.0.0
[INFO] [03-14|06:48:47.496] Use --snapshots=false
[INFO] [03-14|06:48:47.496] [snapshots] Blocks Stat                  blocks=0k indices=0k alloc=20.6MB sys=35.9MB
[INFO] [03-14|06:48:47.498] [snapshots] Blocks Stat                  blocks=0k indices=0k alloc=20.8MB sys=35.9MB
[INFO] [03-14|06:48:47.516] rpc filters: subscribing to Erigon events
[INFO] [03-14|06:48:47.518] new subscription to newHeaders established
[INFO] [03-14|06:48:47.518] HTTP endpoint opened                     url=[::]:8845 ws=true ws.compression=false grpc=false
[INFO] [03-14|06:48:47.518] interfaces compatible                    remote_db= client=6.2.0 server=6.2.0
[INFO] [03-14|06:48:47.519] New txs subscriber joined
[INFO] [03-14|06:48:47.519] interfaces compatible                    remote_service=eth_backend client=3.2.0 server=3.2.0
[INFO] [03-14|06:48:47.519] [snapshots] Blocks Stat                  blocks=0k indices=0k alloc=21.7MB sys=39.9MB
[INFO] [03-14|06:48:47.520] interfaces compatible                    remote_service=mining client=1.0.0 server=1.0.0
[INFO] [03-14|06:48:47.520] interfaces compatible                    remote_service=tx_pool client=1.0.0 server=1.0.0
[INFO] [03-14|06:49:39.735] [txpool] stat                            block=0 pending=0 baseFee=0 queued=22 alloc=97.1MB sys=123.7MB
[INFO] [03-14|06:50:39.710] [p2p] GoodPeers                          eth66=2 eth67=7 eth68=6
[INFO] [03-14|06:50:39.735] [txpool] stat                            block=0 pending=0 baseFee=0 queued=22 alloc=152.5MB sys=182.6MB

[INFO] [03-14|06:51:39.735] [txpool] stat                            block=0 pending=0 baseFee=0 queued=22 alloc=149.7MB sys=211.8MB
[INFO] [03-14|06:52:39.710] [p2p] GoodPeers                          eth67=11 eth66=1 eth68=11
[INFO] [03-14|06:52:39.734] [txpool] stat                            block=0 pending=0 baseFee=0 queued=22 alloc=185.3MB sys=238.0MB
[INFO] [03-14|06:53:39.734] [txpool] stat                            block=0 pending=0 baseFee=0 queued=22 alloc=128.0MB sys=266.4MB
[INFO] [03-14|06:54:39.711] [p2p] GoodPeers                          eth67=13 eth66=2 eth68=16
[INFO] [03-14|06:54:39.735] [txpool] stat                            block=0 pending=0 baseFee=0 queued=22 alloc=207.1MB sys=270.7MB
[INFO] [03-14|06:55:39.734] [txpool] stat                            block=0 pending=0 baseFee=0 queued=22 alloc=186.0MB sys=308.9MB
[INFO] [03-14|06:56:39.710] [p2p] GoodPeers                          eth66=2 eth67=15 eth68=18
[INFO] [03-14|06:56:39.734] [txpool] stat                            block=0 pending=0 baseFee=0 queued=22 alloc=206.2MB sys=309.2MB
ybstaked commented 1 year ago

Same issue for us, but even a restart is not fixing it; it's still stuck here:

{"jsonrpc":"2.0","id":1,"result":{"currentBlock":"0xf4263a","highestBlock":"0x1006711","stages":[{"stage_name":"Snapshots","block_number":"0xf4263a"},{"stage_name":"Headers","block_number":"0x1006711"},{"stage_name":"BlockHashes","block_number":"0x1006711"},{"stage_name":"Bodies","block_number":"0x1006711"},{"stage_name":"Senders","block_number":"0x1006711"},{"stage_name":"Execution","block_number":"0xf8dd17"},{"stage_name":"Translation","block_number":"0x0"},{"stage_name":"HashState","block_number":"0xf4263a"},{"stage_name":"IntermediateHashes","block_number":"0xf4263a"},{"stage_name":"AccountHistoryIndex","block_number":"0xf4263a"},{"stage_name":"StorageHistoryIndex","block_number":"0xf4263a"},{"stage_name":"LogIndex","block_number":"0xf4263a"},{"stage_name":"CallTraces","block_number":"0xf4263a"},{"stage_name":"TxLookup","block_number":"0xf4263a"},{"stage_name":"Finish","block_number":"0xf4263a"}]}}
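The stage breakdown above appears to be Erigon's extended eth_syncing response; assuming the default HTTP port 8545, a call like the following should reproduce it:

curl -s -X POST http://localhost:8545 \
  -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}'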
ybstaked commented 1 year ago

Any update here and/or a new release? We have an Erigon node that is still unable to sync to the tip.

MrLouzao commented 1 year ago

Just to clarify, this only affects us on Goerli. We didn't see this behaviour on Ropsten (until it was deprecated) and don't see it on mainnet.

whidrasl commented 1 year ago

Same for me, for some months already (since v2.39 or thereabouts). It looks like there are not enough peers for the consensus layer (I'm using the internal consensus client):

warning: read logs from bottom to top.

[INFO] [03-16|11:54:25.048] [LightClient] P2P                        peers=0
[INFO] [03-16|11:54:25.048] [txpool] stat                            block=8664571 pending=3396 baseFee=677 queued=3228 alloc=1.4GB sys=1.6GB
[INFO] [03-16|11:53:25.050] [txpool] stat                            block=8664571 pending=3238 baseFee=638 queued=3062 alloc=1.5GB sys=1.6GB
[INFO] [03-16|11:53:25.048] [LightClient] P2P                        peers=0
[INFO] [03-16|11:53:14.283] [p2p] GoodPeers                          eth68=26 eth67=20
[INFO] [03-16|11:52:25.048] [txpool] stat                            block=8664571 pending=3032 baseFee=629 queued=2354 alloc=1.0GB sys=1.6GB
[INFO] [03-16|11:52:25.048] [LightClient] P2P                        peers=0
[INFO] [03-16|11:51:25.050] [txpool] stat                            block=8664571 pending=2822 baseFee=619 queued=2205 alloc=1.3GB sys=1.6GB
[INFO] [03-16|11:51:25.048] [LightClient] P2P                        peers=1
[INFO] [03-16|11:51:14.284] [p2p] GoodPeers                          eth67=8 eth68=21
[INFO] [03-16|11:50:25.048] [LightClient] P2P                        peers=0
[INFO] [03-16|11:50:25.048] [txpool] stat                            block=8664571 pending=2621 baseFee=562 queued=2062 alloc=874.4MB sys=1.6GB
[INFO] [03-16|11:49:25.050] [txpool] stat                            block=8664571 pending=2432 baseFee=505 queued=1888 alloc=983.9MB sys=1.5GB
[INFO] [03-16|11:49:25.049] [LightClient] P2P                        peers=0
[INFO] [03-16|11:49:14.283] [p2p] GoodPeers                          eth67=4 eth68=12
[INFO] [03-16|11:48:25.049] [LightClient] P2P                        peers=0
[INFO] [03-16|11:48:25.048] [txpool] stat                            block=8664571 pending=258 baseFee=146 queued=303 alloc=954.5MB sys=1.5GB
[INFO] [03-16|11:47:25.064] [2/15 Headers] Waiting for Consensus Layer...
[INFO] [03-16|11:47:25.063] [snapshots] Blocks Stat                  blocks=8574k indices=8574k alloc=1.2GB sys=1.5GB
[INFO] [03-16|11:47:25.060] [1/15 Snapshots] Fetching torrent files metadata
[INFO] [03-16|11:47:25.058] [txpool] Started
...

The pending tx count keeps increasing, but no blocks are downloaded.

whidrasl commented 1 year ago

Yesterday my patience ran out because restarts hadn't helped for a long time, so I created an issue :-D https://github.com/ledgerwatch/erigon/issues/7114

keithchew commented 1 year ago

I have had the same issue for quite a while now, since v2.37, and decided to look into this again today. After opening the TCP and UDP ports for the light client on the firewall, I can now see healthy-looking peer counts, and both Goerli and mainnet sync much better. I don't know if this is coincidence (I tweaked a few other things), so I will keep monitoring. In the meantime, perhaps you can also give this a go and let us know if it helps.
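(For reference, a minimal sketch of what opening those ports could look like with ufw; the port numbers are assumptions taken from later comments in this thread and should match whatever --lightclient.discovery.port / --lightclient.discovery.tcpport are set to.)

# assumed lightclient ports; adjust to your flags
sudo ufw allow 10500/udp
sudo ufw allow 10501/tcp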

whidrasl commented 1 year ago

@keithchew which ports need to be opened for Goerli? (The ports for incoming connections, i.e. the listening ports?)

whidrasl commented 1 year ago

Ports 4000/4001? I'm not exposing these ports even on mainnet, but the lightclient has 16-18 peers on mainnet...

whidrasl commented 1 year ago

Setting --lightclient.discovery.port=10500 and --lightclient.discovery.tcpport=10500 and opening 10500 tcp/udp to the public doesn't solve the problem on Goerli (peers=0 almost always):

[INFO] [03-16|15:19:11.664] [p2p] GoodPeers                          eth67=34 eth68=35
[INFO] [03-16|15:19:01.178] [txpool] stat                            block=8665467 pending=1676 baseFee=1214 queued=8209 alloc=1.1GB sys=2.9GB
[INFO] [03-16|15:19:01.177] [LightClient] P2P                        peers=0
[INFO] [03-16|15:18:01.178] [txpool] stat                            block=8665467 pending=1391 baseFee=1148 queued=8161 alloc=1.6GB sys=2.9GB
[INFO] [03-16|15:18:01.177] [LightClient] P2P                        peers=1
[INFO] [03-16|15:17:11.665] [p2p] GoodPeers                          eth68=35 eth67=34
[INFO] [03-16|15:17:01.179] [txpool] stat                            block=8665467 pending=1150 baseFee=1114 queued=8096 alloc=1.2GB sys=2.9GB
[INFO] [03-16|15:17:01.177] [LightClient] P2P                        peers=3
[INFO] [03-16|15:16:01.178] [txpool] stat                            block=8665467 pending=936 baseFee=1039 queued=8039 alloc=1.7GB sys=2.9GB
[INFO] [03-16|15:16:01.178] [LightClient] P2P                        peers=0
keithchew commented 1 year ago

Did you check the network traffic to see that they are used? Sticking to Erigon's port convention, I would keep the ports separate, e.g. 10500 and 10501 in your example. I have not checked the p2p protocol implementation, but back in the day it was common for p2p protocols to use both transports when a fallback is needed. So, just to eliminate that possibility, keep them separate while troubleshooting your issue.

Here are my Goerli logs (grepping just peers) for the first 30 minutes after a cold boot:

[INFO] [03-16|18:44:53.978] [p2p] GoodPeers                          eth67=33 eth68=33
[INFO] [03-16|18:45:16.364] [LightClient] P2P                        peers=5
[INFO] [03-16|18:46:16.364] [LightClient] P2P                        peers=6
[INFO] [03-16|18:46:53.978] [p2p] GoodPeers                          eth67=33 eth68=33
[INFO] [03-16|18:47:16.656] [LightClient] P2P                        peers=5
[INFO] [03-16|18:48:16.363] [LightClient] P2P                        peers=5
[INFO] [03-16|18:48:53.978] [p2p] GoodPeers                          eth67=33 eth68=33
[INFO] [03-16|18:49:16.363] [LightClient] P2P                        peers=5
[INFO] [03-16|18:50:16.366] [LightClient] P2P                        peers=7
[INFO] [03-16|18:50:53.978] [p2p] GoodPeers                          eth67=33 eth68=33
[INFO] [03-16|18:51:16.363] [LightClient] P2P                        peers=7
[INFO] [03-16|18:52:16.364] [LightClient] P2P                        peers=10
[INFO] [03-16|18:52:53.978] [p2p] GoodPeers                          eth67=33 eth68=33
[INFO] [03-16|18:53:16.369] [LightClient] P2P                        peers=11
[INFO] [03-16|18:54:16.364] [LightClient] P2P                        peers=12

PS: the peer count goes up to around 25 later on, and on mainnet to around 130 peers...

whidrasl commented 1 year ago

@keithchew if you opened ports for the lightclient, then what's your setting for --lightclient.discovery.addr? Because by default it is '127.0.0.1' and it is not listening for connections from the outside world.

keithchew commented 1 year ago

Mine is set to 0.0.0.0 and sits inside a docker container. External traffic is port-forwarded to the docker host server. You can use Wireshark or tcpdump to verify that traffic is flowing through those ports.
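A rough sketch of that kind of setup (the image name is a placeholder and the port numbers are just the examples used in this thread, not a confirmed configuration):

docker run -d --name erigon \
  -p 30303:30303 -p 30303:30303/udp \
  -p 10500:10500/udp -p 10501:10501/tcp \
  <erigon-image> \
  --chain goerli \
  --lightclient.discovery.addr=0.0.0.0 \
  --lightclient.discovery.port=10500 \
  --lightclient.discovery.tcpport=10501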

whidrasl commented 1 year ago

Finally, thanks, @keithchew! Exposing the lightclient ports solves the problem. --lightclient.discovery.addr must be set to a reachable address too (0.0.0.0, for example) :-) Even with non-default ports (like 10500/10501) the lightclient peer count increments quickly :-)

keithchew commented 1 year ago

Perfect, you are most welcome! Yup, definitely a great feeling watching erigon sync up and keep up with the chain.

fenidik commented 1 year ago

I have the same issue:

[INFO] [03-23|18:23:09.578] [p2p] GoodPeers                          eth67=30 eth66=3 eth68=33
[INFO] [03-23|18:23:55.310] [txpool] stat                            block=16865544 pending=10000 baseFee=30000 queued=30000 alloc=4.2GB sys=8.6GB
[INFO] [03-23|18:24:55.310] [txpool] stat                            block=16865544 pending=10000 baseFee=30000 queued=30000 alloc=4.5GB sys=8.6GB
[INFO] [03-23|18:25:09.579] [p2p] GoodPeers                          eth67=30 eth66=3 eth68=33
[INFO] [03-23|18:25:55.310] [txpool] stat                            block=16865544 pending=10000 baseFee=30000 queued=30000 alloc=3.1GB sys=8.6GB
[INFO] [03-23|18:26:55.310] [txpool] stat                            block=16865544 pending=10000 baseFee=30000 queued=30000 alloc=3.4GB sys=8.6GB
[INFO] [03-23|18:27:09.579] [p2p] GoodPeers                          eth66=3 eth67=30 eth68=33
[INFO] [03-23|18:27:55.310] [txpool] stat                            block=16865544 pending=10000 baseFee=30000 queued=30000 alloc=3.0GB sys=8.6GB
[INFO] [03-23|18:28:55.309] [txpool] stat                            block=16865544 pending=10000 baseFee=30000 queued=30000 alloc=3.4GB sys=8.6GB
[INFO] [03-23|18:29:09.579] [p2p] GoodPeers                          eth68=33 eth67=30 eth66=3
[INFO] [03-23|18:29:56.196] [txpool] stat                            block=16865544 pending=10000 baseFee=30000 queued=30000 alloc=3.0GB sys=8.6GB
[INFO] [03-23|18:30:55.318] [txpool] stat                            block=16865544 pending=10000 baseFee=30000 queued=30000 alloc=4.2GB sys=8.6GB
[INFO] [03-23|18:31:09.583] [p2p] GoodPeers                          eth67=30 eth66=3 eth68=33

Just stuck. erigon:v2.40.1

run options: --http --http.api=engine,eth,erigon,web3,net,debug,trace,txpool,shh --bodies.cache=21474836480

Last logs before it got stuck:

[INFO] [03-20|01:13:38.671] RPC Daemon notified of new headers       from=16865540 to=16865541 hash=0x145ef732c53a748ebeaf623c462f2a568abc556d55e18695523173a161f624c8 header sending=15.754µs log sending=257ns
[INFO] [03-20|01:13:40.585] [2/15 Headers] Waiting for Consensus Layer...
[WARN] [03-20|01:13:40.840] Could not send beacon block to ETH1      err="missing attestation data"
[INFO] [03-20|01:13:48.471] [2/15 Headers] Handling new payload      height=16865542 hash=0x8608eac1159aee186155a726056ef75c148f821a2cf68c369325b5c882ce8251
[INFO] [03-20|01:13:49.350] [2/15 Headers] Waiting for Consensus Layer...
[INFO] [03-20|01:13:49.851] [2/15 Headers] Fork choice update: flushing in-memory state (built by previous newPayload)
[INFO] [03-20|01:13:50.855] Commit cycle                             in=225.823477ms
[INFO] [03-20|01:13:50.855] Timings (slower than 50ms)               Headers=639ms CallTraces=128ms AccountHistoryIndex=92ms StorageHistoryIndex=196ms LogIndex=105ms TxLookup=116ms
[INFO] [03-20|01:13:50.855] Tables                                   PlainState=88.0GB AccountChangeSet=184.4GB StorageChangeSet=281.3GB BlockTransaction=9.7GB TransactionLog=546.2GB FreeList=35.2MB ReclaimableSpace=35.2GB
[INFO] [03-20|01:13:50.855] RPC Daemon notified of new headers       from=16865541 to=16865542 hash=0x8608eac1159aee186155a726056ef75c148f821a2cf68c369325b5c882ce8251 header sending=17.543µs log sending=221ns
[INFO] [03-20|01:13:53.421] [2/15 Headers] Waiting for Consensus Layer...
[WARN] [03-20|01:13:53.674] Could not send beacon block to ETH1      err="missing attestation data"
[INFO] [03-20|01:13:55.307] [LightClient] P2P                        peers=2
[INFO] [03-20|01:13:55.320] [txpool] stat                            block=16865542 pending=6629 baseFee=6643 queued=29751 alloc=3.5GB sys=8.6GB
[INFO] [03-20|01:14:00.334] [2/15 Headers] Handling new payload      height=16865543 hash=0xef6b83881fff7b427f9b0264c282bd0c6ddd79948fc12a9df2d7bf5306d52e3c
[INFO] [03-20|01:14:01.368] [2/15 Headers] Waiting for Consensus Layer...
[INFO] [03-20|01:14:01.868] [2/15 Headers] Fork choice update: flushing in-memory state (built by previous newPayload)
[INFO] [03-20|01:14:02.974] Commit cycle                             in=299.037612ms
[INFO] [03-20|01:14:02.974] Timings (slower than 50ms)               Headers=671ms CallTraces=124ms AccountHistoryIndex=71ms StorageHistoryIndex=221ms LogIndex=110ms TxLookup=107ms
[INFO] [03-20|01:14:02.974] Tables                                   PlainState=88.0GB AccountChangeSet=184.4GB StorageChangeSet=281.3GB BlockTransaction=9.7GB TransactionLog=546.2GB FreeList=35.2MB ReclaimableSpace=35.2GB
[INFO] [03-20|01:14:02.974] RPC Daemon notified of new headers       from=16865542 to=16865543 hash=0xef6b83881fff7b427f9b0264c282bd0c6ddd79948fc12a9df2d7bf5306d52e3c header sending=17.971µs log sending=405ns
[INFO] [03-20|01:14:05.541] [2/15 Headers] Waiting for Consensus Layer...
[INFO] [03-20|01:14:12.370] [2/15 Headers] Handling new payload      height=16865544 hash=0x5a7f473d8f38a62877ba113a20a20dadc4f62b59c7b44075d478aef7b071981c
[INFO] [03-20|01:14:13.801] [2/15 Headers] Waiting for Consensus Layer...
[INFO] [03-20|01:14:14.302] [2/15 Headers] Fork choice update: flushing in-memory state (built by previous newPayload)
[INFO] [03-20|01:14:15.723] Commit cycle                             in=201.655616ms
[INFO] [03-20|01:14:15.723] Timings (slower than 50ms)               Headers=717ms CallTraces=274ms AccountHistoryIndex=223ms StorageHistoryIndex=124ms LogIndex=170ms TxLookup=207ms
[INFO] [03-20|01:14:15.723] Tables                                   PlainState=88.0GB AccountChangeSet=184.4GB StorageChangeSet=281.3GB BlockTransaction=9.7GB TransactionLog=546.2GB FreeList=35.2MB ReclaimableSpace=35.2GB
[INFO] [03-20|01:14:15.723] RPC Daemon notified of new headers       from=16865543 to=16865544 hash=0x5a7f473d8f38a62877ba113a20a20dadc4f62b59c7b44075d478aef7b071981c header sending=24.134µs log sending=243ns
[INFO] [03-20|01:14:19.056] [2/15 Headers] Waiting for Consensus Layer...
[WARN] [03-20|01:14:19.356] Could not send beacon block to ETH1      err="missing attestation data"
[INFO] [03-20|01:14:19.556] [Lightclient] Fetching Sync Committee Period period=737
[INFO] [03-20|01:14:55.309] [txpool] stat                            block=16865544 pending=6976 baseFee=6669 queued=29755 alloc=3.9GB sys=8.6GB
[INFO] [03-20|01:15:09.579] [p2p] GoodPeers                          eth66=5 eth67=27 eth68=32
[INFO] [03-20|01:15:55.317] [txpool] stat                            block=16865544 pending=7574 baseFee=6718 queued=29787 alloc=3.3GB sys=8.6GB
[INFO] [03-20|01:16:55.306] [txpool] stat                            block=16865544 pending=8003 baseFee=6746 queued=29856 alloc=3.6GB sys=8.6GB
[INFO] [03-20|01:17:09.579] [p2p] GoodPeers                          eth67=27 eth68=32 eth66=5
keithchew commented 1 year ago

Your logs above look like a different issue; in particular, it stops after:

[INFO] [03-20|01:14:19.556] [Lightclient] Fetching Sync Committee Period period=737

I have reported this in #7162

tanmng commented 1 year ago

Hello everyone,

It appears I have the same issue, despite already setting all the necessary values:

--lightclient.discovery.port 10500 --lightclient.discovery.tcpport 10501 --lightclient.discovery.addr 0.0.0.0

Log:

Mar 27 20:40:06 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:06.685] [Checkpoint Sync] P2P                    peers=0
Mar 27 20:40:16 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:16.685] [Checkpoint Sync] P2P                    peers=0
Mar 27 20:40:26 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:26.685] [Checkpoint Sync] P2P                    peers=0
Mar 27 20:40:36 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:36.686] [Checkpoint Sync] P2P                    peers=0
Mar 27 20:40:46 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:46.685] [Checkpoint Sync] P2P                    peers=0
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.308] Store Initialized successfully           slot=6093664 root=fc3b0af3ad33dcded22e7b5d9492444cf99e85f2ee80d0cc521b60b4facf2d21
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.308] rpc filters: subscribing to Erigon events
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.308] [Lightclient] Fetching Sync Committee Period period=743
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.308] New txs subscriber joined
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.308] new subscription to newHeaders established
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.308] [LightClient Gossip] Started Gossip
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.308] Reading JWT secret                       path=/data/ethereum/jwt.hex
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.309] HTTP endpoint opened for Engine API      url=127.0.0.1:8551 ws=true ws.compression=true
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.309] HTTP endpoint opened                     url=[::]:8545 ws=true ws.compression=true grpc=false
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.312] Started P2P networking                   version=68 self=enode://a00e2279dd9679269696c1b9d83d3da2b30e0bb5070c99899fe4114a71f914df73c45fcdf443989d3cef5ef0f0184671de29d45a32ad561ec3f8d871bb09edc0@127.0.0.1:0 name=erigon/v2.41.0-stable-52d5f37b/linux-amd64/go1.19.4
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.314] Started P2P networking                   version=67 self=enode://a00e2279dd9679269696c1b9d83d3da2b30e0bb5070c99899fe4114a71f914df73c45fcdf443989d3cef5ef0f0184671de29d45a32ad561ec3f8d871bb09edc0@127.0.0.1:0 name=erigon/v2.41.0-stable-52d5f37b/linux-amd64/go1.19.4
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.321] [1/15 Snapshots] Fetching torrent files metadata
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.325] [snapshots] Blocks Stat                  blocks=16767k indices=16767k alloc=2.8GB sys=3.7GB
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.377] [2/15 Headers] Waiting for Consensus Layer...
Mar 27 20:41:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:41:48.197] [p2p] GoodPeers                          eth66=9 eth67=3 eth68=4
Mar 27 20:41:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:41:48.309] [LightClient] P2P                        peers=0
Mar 27 20:42:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:42:48.308] [LightClient] P2P                        peers=0
Mar 27 20:43:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:43:48.197] [p2p] GoodPeers                          eth66=10 eth67=12 eth68=8
Mar 27 20:43:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:43:48.308] [LightClient] P2P                        peers=0
Mar 27 20:44:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:44:48.309] [LightClient] P2P                        peers=1
Mar 27 20:45:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:45:48.197] [p2p] GoodPeers                          eth67=18 eth66=10 eth68=13
Mar 27 20:45:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:45:48.308] [LightClient] P2P                        peers=1
Mar 27 20:46:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:46:48.309] [LightClient] P2P                        peers=1
Mar 27 20:47:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:47:48.197] [p2p] GoodPeers                          eth67=22 eth66=10 eth68=13
Mar 27 20:47:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:47:48.309] [LightClient] P2P                        peers=0

We are using Erigon 2.41.0-stable; the machine is an EC2 instance running inside a VPC in AWS. Security groups allow access to both port 10500 and port 10501 (UDP and TCP). I verified that the traffic works by nc-ing from my machine to the public IP address of the instance.
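(A rough reconstruction of that check; EXAMPLE_PUBLIC_IP and the ports are placeholders, and for UDP netcat can only tell you that nothing actively refused the packet:)

nc -vz  EXAMPLE_PUBLIC_IP 10501   # TCP port
nc -vzu EXAMPLE_PUBLIC_IP 10500   # UDP port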

I believe the network is somewhat similar to what @keithchew has.

The problem only started with 2.37; before that the node worked perfectly.

Any help is greatly appreciated.

keithchew commented 1 year ago

I can confirm that although exposing the ports on the firewall does help gather peers on startup, I can never get my nodes (Goerli and mainnet) to keep up with the latest block for more than a few days. They either (1) get stuck after "Fetching Sync Committee Period" (as in #7162), or (2) the peer count drops to 0, as described by @tanmng.

Currently looking at running a dedicated external CL, to see if we can narrow this issue down to erigon's internal CL, or something else within erigon's EL logic.

tanmng commented 1 year ago

Thanks for the confirmation @keithchew. In that case I'll try to enable Lighthouse for my node and see if that helps resolve the issue. It will be a long wait for Lighthouse to sync and catch up, though.

keithchew commented 1 year ago

Yeah, I started up Prysm on both Goerli and mainnet last Friday, and they are around 50% synced. I will post any updates here, and it would also be good if you could report your progress with Lighthouse.

winchell commented 1 year ago

I'm in a GKE container and maybe need something like this? https://docs.prylabs.network/docs/prysm-usage/p2p-host-ip#broadcast-your-public-ip-address. It's easy to expose any port on a public IP, but there doesn't seem to be a way to tell the native Erigon CL about an address other than the ones it binds directly.

https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/p2p-interface.md

"Nodes operating behind a NAT, or otherwise undialable by default (e.g. container runtime, firewall, etc.), MUST have their infrastructure configured to enable inbound traffic on the announced public listening endpoint."

Is the "announced public listening endpoint" configurable in Erigon CL, separate from the service bound IP?

keithchew commented 1 year ago

From my observation, I only need to expose the p2p tcp/udp ports on the firewall. When they are exposed correctly, you can observe (using tcpdump/wireshark/etc.) traffic on the ports, and the peer count in erigon/prysm will be healthy. For example, in prysm, without the ports exposed you will only see one-sided peer counts (i.e. you are not a seeder), but with the ports exposed you will see bi-directional peers.
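One way to confirm that traffic is actually reaching the lightclient ports (10500/10501 here are just the example ports used earlier in this thread):

sudo tcpdump -ni any 'udp port 10500 or tcp port 10501'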

winchell commented 1 year ago

From my observation, I only need to expose the p2p tcp/udp ports on the firewall. When they are exposed correctly, you can observe (using tcpdump/wireshark/etc.) traffic on the ports, and the peer count in erigon/prysm will be healthy. For example, in prysm, without the ports exposed you will only see one-sided peer counts (i.e. you are not a seeder), but with the ports exposed you will see bi-directional peers.

Doesn't work like that in a K8s environment.

keithchew commented 1 year ago

Even if you could announce your IP in a swarm, how do you control/map the port assignments? I currently have a mix of machines (a combination of bare metal and docker containers), and to keep things simple for now, I am NAT'ing unique ports across the cluster. Eventually I will look at mapping to different source/dest ports, once I have a stable setup (I need to establish a baseline before making improvements).

When you get it going, please post here, it would certainly be useful knowledge for all.

winchell commented 1 year ago

Even if you could announce your IP in a swarm, how do you control/map the port assignments? I currently have a mix of machines (a combination of bare metal and docker containers), and to keep things simple for now, I am NAT'ing unique ports across the cluster. Eventually I will look at mapping to different source/dest ports, once I have a stable setup.

When you get it going, please post here, it would certainly be useful knowledge for all.

Without getting into the details of K8s, it's just a situation where the Erigon container doesn't know its public IP and isn't binding to its public IP, so I would have to give it that info at runtime. You're right, it doesn't know its public port either, so that would be part of it. --lightclient.discovery.addr seems close, but Erigon actually binds to this IP, so it's not the config I need.

keithchew commented 1 year ago

AFAIK, the announcement IP is there to make the peer more discoverable, but it is not required for p2p to function. If it were required, a good chunk of users would not be able to get their nodes running.

winchell commented 1 year ago

AFAIK, the announcement IP is there to make the peer more discoverable, but it is not required for p2p to function. If it were required, a good chunk of users would not be able to get their nodes running.

I'm also assuming that [LightClient] P2P peers should be well above the values of 0 to 3 which are typical of my node (Goerli). There were apparently multiple regressions in v2.41.0 in this area so maybe https://github.com/ledgerwatch/erigon/releases/tag/v2.42.0 is going to work better with my limitations.

Update: I've concluded that Erigon CL currently requires host networking to be well connected. There's currently no way to tell it about an IP for discoverability that it isn't binding directly. It's bad practice, and disallowed in GKE, to use network=host for the Docker container, so it's a dead end in my environment. Prysm has a flag that appears to address this issue, so I assume it's a matter of adding the feature to Erigon CL. To be clear, it can find a peer or two without discoverability, but when the peer count is <5 the node's performance is very bad and it is often stuck.
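For comparison, the Prysm option referenced above (an assumption based on the Prysm docs linked earlier, not something Erigon's CL supports) lets the announced IP differ from the bind address:

beacon-chain --p2p-host-ip=203.0.113.7 --p2p-tcp-port=13000 --p2p-udp-port=12000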

ybstaked commented 1 year ago

I'm still seeing the same issue, even with git_branch=HEAD git_tag=v2.42.0-dirty git_commit=beb97784d43ece5acde365a74efe8763692ebdba

[INFO] [03-29|14:23:52.550] [2/15 Headers] Waiting for Consensus Layer...
[INFO] [03-29|14:24:01.019] [2/15 Headers] Handling new payload      height=16933519 hash=0xc3e8476632bbb842e1e781e2c5823ca41ba18ec31ac977fb5f5c60368459bd92
[INFO] [03-29|14:24:01.766] [2/15 Headers] Waiting for Consensus Layer...
[INFO] [03-29|14:24:02.266] [2/15 Headers] Fork choice update: flushing in-memory state (built by previous newPayload)
[INFO] [03-29|14:24:02.921] Commit cycle                             in=56.565277ms
[INFO] [03-29|14:24:02.921] Timings (slower than 50ms)               Headers=566ms CallTraces=121ms AccountHistoryIndex=86ms StorageHistoryIndex=142ms LogIndex=84ms TxLookup=97ms
[INFO] [03-29|14:24:02.921] Tables                                   PlainState=88.8GB AccountChangeSet=185.6GB StorageChangeSet=283.3GB BlockTransaction=10.8GB TransactionLog=551.7GB FreeList=143.1MB ReclaimableSpace=143.1GB
[INFO] [03-29|14:24:02.921] RPC Daemon notified of new headers       from=16933518 to=16933519 hash=0xc3e8476632bbb842e1e781e2c5823ca41ba18ec31ac977fb5f5c60368459bd92 header sending=12.998µs log sending=211ns
[INFO] [03-29|14:24:04.893] [2/15 Headers] Waiting for Consensus Layer...
keithchew commented 1 year ago

Hi @tanmng, how are you getting on with Lighthouse? I have an update: my Goerli Prysm just synced up, and Erigon <-> Prysm is working super well so far. It is always on the latest block and the peer counts are all healthy. Waiting for mainnet Prysm to sync up and will post an update here.

@winchell, it is most certainly not an IP announcement issue with LightClient. My Prysm instance is running without IP announcement and both inbound/outbound peers are healthy. It is more likely that the issue is fewer users running Erigon's LightClient, or bugs/instability within LightClient. Until LightClient is more performant/stable, I am sticking with an external CL for now.

winchell commented 1 year ago

@winchell, it is most certainly not an IP announcement issue with LightClient. My Prysm instance is running without IP announcement and both inbound/outbound peers are healthy. It is more likely that the issue is fewer users running Erigon's LightClient, or bugs/instability within LightClient. Until LightClient is more performant/stable, I am sticking with an external CL for now.

Good to know. Yeah, everything was fine when I was running Lighthouse. Same bad experience on v2.42.0 here.

fenidik commented 1 year ago

almost 1 month and still no solution?

keithchew commented 1 year ago

Hi @tanmng, my Prysm mainnet has synced up and Erigon <-> Prysm works beautifully for mainnet. My initial attempt was to wait for Prysm to sync from genesis, which can take a while. After learning more about Erigon and Prysm, I found a faster way to get both the EL and CL synced up:

1. Run erigon with internal CL until synced to head
2. Run prysm with snapshot sync until synced to head
3. Restart erigon with externalcl flag
4. Restart prysm without checkpoint sync and connect to erigon

Just be careful when cutting over from steps 1 & 2 to 3 & 4; it's best to ensure the EL is slightly ahead of the CL before connecting them up. Note that using the above method means Prysm is synced from a checkpoint and not from genesis, so make sure your project's requirements can still be met this way.
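A rough sketch of steps 2-4 (the Prysm flags are assumptions based on Prysm's documented options, and the URLs/paths are placeholders, not taken from this thread):

# step 2: checkpoint-sync Prysm and point it at Erigon's Engine API
beacon-chain --checkpoint-sync-url=https://TRUSTED_BEACON_NODE \
             --genesis-beacon-api-url=https://TRUSTED_BEACON_NODE \
             --execution-endpoint=http://localhost:8551 \
             --jwt-secret=/tmp/.jwt/jwt.hex

# step 3: restart Erigon with the external CL
erigon --chain goerli --externalcl \
       --authrpc.addr=0.0.0.0 --authrpc.port=8551 \
       --authrpc.jwtsecret=/tmp/.jwt/jwt.hex

# step 4: once Prysm is synced, restart it without the checkpoint-sync flags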

PS: My goerli erigon <-> prysm has been running solid for over 2 weeks, always at latest block, amazing stuff.

Maksat2022 commented 1 year ago

Hi, same problem on v2.42.0 with Erigon's lightclient CL. Restarts help sometimes, but then the sync gets stuck again after a while; --lightclient.discovery.addr=0.0.0.0 does not help.

whidrasl commented 1 year ago

Yep. It worked until the hard fork. Now I'm "migrating" to Prysm.

suspended commented 1 year ago

Same problem with v2.43.0

jun0tpyrc commented 1 year ago

Seems the same for a Matic mainnet archive node; it looks like it has peers but is not moving forward:

[INFO] [05-29|02:58:57.156] [8/15 HashState] ETL [1/2] Extracting    from=AccountChangeSet current_prefix=00000000027dbfa9
[INFO] [05-29|02:59:27.156] [8/15 HashState] ETL [1/2] Extracting    from=AccountChangeSet current_prefix=00000000027dc41c
[INFO] [05-29|02:59:34.392] [p2p] GoodPeers                          eth66=39 eth68=6 eth67=3
[INFO] [05-29|02:59:40.611] [txpool] stat                            pending=10000 baseFee=12112 queued=30000 alloc=4.0GB sys=8.0GB
[INFO] [05-29|02:59:57.156] [8/15 HashState] ETL [1/2] Extracting    from=AccountChangeSet current_prefix=00000000027dca22

unless I restart it.

simonblack commented 1 year ago

I have a similar but different issue: after interrupting the TxLookup index stage during the initial sync, it doesn't appear to return to syncing (the index build wasn't complete and the temp directory went from 40 GB to 0, in case that's relevant, but I'm assuming that's just flushed to the DB or something during Erigon shutdown).

It's been about three hours of:

INFO[06-25|12:29:14.435] [p2p] GoodPeers eth68=33 eth67=28
INFO[06-25|12:29:14.446] [txpool] stat pending=0 baseFee=0 queued=24000 alloc=2.5GB sys=4.5GB
INFO[06-25|12:30:14.442] [txpool] stat pending=0 baseFee=0 queued=24000 alloc=2.3GB sys=4.5GB

seems to be stuck here:

[2/15 Headers] Waiting for Consensus Layer..

Edit: I'm assuming I just don't understand the log output, but I did a test via JSON-RPC and the node still isn't returning data for any non-snapshot blocks.

simonblack commented 1 year ago

Fixed by adding --internalcl, should anyone else have this issue. Syncing is continuing.

My guess is that something happens after the block execution stage; maybe the consensus layer is turned off until then?

cheers

keithchew commented 1 year ago

Hi @simonblack, the internal CL has issues keeping in sync reliably, and it used to be the default CL. Recently the --internalcl flag was introduced, making an external CL the default. From the other issues posted, I can see that the Erigon dev team plans to rework and improve the internal CL. For my setup, I have been running on Prysm for almost 3 months now; sync is always at the latest block.

simonblack commented 1 year ago

got it, thank you

So for now the foolproof approach is definitely to install an external CL, right? I'm new to the post-merge world.

keithchew commented 1 year ago

I would recommend you try the various options (internal, Prysm, Lighthouse, etc.), evaluate their performance/stability, and settle on the one that meets your project's requirements.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 40 days with no activity. Remove stale label or comment, or this will be closed in 7 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been stalled for 7 days with no activity.