Closed: Adewagold closed this issue 1 year ago.
Hello. I have the same error. I started a new client on the Goerli network, and even though it is connected to a Lighthouse client that is in sync, the block download process seems to be stuck at block 0.
I am deploying the node on Kubernetes with the following settings:
Erigon version: v2.40.1
Docker image: FROM docker.io/library/alpine:3.16
I build the image and everything seems to be OK. When I start the container, the sync process should start and I can see the usual traces, but the txpool remains on block 0 and no sync has started. I have peers too.
This is the command that I use to start the Erigon process:
"erigon --chain goerli --datadir='/home/node/.local/share/erigon' --snapshots=false \
--private.api.addr=0.0.0.0:9090 \
--metrics --metrics.addr=0.0.0.0 \
--pprof --pprof.addr=0.0.0.0 --pprof.port=6061 \
--authrpc.addr=0.0.0.0 --authrpc.port=8551 --authrpc.vhosts=* --authrpc.jwtsecret='/tmp/.jwt/jwt.hex' \
--externalcl & \
sleep 10 && \
rpcdaemon --datadir='/home/node/.local/share/erigon' --snapshot=false \
--private.api.addr=0.0.0.0:9090 \
--http.addr=0.0.0.0 --http.port=8845 --http.vhosts=* --http.corsdomain=* --http.api=eth,erigon,web3,net,debug,trace,txpool,engine \
--ws"
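To confirm whether the head is actually advancing (rather than just the txpool idling), one option is to poll eth_blockNumber through the rpcdaemon started above and compare two samples taken a minute apart. A minimal sketch; head_advanced is a hypothetical helper name, and it assumes the HTTP port 8845 from the command above plus curl and jq for the live queries:

```shell
# Hypothetical helper: succeeds if the later hex block number is greater
# than the earlier one ($(( )) accepts 0x-prefixed hex directly).
head_advanced() {
  # $1 = earlier sample, $2 = later sample, both 0x-prefixed hex strings
  [ $(($2)) -gt $(($1)) ]
}

# In practice, take each sample with something like:
#   curl -s -H 'Content-Type: application/json' \
#     -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
#     http://localhost:8845 | jq -r .result
head_advanced 0xf4263a 0xf4263a && echo "advancing" || echo "stuck"
```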
Is the txpool process causing the node to stall? Should I remove it from the API list?
not enough logs
What kind of logs are useful? These ones? https://github.com/ledgerwatch/erigon#logging
Erigon has only one log now, just the usual output. The first message has some: https://github.com/ledgerwatch/erigon/issues/7077#issue-1619047935 - but it's not enough.
I have similar traces to @Adewagold. The process always shows that the txpool has no data and is not syncing, but I can't find any error in the logs.
[WARN] [03-14|06:48:37.556] no log dir set, console logging only
[INFO] [03-14|06:48:37.557] Starting metrics server addr=http://0.0.0.0:6060/debug/metrics/prometheus
[INFO] [03-14|06:48:37.557] Starting pprof server cpu="go tool pprof -lines -http=: http://0.0.0.0:6061/debug/pprof/profile?seconds=20" heap="go tool pprof -lines -http=: http://0.0.0.0:6061/debug/pprof/heap"
[WARN] [03-14|06:48:37.570] no log dir set, console logging only
[INFO] [03-14|06:48:37.570] Build info git_branch=heads/v2.40.1 git_tag=v2.40.1 git_commit=94b4cee5a898a47b9241ce824622af6350e5dc5b
[INFO] [03-14|06:48:37.570] Starting Erigon on Görli testnet...
[INFO] [03-14|06:48:37.575] Maximum peer count ETH=100 total=100
[INFO] [03-14|06:48:37.575] starting HTTP APIs APIs=eth,erigon,engine
[INFO] [03-14|06:48:37.575] torrent verbosity level=WRN
[INFO] [03-14|06:48:39.678] Set global gas cap cap=50000000
[INFO] [03-14|06:48:39.680] [Downloader] Runnning with ipv6-enabled=true ipv4-enabled=true download.rate=16mb upload.rate=4mb
[INFO] [03-14|06:48:39.680] Opening Database label=chaindata path=/home/node/.local/share/erigon/chaindata
[INFO] [03-14|06:48:39.697] Initialised chain configuration config="{ChainID: 5, Homestead: 0, DAO: <nil>, DAO Support: true, Tangerine Whistle: 0, Spurious Dragon: 0, Byzantium: 0, Constantinople: 0, Petersburg: 0, Istanbul: 1561651, Muir Glacier: <nil>, Berlin: 4460644, London: 5062605, Arrow Glacier: <nil>, Gray Glacier: <nil>, Terminal Total Difficulty: 10790000, Merge Netsplit: <nil>, Shanghai: 1678832736, Cancun: <nil>, Sharding: <nil>, Prague: <nil>, Engine: clique}" genesis=0xbf7e331f7f7c1dd2e05159666b3bf8bc7a8a3a9eb1d518969eab529dd9b88c1a
[INFO] [03-14|06:48:39.697] Effective prune_flags= snapshot_flags= history.v3=false
[INFO] [03-14|06:48:39.710] Initialising Ethereum protocol network=5
[INFO] [03-14|06:48:39.733] Starting private RPC server on=0.0.0.0:9090
[INFO] [03-14|06:48:39.733] new subscription to logs established
[INFO] [03-14|06:48:39.734] rpc filters: subscribing to Erigon events
[INFO] [03-14|06:48:39.734] new subscription to newHeaders established
[INFO] [03-14|06:48:39.734] New txs subscriber joined
[INFO] [03-14|06:48:39.735] Reading JWT secret path=/tmp/.jwt/jwt.hex
[INFO] [03-14|06:48:39.735] HTTP endpoint opened for Engine API url=[::]:8551 ws=true ws.compression=true
[INFO] [03-14|06:48:39.735] HTTP endpoint opened url=127.0.0.1:8545 ws=false ws.compression=true grpc=false
[INFO] [03-14|06:48:39.746] [txpool] Started
[INFO] [03-14|06:48:39.774] Started P2P networking version=68 self=enode://635862a06b72f6543a11e89fe5cb89f9029f223ffe37f3388bb7b1e97495bca21d2b25471ed421d50b1416c8614b5bdfb3e8cd7cb25114a92046c4d6a5a90607@127.0.0.1:30304 name=erigon/v2.40.1-stable-94b4cee5/linux-amd64/go1.19.7
[INFO] [03-14|06:48:39.783] Started P2P networking version=67 self=enode://635862a06b72f6543a11e89fe5cb89f9029f223ffe37f3388bb7b1e97495bca21d2b25471ed421d50b1416c8614b5bdfb3e8cd7cb25114a92046c4d6a5a90607@127.0.0.1:30303 name=erigon/v2.40.1-stable-94b4cee5/linux-amd64/go1.19.7
[INFO] [03-14|06:48:39.784] [2/15 Headers] Waiting for Consensus Layer...
[WARN] [03-14|06:48:47.493] no log dir set, console logging only
[WARN] [03-14|06:48:47.493] no log dir set, console logging only
[INFO] [03-14|06:48:47.495] DB schemas compatible reader=6.0.0 database=6.0.0
[INFO] [03-14|06:48:47.496] Use --snapshots=false
[INFO] [03-14|06:48:47.496] [snapshots] Blocks Stat blocks=0k indices=0k alloc=20.6MB sys=35.9MB
[INFO] [03-14|06:48:47.498] [snapshots] Blocks Stat blocks=0k indices=0k alloc=20.8MB sys=35.9MB
[INFO] [03-14|06:48:47.516] rpc filters: subscribing to Erigon events
[INFO] [03-14|06:48:47.518] new subscription to newHeaders established
[INFO] [03-14|06:48:47.518] HTTP endpoint opened url=[::]:8845 ws=true ws.compression=false grpc=false
[INFO] [03-14|06:48:47.518] interfaces compatible remote_db= client=6.2.0 server=6.2.0
[INFO] [03-14|06:48:47.519] New txs subscriber joined
[INFO] [03-14|06:48:47.519] interfaces compatible remote_service=eth_backend client=3.2.0 server=3.2.0
[INFO] [03-14|06:48:47.519] [snapshots] Blocks Stat blocks=0k indices=0k alloc=21.7MB sys=39.9MB
[INFO] [03-14|06:48:47.520] interfaces compatible remote_service=mining client=1.0.0 server=1.0.0
[INFO] [03-14|06:48:47.520] interfaces compatible remote_service=tx_pool client=1.0.0 server=1.0.0
[INFO] [03-14|06:49:39.735] [txpool] stat block=0 pending=0 baseFee=0 queued=22 alloc=97.1MB sys=123.7MB
[INFO] [03-14|06:50:39.710] [p2p] GoodPeers eth66=2 eth67=7 eth68=6
[INFO] [03-14|06:50:39.735] [txpool] stat block=0 pending=0 baseFee=0 queued=22 alloc=152.5MB sys=182.6MB
[INFO] [03-14|06:51:39.735] [txpool] stat block=0 pending=0 baseFee=0 queued=22 alloc=149.7MB sys=211.8MB
[INFO] [03-14|06:52:39.710] [p2p] GoodPeers eth67=11 eth66=1 eth68=11
[INFO] [03-14|06:52:39.734] [txpool] stat block=0 pending=0 baseFee=0 queued=22 alloc=185.3MB sys=238.0MB
[INFO] [03-14|06:53:39.734] [txpool] stat block=0 pending=0 baseFee=0 queued=22 alloc=128.0MB sys=266.4MB
[INFO] [03-14|06:54:39.711] [p2p] GoodPeers eth67=13 eth66=2 eth68=16
[INFO] [03-14|06:54:39.735] [txpool] stat block=0 pending=0 baseFee=0 queued=22 alloc=207.1MB sys=270.7MB
[INFO] [03-14|06:55:39.734] [txpool] stat block=0 pending=0 baseFee=0 queued=22 alloc=186.0MB sys=308.9MB
[INFO] [03-14|06:56:39.710] [p2p] GoodPeers eth66=2 eth67=15 eth68=18
[INFO] [03-14|06:56:39.734] [txpool] stat block=0 pending=0 baseFee=0 queued=22 alloc=206.2MB sys=309.2MB
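One quick way to spot this from the log stream alone: if the block= value in the [txpool] stat lines never changes over a long window, the EL is not receiving fork-choice updates from the CL. A small sketch (fed here with two sample lines from above; in practice pipe the real log in):

```shell
# Count distinct block numbers seen in [txpool] stat lines; a single value
# over a long window means the node is stuck.
awk '/\[txpool\] stat/ {
       for (i = 1; i <= NF; i++)
         if ($i ~ /^block=/) seen[$i]++
     }
     END { for (b in seen) print b, seen[b] }' <<'EOF'
[INFO] [03-14|06:49:39.735] [txpool] stat block=0 pending=0 baseFee=0 queued=22 alloc=97.1MB sys=123.7MB
[INFO] [03-14|06:50:39.735] [txpool] stat block=0 pending=0 baseFee=0 queued=22 alloc=152.5MB sys=182.6MB
EOF
```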
Same issue for us, but even a restart is not fixing it; it's still stuck here:
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "currentBlock": "0xf4263a",
    "highestBlock": "0x1006711",
    "stages": [
      { "stage_name": "Snapshots", "block_number": "0xf4263a" },
      { "stage_name": "Headers", "block_number": "0x1006711" },
      { "stage_name": "BlockHashes", "block_number": "0x1006711" },
      { "stage_name": "Bodies", "block_number": "0x1006711" },
      { "stage_name": "Senders", "block_number": "0x1006711" },
      { "stage_name": "Execution", "block_number": "0xf8dd17" },
      { "stage_name": "Translation", "block_number": "0x0" },
      { "stage_name": "HashState", "block_number": "0xf4263a" },
      { "stage_name": "IntermediateHashes", "block_number": "0xf4263a" },
      { "stage_name": "AccountHistoryIndex", "block_number": "0xf4263a" },
      { "stage_name": "StorageHistoryIndex", "block_number": "0xf4263a" },
      { "stage_name": "LogIndex", "block_number": "0xf4263a" },
      { "stage_name": "CallTraces", "block_number": "0xf4263a" },
      { "stage_name": "TxLookup", "block_number": "0xf4263a" },
      { "stage_name": "Finish", "block_number": "0xf4263a" }
    ]
  }
}
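The block numbers in that eth_syncing response are hex strings; a quick conversion (shell arithmetic accepts the 0x prefix) makes the gap between currentBlock and highestBlock explicit:

```shell
# currentBlock and highestBlock from the eth_syncing response above;
# $(( )) evaluates 0x-prefixed hex directly.
current=0xf4263a
highest=0x1006711
printf '%d of %d (%d behind)\n' $((current)) $((highest)) $((highest - current))
```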
Any update here and/or a new release? We have an Erigon node that is still unable to sync to tip.
Just to clarify, this only affects us on Goerli. We haven't seen this behaviour configuring Ropsten (until it was deprecated) or in Mainnet.
Same for me for some months already (since v2.39 or thereabouts). Looks like there are not enough peers for the consensus layer (I'm using the internal consensus client):
warning: read logs from bottom to top.
[INFO] [03-16|11:54:25.048] [LightClient] P2P peers=0
[INFO] [03-16|11:54:25.048] [txpool] stat block=8664571 pending=3396 baseFee=677 queued=3228 alloc=1.4GB sys=1.6GB
[INFO] [03-16|11:53:25.050] [txpool] stat block=8664571 pending=3238 baseFee=638 queued=3062 alloc=1.5GB sys=1.6GB
[INFO] [03-16|11:53:25.048] [LightClient] P2P peers=0
[INFO] [03-16|11:53:14.283] [p2p] GoodPeers eth68=26 eth67=20
[INFO] [03-16|11:52:25.048] [txpool] stat block=8664571 pending=3032 baseFee=629 queued=2354 alloc=1.0GB sys=1.6GB
[INFO] [03-16|11:52:25.048] [LightClient] P2P peers=0
[INFO] [03-16|11:51:25.050] [txpool] stat block=8664571 pending=2822 baseFee=619 queued=2205 alloc=1.3GB sys=1.6GB
[INFO] [03-16|11:51:25.048] [LightClient] P2P peers=1
[INFO] [03-16|11:51:14.284] [p2p] GoodPeers eth67=8 eth68=21
[INFO] [03-16|11:50:25.048] [LightClient] P2P peers=0
[INFO] [03-16|11:50:25.048] [txpool] stat block=8664571 pending=2621 baseFee=562 queued=2062 alloc=874.4MB sys=1.6GB
[INFO] [03-16|11:49:25.050] [txpool] stat block=8664571 pending=2432 baseFee=505 queued=1888 alloc=983.9MB sys=1.5GB
[INFO] [03-16|11:49:25.049] [LightClient] P2P peers=0
[INFO] [03-16|11:49:14.283] [p2p] GoodPeers eth67=4 eth68=12
[INFO] [03-16|11:48:25.049] [LightClient] P2P peers=0
[INFO] [03-16|11:48:25.048] [txpool] stat block=8664571 pending=258 baseFee=146 queued=303 alloc=954.5MB sys=1.5GB
[INFO] [03-16|11:47:25.064] [2/15 Headers] Waiting for Consensus Layer...
[INFO] [03-16|11:47:25.063] [snapshots] Blocks Stat blocks=8574k indices=8574k alloc=1.2GB sys=1.5GB
[INFO] [03-16|11:47:25.060] [1/15 Snapshots] Fetching torrent files metadata
[INFO] [03-16|11:47:25.058] [txpool] Started
...
The pending count keeps increasing, but no blocks are being downloaded.
Yesterday my patience was over because restarts didn't help for a long time and I've created an issue :-D https://github.com/ledgerwatch/erigon/issues/7114
I have had the same issue for quite a while now, since v2.37, and decided to look into this again today. After configuring the firewall to allow the TCP and UDP ports for the light client, I can now see a healthy-looking peer count, and both Goerli and mainnet sync is much better. I don't know if this is a coincidence (I tweaked a few other things), so I will keep monitoring. In the meantime, perhaps you can also give this a go and let us know if it helps.
@keithchew which ports need to be opened for Goerli? (Ports for incoming connections, i.e. listening ports?)
Ports 4000/4001? I'm not exposing these ports even on mainnet, but the lightclient has 16-18 peers on mainnet...
Setting --lightclient.discovery.port=10500 and --lightclient.discovery.tcpport=10500 and opening 10500 tcp/udp to the public doesn't solve the problem on Goerli (peers=0 almost always):
[INFO] [03-16|15:19:11.664] [p2p] GoodPeers eth67=34 eth68=35
[INFO] [03-16|15:19:01.178] [txpool] stat block=8665467 pending=1676 baseFee=1214 queued=8209 alloc=1.1GB sys=2.9GB
[INFO] [03-16|15:19:01.177] [LightClient] P2P peers=0
[INFO] [03-16|15:18:01.178] [txpool] stat block=8665467 pending=1391 baseFee=1148 queued=8161 alloc=1.6GB sys=2.9GB
[INFO] [03-16|15:18:01.177] [LightClient] P2P peers=1
[INFO] [03-16|15:17:11.665] [p2p] GoodPeers eth68=35 eth67=34
[INFO] [03-16|15:17:01.179] [txpool] stat block=8665467 pending=1150 baseFee=1114 queued=8096 alloc=1.2GB sys=2.9GB
[INFO] [03-16|15:17:01.177] [LightClient] P2P peers=3
[INFO] [03-16|15:16:01.178] [txpool] stat block=8665467 pending=936 baseFee=1039 queued=8039 alloc=1.7GB sys=2.9GB
[INFO] [03-16|15:16:01.178] [LightClient] P2P peers=0
Did you check the network traffic to see that they are used? Sticking to Erigon's port convention, I would keep the ports separate, e.g. 10500 and 10501 in your example. I have not checked the p2p protocol implementation, but back in the day it was common for p2p protocols to use both types when fallback is needed. So, just to eliminate that possibility, keep them separate while troubleshooting your issue.
Here are my Goerli logs (grepping just peers) after starting up for 30 minutes from a cold boot:
[INFO] [03-16|18:44:53.978] [p2p] GoodPeers eth67=33 eth68=33
[INFO] [03-16|18:45:16.364] [LightClient] P2P peers=5
[INFO] [03-16|18:46:16.364] [LightClient] P2P peers=6
[INFO] [03-16|18:46:53.978] [p2p] GoodPeers eth67=33 eth68=33
[INFO] [03-16|18:47:16.656] [LightClient] P2P peers=5
[INFO] [03-16|18:48:16.363] [LightClient] P2P peers=5
[INFO] [03-16|18:48:53.978] [p2p] GoodPeers eth67=33 eth68=33
[INFO] [03-16|18:49:16.363] [LightClient] P2P peers=5
[INFO] [03-16|18:50:16.366] [LightClient] P2P peers=7
[INFO] [03-16|18:50:53.978] [p2p] GoodPeers eth67=33 eth68=33
[INFO] [03-16|18:51:16.363] [LightClient] P2P peers=7
[INFO] [03-16|18:52:16.364] [LightClient] P2P peers=10
[INFO] [03-16|18:52:53.978] [p2p] GoodPeers eth67=33 eth68=33
[INFO] [03-16|18:53:16.369] [LightClient] P2P peers=11
[INFO] [03-16|18:54:16.364] [LightClient] P2P peers=12
PS: the peer count goes up to around 25 later on, and on mainnet, around 130 peers...
@keithchew if you opened ports for the lightclient, then what's your setting for --lightclient.discovery.addr? Because by default it is '127.0.0.1' and it does not listen for connections from the external world.
Mine is set to 0.0.0.0 and sitting within a docker container. External traffic is port forwarded to the docker host server. You can use Wireshark or tcpdump to verify traffic is flowing through those ports.
Finally, thanks @keithchew! Exposing the lightclient ports solves the problem. --lightclient.discovery.addr must be set too (0.0.0.0 for example) :-) Even with non-default ports (like 10500/10501) the lightclient peer count increments quickly :-)
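For anyone landing here later, the combination that worked in this thread can be sketched as follows. This is a sketch, not official guidance: the port numbers are examples, both must be opened on the firewall (UDP for discovery.port, TCP for discovery.tcpport), and per the earlier advice the two ports are kept distinct:

```shell
# Sketch of the flags that resolved the LightClient peers=0 problem here.
# --lightclient.discovery.addr defaults to 127.0.0.1, so it must be set
# to a non-loopback address for inbound connections to work.
erigon --chain goerli \
  --lightclient.discovery.addr=0.0.0.0 \
  --lightclient.discovery.port=10500 \
  --lightclient.discovery.tcpport=10501
```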
Perfect, you are most welcome! Yup, definitely a great feeling watching erigon sync up and keep up with the chain.
I have the same issue:
[INFO] [03-23|18:23:09.578] [p2p] GoodPeers eth67=30 eth66=3 eth68=33
[INFO] [03-23|18:23:55.310] [txpool] stat block=16865544 pending=10000 baseFee=30000 queued=30000 alloc=4.2GB sys=8.6GB
[INFO] [03-23|18:24:55.310] [txpool] stat block=16865544 pending=10000 baseFee=30000 queued=30000 alloc=4.5GB sys=8.6GB
[INFO] [03-23|18:25:09.579] [p2p] GoodPeers eth67=30 eth66=3 eth68=33
[INFO] [03-23|18:25:55.310] [txpool] stat block=16865544 pending=10000 baseFee=30000 queued=30000 alloc=3.1GB sys=8.6GB
[INFO] [03-23|18:26:55.310] [txpool] stat block=16865544 pending=10000 baseFee=30000 queued=30000 alloc=3.4GB sys=8.6GB
[INFO] [03-23|18:27:09.579] [p2p] GoodPeers eth66=3 eth67=30 eth68=33
[INFO] [03-23|18:27:55.310] [txpool] stat block=16865544 pending=10000 baseFee=30000 queued=30000 alloc=3.0GB sys=8.6GB
[INFO] [03-23|18:28:55.309] [txpool] stat block=16865544 pending=10000 baseFee=30000 queued=30000 alloc=3.4GB sys=8.6GB
[INFO] [03-23|18:29:09.579] [p2p] GoodPeers eth68=33 eth67=30 eth66=3
[INFO] [03-23|18:29:56.196] [txpool] stat block=16865544 pending=10000 baseFee=30000 queued=30000 alloc=3.0GB sys=8.6GB
[INFO] [03-23|18:30:55.318] [txpool] stat block=16865544 pending=10000 baseFee=30000 queued=30000 alloc=4.2GB sys=8.6GB
[INFO] [03-23|18:31:09.583] [p2p] GoodPeers eth67=30 eth66=3 eth68=33
Just stuck. erigon:v2.40.1
Run options: --http --http.api=engine,eth,erigon,web3,net,debug,trace,txpool,shh --bodies.cache=21474836480
Last logs before it got stuck:
[INFO] [03-20|01:13:38.671] RPC Daemon notified of new headers from=16865540 to=16865541 hash=0x145ef732c53a748ebeaf623c462f2a568abc556d55e18695523173a161f624c8 header sending=15.754µs log sending=257ns
[INFO] [03-20|01:13:40.585] [2/15 Headers] Waiting for Consensus Layer...
[WARN] [03-20|01:13:40.840] Could not send beacon block to ETH1 err="missing attestation data"
[INFO] [03-20|01:13:48.471] [2/15 Headers] Handling new payload height=16865542 hash=0x8608eac1159aee186155a726056ef75c148f821a2cf68c369325b5c882ce8251
[INFO] [03-20|01:13:49.350] [2/15 Headers] Waiting for Consensus Layer...
[INFO] [03-20|01:13:49.851] [2/15 Headers] Fork choice update: flushing in-memory state (built by previous newPayload)
[INFO] [03-20|01:13:50.855] Commit cycle in=225.823477ms
[INFO] [03-20|01:13:50.855] Timings (slower than 50ms) Headers=639ms CallTraces=128ms AccountHistoryIndex=92ms StorageHistoryIndex=196ms LogIndex=105ms TxLookup=116ms
[INFO] [03-20|01:13:50.855] Tables PlainState=88.0GB AccountChangeSet=184.4GB StorageChangeSet=281.3GB BlockTransaction=9.7GB TransactionLog=546.2GB FreeList=35.2MB ReclaimableSpace=35.2GB
[INFO] [03-20|01:13:50.855] RPC Daemon notified of new headers from=16865541 to=16865542 hash=0x8608eac1159aee186155a726056ef75c148f821a2cf68c369325b5c882ce8251 header sending=17.543µs log sending=221ns
[INFO] [03-20|01:13:53.421] [2/15 Headers] Waiting for Consensus Layer...
[WARN] [03-20|01:13:53.674] Could not send beacon block to ETH1 err="missing attestation data"
[INFO] [03-20|01:13:55.307] [LightClient] P2P peers=2
[INFO] [03-20|01:13:55.320] [txpool] stat block=16865542 pending=6629 baseFee=6643 queued=29751 alloc=3.5GB sys=8.6GB
[INFO] [03-20|01:14:00.334] [2/15 Headers] Handling new payload height=16865543 hash=0xef6b83881fff7b427f9b0264c282bd0c6ddd79948fc12a9df2d7bf5306d52e3c
[INFO] [03-20|01:14:01.368] [2/15 Headers] Waiting for Consensus Layer...
[INFO] [03-20|01:14:01.868] [2/15 Headers] Fork choice update: flushing in-memory state (built by previous newPayload)
[INFO] [03-20|01:14:02.974] Commit cycle in=299.037612ms
[INFO] [03-20|01:14:02.974] Timings (slower than 50ms) Headers=671ms CallTraces=124ms AccountHistoryIndex=71ms StorageHistoryIndex=221ms LogIndex=110ms TxLookup=107ms
[INFO] [03-20|01:14:02.974] Tables PlainState=88.0GB AccountChangeSet=184.4GB StorageChangeSet=281.3GB BlockTransaction=9.7GB TransactionLog=546.2GB FreeList=35.2MB ReclaimableSpace=35.2GB
[INFO] [03-20|01:14:02.974] RPC Daemon notified of new headers from=16865542 to=16865543 hash=0xef6b83881fff7b427f9b0264c282bd0c6ddd79948fc12a9df2d7bf5306d52e3c header sending=17.971µs log sending=405ns
[INFO] [03-20|01:14:05.541] [2/15 Headers] Waiting for Consensus Layer...
[INFO] [03-20|01:14:12.370] [2/15 Headers] Handling new payload height=16865544 hash=0x5a7f473d8f38a62877ba113a20a20dadc4f62b59c7b44075d478aef7b071981c
[INFO] [03-20|01:14:13.801] [2/15 Headers] Waiting for Consensus Layer...
[INFO] [03-20|01:14:14.302] [2/15 Headers] Fork choice update: flushing in-memory state (built by previous newPayload)
[INFO] [03-20|01:14:15.723] Commit cycle in=201.655616ms
[INFO] [03-20|01:14:15.723] Timings (slower than 50ms) Headers=717ms CallTraces=274ms AccountHistoryIndex=223ms StorageHistoryIndex=124ms LogIndex=170ms TxLookup=207ms
[INFO] [03-20|01:14:15.723] Tables PlainState=88.0GB AccountChangeSet=184.4GB StorageChangeSet=281.3GB BlockTransaction=9.7GB TransactionLog=546.2GB FreeList=35.2MB ReclaimableSpace=35.2GB
[INFO] [03-20|01:14:15.723] RPC Daemon notified of new headers from=16865543 to=16865544 hash=0x5a7f473d8f38a62877ba113a20a20dadc4f62b59c7b44075d478aef7b071981c header sending=24.134µs log sending=243ns
[INFO] [03-20|01:14:19.056] [2/15 Headers] Waiting for Consensus Layer...
[WARN] [03-20|01:14:19.356] Could not send beacon block to ETH1 err="missing attestation data"
[INFO] [03-20|01:14:19.556] [Lightclient] Fetching Sync Committee Period period=737
[INFO] [03-20|01:14:55.309] [txpool] stat block=16865544 pending=6976 baseFee=6669 queued=29755 alloc=3.9GB sys=8.6GB
[INFO] [03-20|01:15:09.579] [p2p] GoodPeers eth66=5 eth67=27 eth68=32
[INFO] [03-20|01:15:55.317] [txpool] stat block=16865544 pending=7574 baseFee=6718 queued=29787 alloc=3.3GB sys=8.6GB
[INFO] [03-20|01:16:55.306] [txpool] stat block=16865544 pending=8003 baseFee=6746 queued=29856 alloc=3.6GB sys=8.6GB
[INFO] [03-20|01:17:09.579] [p2p] GoodPeers eth67=27 eth68=32 eth66=5
Your logs above look like a different issue; in particular, it stops after:
[INFO] [03-20|01:14:19.556] [Lightclient] Fetching Sync Committee Period period=737
I have reported this in #7162
Hello everyone,
It appears I have the same issue, despite already setting all the necessary values:
--lightclient.discovery.port 10500 --lightclient.discovery.tcpport 10501 --lightclient.discovery.addr 0.0.0.0
Log:
Mar 27 20:40:06 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:06.685] [Checkpoint Sync] P2P peers=0
Mar 27 20:40:16 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:16.685] [Checkpoint Sync] P2P peers=0
Mar 27 20:40:26 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:26.685] [Checkpoint Sync] P2P peers=0
Mar 27 20:40:36 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:36.686] [Checkpoint Sync] P2P peers=0
Mar 27 20:40:46 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:46.685] [Checkpoint Sync] P2P peers=0
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.308] Store Initialized successfully slot=6093664 root=fc3b0af3ad33dcded22e7b5d9492444cf99e85f2ee80d0cc521b60b4facf2d21
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.308] rpc filters: subscribing to Erigon events
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.308] [Lightclient] Fetching Sync Committee Period period=743
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.308] New txs subscriber joined
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.308] new subscription to newHeaders established
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.308] [LightClient Gossip] Started Gossip
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.308] Reading JWT secret path=/data/ethereum/jwt.hex
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.309] HTTP endpoint opened for Engine API url=127.0.0.1:8551 ws=true ws.compression=true
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.309] HTTP endpoint opened url=[::]:8545 ws=true ws.compression=true grpc=false
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.312] Started P2P networking version=68 self=enode://a00e2279dd9679269696c1b9d83d3da2b30e0bb5070c99899fe4114a71f914df73c45fcdf443989d3cef5ef0f0184671de29d45a32ad561ec3f8d871bb09edc0@127.0.0.1:0 name=erigon/v2.41.0-stable-52d5f37b/linux-amd64/go1.19.4
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.314] Started P2P networking version=67 self=enode://a00e2279dd9679269696c1b9d83d3da2b30e0bb5070c99899fe4114a71f914df73c45fcdf443989d3cef5ef0f0184671de29d45a32ad561ec3f8d871bb09edc0@127.0.0.1:0 name=erigon/v2.41.0-stable-52d5f37b/linux-amd64/go1.19.4
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.321] [1/15 Snapshots] Fetching torrent files metadata
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.325] [snapshots] Blocks Stat blocks=16767k indices=16767k alloc=2.8GB sys=3.7GB
Mar 27 20:40:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:40:48.377] [2/15 Headers] Waiting for Consensus Layer...
Mar 27 20:41:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:41:48.197] [p2p] GoodPeers eth66=9 eth67=3 eth68=4
Mar 27 20:41:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:41:48.309] [LightClient] P2P peers=0
Mar 27 20:42:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:42:48.308] [LightClient] P2P peers=0
Mar 27 20:43:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:43:48.197] [p2p] GoodPeers eth66=10 eth67=12 eth68=8
Mar 27 20:43:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:43:48.308] [LightClient] P2P peers=0
Mar 27 20:44:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:44:48.309] [LightClient] P2P peers=1
Mar 27 20:45:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:45:48.197] [p2p] GoodPeers eth67=18 eth66=10 eth68=13
Mar 27 20:45:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:45:48.308] [LightClient] P2P peers=1
Mar 27 20:46:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:46:48.309] [LightClient] P2P peers=1
Mar 27 20:47:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:47:48.197] [p2p] GoodPeers eth67=22 eth66=10 eth68=13
Mar 27 20:47:48 HOST_NAME erigon[836115]: [INFO] [03-27|20:47:48.309] [LightClient] P2P peers=0
We are using Erigon 2.41.0-stable; the machine is an EC2 instance running inside a VPC in AWS. Security groups allow access to both port 10500 and 10501 (UDP and TCP). I verified that the traffic works by nc-ing from my machine into the public IP address of the instance.
I believe the network is somewhat similar to what @keithchew has.
The problem only started since 2.37; earlier, the node worked perfectly.
Any help is greatly appreciated.
I can confirm that although exposing the ports on the firewall does help gather peers on startup, I can never get my nodes (Goerli and mainnet) to last (keeping up with the latest block) for more than a few days. They either (1) get stuck after "Fetching Sync Committee Period" (as in #7162), or (2) see the peer count drop to 0, as described by @tanmng.
Currently looking at running a dedicated external CL, to see if we can narrow this issue down to erigon's internal CL, or something else within erigon's EL logic.
Thanks for the confirmation @keithchew. In that case I'll try to enable Lighthouse for my node and see if that resolves the issue. It will take a long time for Lighthouse to sync and catch up, though.
Yeah, I started up Prysm on both Goerli and mainnet last Friday, and they are around 50% synced. I will post any updates here; it would also be good if you can report your progress with Lighthouse.
I'm in a GKE container and maybe need something like this? https://docs.prylabs.network/docs/prysm-usage/p2p-host-ip#broadcast-your-public-ip-address. It's easy to expose any port on a public IP, but there doesn't seem to be a way to tell the native Erigon CL about an address other than the ones it wants to bind to directly.
https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/p2p-interface.md
"Nodes operating behind a NAT, or otherwise undialable by default (e.g. container runtime, firewall, etc.), MUST have their infrastructure configured to enable inbound traffic on the announced public listening endpoint."
Is the "announced public listening endpoint" configurable in Erigon CL, separate from the service bound IP?
From my observation, I only need to expose the p2p tcp/udp ports on the firewall. When exposed correctly, you can observe (using tcpdump/wireshark/etc) traffic on the ports, and the peer count in erigon/prysm will be healthy. For example in Prysm, without the ports exposed you will only see a one-sided peer count (i.e. you are not a seeder), but with the ports exposed you will see bi-directional peers.
Doesn't work like that in a K8s environment.
Even if you could announce your IP in a swarm, how do you control/map the port assignments? I currently have a mix of machines (baremetal and Docker containers), and to keep things simple for now, I am NAT'ing unique ports across the cluster. Eventually I will look at mapping different source/destination ports, once I have a stable setup (need to establish a baseline before making improvements).
When you get it going, please post here; it would certainly be useful knowledge for all.
Without getting into the details of K8s, it's just a situation where the Erigon container doesn't know its public IP and isn't binding to its public IP, so I would have to give it that info at runtime. You're right, it doesn't know its public port either, so that would be part of it. --lightclient.discovery.addr seems close, but Erigon actually binds to this IP, so it's not the config I need.
AFAIK, the announcement IP is to make the peer more discoverable, but it is not required for p2p to function. If it were, a good chunk of users would not be able to get their nodes running.
I'm also assuming that [LightClient] P2P peers should be well above the values of 0 to 3 which are typical of my node (Goerli). There were apparently multiple regressions in v2.41.0 in this area, so maybe https://github.com/ledgerwatch/erigon/releases/tag/v2.42.0 is going to work better with my limitations.
Update: I've concluded that Erigon CL currently requires host networking to be well connected. There's currently no way to tell it about an IP for discoverability that it isn't binding to directly. It's bad practice, and disallowed in GKE, to use network=host for the Docker container, so it's a dead end in my environment. Prysm has a flag that appears to address this issue, so I assume it's a matter of a feature add in Erigon CL. To be clear, it can find a peer or two without discoverability, but when the peer count is <5, node performance is very bad and it's often stuck.
I'm still seeing the same issue even with git_branch=HEAD git_tag=v2.42.0-dirty git_commit=beb97784d43ece5acde365a74efe8763692ebdba:
[INFO] [03-29|14:23:52.550] [2/15 Headers] Waiting for Consensus Layer...
[INFO] [03-29|14:24:01.019] [2/15 Headers] Handling new payload height=16933519 hash=0xc3e8476632bbb842e1e781e2c5823ca41ba18ec31ac977fb5f5c60368459bd92
[INFO] [03-29|14:24:01.766] [2/15 Headers] Waiting for Consensus Layer...
[INFO] [03-29|14:24:02.266] [2/15 Headers] Fork choice update: flushing in-memory state (built by previous newPayload)
[INFO] [03-29|14:24:02.921] Commit cycle in=56.565277ms
[INFO] [03-29|14:24:02.921] Timings (slower than 50ms) Headers=566ms CallTraces=121ms AccountHistoryIndex=86ms StorageHistoryIndex=142ms LogIndex=84ms TxLookup=97ms
[INFO] [03-29|14:24:02.921] Tables PlainState=88.8GB AccountChangeSet=185.6GB StorageChangeSet=283.3GB BlockTransaction=10.8GB TransactionLog=551.7GB FreeList=143.1MB ReclaimableSpace=143.1GB
[INFO] [03-29|14:24:02.921] RPC Daemon notified of new headers from=16933518 to=16933519 hash=0xc3e8476632bbb842e1e781e2c5823ca41ba18ec31ac977fb5f5c60368459bd92 header sending=12.998µs log sending=211ns
[INFO] [03-29|14:24:04.893] [2/15 Headers] Waiting for Consensus Layer...
Hi @tanmng, how are you getting on with Lighthouse? I have an update: my Goerli Prysm just synced up, and Erigon <-> Prysm is working super well so far. Always on the latest block, and peer counts are all healthy. Waiting for mainnet Prysm to sync up; will post an update here.
@winchell, it is most certainly not an IP announcement issue with LightClient. My Prysm instance is running without IP announcement and both inbound/outbound peers are healthy. It is more likely that fewer users are running Erigon's LightClient, or that there are bugs/instability within LightClient. Until LightClient is more performant/stable, I am sticking to an external CL for now.
Good to know. Yeah, everything was fine when I was running Lighthouse. Same bad experience on v2.42.0 here.
almost 1 month and still no solution?
Hi @tanmng, my Prysm mainnet has synced up and Erigon <-> Prysm for mainnet works beautifully. My initial attempt was to wait for Prysm to sync from genesis, which can take a while. After learning more about Erigon and Prysm, I found a faster way to get both the EL and CL synced up:
1. Run erigon with internal CL until synced to head
2. Run prysm with snapshot sync until synced to head
3. Restart erigon with externalcl flag
4. Restart prysm without checkpoint sync and connect to erigon
Just need to be careful when cutting over from steps 1 & 2 to 3 & 4. It's best to ensure the EL is slightly ahead of the CL before connecting them. Note that using the above method means Prysm is synced from a checkpoint and not from genesis, so make sure your project's requirements can still be met with this.
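For reference, the cutover above could look roughly like the following. This is only a sketch: `--internalcl`/`--externalcl` and the `--authrpc.*` flags are the Erigon flags mentioned in this thread, `beacon-chain` with `--checkpoint-sync-url`, `--execution-endpoint` and `--jwt-secret` are from Prysm's docs, and all paths, the chain, and the checkpoint URL are placeholders to adapt to your setup.

```shell
# 1. Sync Erigon to head with the built-in CL (steps 1 & 2 can run in parallel)
erigon --chain mainnet --internalcl --datadir=/data/erigon

# 2. Sync Prysm from a recent checkpoint instead of genesis
beacon-chain --mainnet --datadir=/data/prysm \
  --checkpoint-sync-url=<beacon-api-url>

# 3. Once both are at head, restart Erigon exposing the engine API for an external CL
erigon --chain mainnet --externalcl --datadir=/data/erigon \
  --authrpc.addr=127.0.0.1 --authrpc.port=8551 --authrpc.jwtsecret=/data/jwt.hex

# 4. Restart Prysm without the checkpoint flag, pointed at Erigon's engine API
beacon-chain --mainnet --datadir=/data/prysm \
  --execution-endpoint=http://127.0.0.1:8551 --jwt-secret=/data/jwt.hex
```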
PS: My goerli erigon <-> prysm has been running solid for over 2 weeks, always at latest block, amazing stuff.
Hi, same problem on v2.42.0 with Erigon's lightclient CL. Restarting helps sometimes, but then the sync gets stuck again after a while; --lightclient.discovery.addr=0.0.0.0 does not help.
Yep. It worked until the hard fork. Now I'm "migrating" to Prysm.
Same problem with v2.43.0
Seems the same for a Matic mainnet archive node: it looks like it has peers but is not moving forward
[INFO] [05-29|02:58:57.156] [8/15 HashState] ETL [1/2] Extracting from=AccountChangeSet current_prefix=00000000027dbfa9
[INFO] [05-29|02:59:27.156] [8/15 HashState] ETL [1/2] Extracting from=AccountChangeSet current_prefix=00000000027dc41c
[INFO] [05-29|02:59:34.392] [p2p] GoodPeers eth66=39 eth68=6 eth67=3
[INFO] [05-29|02:59:40.611] [txpool] stat pending=10000 baseFee=12112 queued=30000 alloc=4.0GB sys=8.0GB
[INFO] [05-29|02:59:57.156] [8/15 HashState] ETL [1/2] Extracting from=AccountChangeSet current_prefix=00000000027dca22
unless I restart it
I have a similar but different issue: upon interruption of the Tx Lookup Index stage on initial sync, it doesn't appear to resume the sync (the index build wasn't complete and the temp directory went from 40GB to 0, in case that's relevant, but I'm assuming that's just flushed to the DB or something during Erigon shutdown)
It's been about three hours of:
INFO[06-25|12:29:14.435] [p2p] GoodPeers eth68=33 eth67=28
INFO[06-25|12:29:14.446] [txpool] stat pending=0 baseFee=0 queued=24000 alloc=2.5GB sys=4.5GB
INFO[06-25|12:30:14.442] [txpool] stat pending=0 baseFee=0 queued=24000 alloc=2.3GB sys=4.5GB
seems to be stuck here:
[2/15 Headers] Waiting for Consensus Layer..
edit: I'm assuming I don't understand the log output, but I did a test via JSON-RPC and the node still isn't returning data from any non-snapshot blocks
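For anyone wanting to reproduce that kind of JSON-RPC test, a minimal check is sketched below. The endpoint is an assumption (the default HTTP port 8545; adjust to whatever `--http.port` your node uses): `eth_syncing` reports whether the node still considers itself syncing, and `eth_getBlockByNumber` shows whether a given block is actually served.

```shell
# Is the node still syncing? Returns false once it considers itself at head.
curl -s -X POST -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
  http://localhost:8545

# Can it serve a recent block? A null result for a post-snapshot height
# matches the behaviour described above.
curl -s -X POST -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","method":"eth_getBlockByNumber","params":["latest",false],"id":2}' \
  http://localhost:8545
```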
Fixed by adding --internalcl, should anyone else have this issue; syncing is continuing.
My guess is that something happens after the block execution stage, with the consensus layer turned off until then, maybe?
cheers
Hi @simonblack, the internal CL has issues keeping in sync reliably, and it used to be the default CL. Recently the internalcl flag was introduced, making the external CL the default. From the other issues posted, I can see that the Erigon dev team has a plan to rework and improve the internal CL. For my setup, I have been running on Prysm for almost 3 months now, and sync is always at the latest block.
got it, thank you
So for now the foolproof approach is definitely to install an external CL, right? I'm new to the post-merge world
I would recommend you try various options (internal, Prysm, Lighthouse, etc.), evaluate their performance/stability, and settle on the one which meets your project requirements.
This issue is stale because it has been open for 40 days with no activity. Remove stale label or comment, or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.
Hello, we are experiencing some issues with our Erigon node on Goerli, where the node stops syncing and only continues after a restart. We are wondering if this is a general issue, or if there's a fix or flag for this that we are not setting. Below are the logs from the time it got stuck.
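As a stopgap while the root cause is investigated, a liveness check along these lines can restart a node whose head stops advancing. This is only a sketch: the RPC endpoint, port, and sample interval are assumptions to adapt, and the non-zero exit is meant to be wired into whatever supervises the process (e.g. a container liveness probe) rather than restarting Erigon itself.

```shell
#!/bin/sh
# Sketch of a "stuck head" liveness check using eth_blockNumber over JSON-RPC.
# RPC_URL and INTERVAL are illustrative defaults; override via the environment.

RPC_URL="${RPC_URL:-http://localhost:8545}"

# Query eth_blockNumber; prints the hex block number (empty if unreachable).
block_number() {
    curl -s -X POST -H 'Content-Type: application/json' \
        --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
        "$RPC_URL" | sed -n 's/.*"result":"\(0x[0-9a-fA-F]*\)".*/\1/p'
}

# is_stuck PREV CUR: succeeds (exit 0) when the head has not moved
# between two samples, or when the second sample is missing.
is_stuck() {
    [ -z "$2" ] || [ "$1" = "$2" ]
}

# Only run the check when invoked with "check", so the functions above
# can be sourced without side effects.
if [ "${1:-}" = "check" ]; then
    prev=$(block_number)
    sleep "${INTERVAL:-300}"      # tune to the chain's expected block time
    cur=$(block_number)
    if is_stuck "$prev" "$cur"; then
        echo "head stuck at ${cur:-unknown}" >&2
        exit 1                    # non-zero exit signals the supervisor to restart
    fi
fi
```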