Closed: Aracki closed this issue 1 year ago.
The same happened to us. Does anyone know how to fix this?
After we upgraded from 1.1.20 to 1.1.21, the nodes randomly become stuck, with no new blocks at all. Here is a log example. You can see that after ~21:44 the node became unhealthy and stopped getting new blocks:
INFO [04-10|21:43:08.507] Imported new chain segment blocks=20 txs=1689 mgas=182.628 elapsed=5.672s mgasps=32.193 number=27,237,294 hash=6331a2..f6a57d dirty=2.25MiB ignored=1
INFO [04-10|21:43:13.026] Deep froze chain segment blocks=20 elapsed=22.551ms number=27,147,294 hash=88801b..dff61c
INFO [04-10|21:44:02.862] Downloader queue stats receiptTasks=0 blockTasks=0 itemSize=23.17KiB throttle=8192
ERROR[04-10|21:44:04.925] Invalid merkle root remote=1b9937..aaef75 local=b604ae..6e2805
WARN [04-10|21:44:04.927] Synchronisation failed, dropping peer peer=448fccf3fcaaad193fc352f3eb4be4166e8ee05f235d847604def13addb73b5a err="retrieved hash chain is invalid: invalid merkle root (remote: 1b99374ba56e64254d0a9d68555ac74ed9e36d92268b97bedf12186d3caaef75 local: b604aee4eed8459ca3ae4061bd98739a8c44b6115b6b9afd1c89f6ea296e2805)"
WARN [04-10|21:44:04.928] Synchronisation failed, retrying peer=448fccf3fcaaad193fc352f3eb4be4166e8ee05f235d847604def13addb73b5a err="shutting down"
WARN [04-10|21:44:04.929] Synchronisation failed, retrying peer=448fccf3fcaaad193fc352f3eb4be4166e8ee05f235d847604def13addb73b5a err="write tcp 10.2.21.177:45634->142.132.208.184:30309: use of closed network connection"
WARN [04-10|21:44:04.929] Synchronisation failed, retrying peer=448fccf3fcaaad193fc352f3eb4be4166e8ee05f235d847604def13addb73b5a err="shutting down"
WARN [04-10|21:44:04.929] Synchronisation failed, retrying peer=448fccf3fcaaad193fc352f3eb4be4166e8ee05f235d847604def13addb73b5a err="peer is unknown or unhealthy"
INFO [04-10|21:44:13.047] Deep froze chain segment blocks=8 elapsed=19.940ms number=27,147,302 hash=9a5c96..66fb82
INFO [04-10|22:09:29.965] Writing clean trie cache to disk path=/bsc/.ethereum/geth/triecache threads=1
INFO [04-10|22:09:29.968] Regenerated local transaction journal transactions=0 accounts=0
INFO [04-10|22:09:43.627] Persisted the clean trie cache path=/bsc/.ethereum/geth/triecache elapsed=13.662s
INFO [04-10|23:09:29.965] Writing clean trie cache to disk path=/bsc/.ethereum/geth/triecache threads=1
INFO [04-10|23:09:29.966] Regenerated local transaction journal transactions=0 accounts=0
INFO [04-10|23:09:43.101] Persisted the clean trie cache path=/bsc/.ethereum/geth/triecache elapsed=13.135s
INFO [04-11|00:09:29.965] Writing clean trie cache to disk path=/bsc/.ethereum/geth/triecache threads=1
INFO [04-11|00:09:29.966] Regenerated local transaction journal transactions=0 accounts=0
INFO [04-11|00:09:43.030] Persisted the clean trie cache path=/bsc/.ethereum/geth/triecache elapsed=13.064s
INFO [04-11|01:09:29.966] Writing clean trie cache to disk path=/bsc/.ethereum/geth/triecache threads=1
INFO [04-11|01:09:29.967] Regenerated local transaction journal transactions=0 accounts=0
INFO [04-11|01:09:43.273] Persisted the clean trie cache path=/bsc/.ethereum/geth/triecache elapsed=13.306s
INFO [04-11|02:09:29.965] Writing clean trie cache to disk path=/bsc/.ethereum/geth/triecache threads=1
INFO [04-11|02:09:29.966] Regenerated local transaction journal transactions=0 accounts=0
INFO [04-11|02:09:43.361] Persisted the clean trie cache path=/bsc/.ethereum/geth/triecache elapsed=13.395s
INFO [04-11|03:09:29.966] Writing clean trie cache to disk path=/bsc/.ethereum/geth/triecache threads=1
INFO [04-11|03:09:29.967] Regenerated local transaction journal transactions=0 accounts=0
INFO [04-11|03:09:43.146] Persisted the clean trie cache path=/bsc/.ethereum/geth/triecache elapsed=13.179s
INFO [04-11|04:09:29.966] Writing clean trie cache to disk path=/bsc/.ethereum/geth/triecache threads=1
INFO [04-11|04:09:29.966] Regenerated local transaction journal transactions=0 accounts=0
INFO [04-11|04:09:43.364] Persisted the clean trie cache path=/bsc/.ethereum/geth/triecache elapsed=13.397s
INFO [04-11|05:09:29.966] Writing clean trie cache to disk path=/bsc/.ethereum/geth/triecache threads=1
INFO [04-11|05:09:29.966] Regenerated local transaction journal transactions=0 accounts=0
INFO [04-11|05:09:43.403] Persisted the clean trie cache path=/bsc/.ethereum/geth/triecache elapsed=13.437s
INFO [04-11|06:09:29.965] Writing clean trie cache to disk path=/bsc/.ethereum/geth/triecache threads=1
INFO [04-11|06:09:29.967] Regenerated local transaction journal transactions=0 accounts=0
INFO [04-11|06:09:42.979] Persisted the clean trie cache path=/bsc/.ethereum/geth/triecache elapsed=13.013s
INFO [04-11|07:09:29.966] Writing clean trie cache to disk path=/bsc/.ethereum/geth/triecache threads=1
INFO [04-11|07:09:29.966] Regenerated local transaction journal transactions=0 accounts=0
INFO [04-11|07:09:43.384] Persisted the clean trie cache path=/bsc/.ethereum/geth/triecache elapsed=13.417s
INFO [04-11|08:09:29.966] Writing clean trie cache to disk path=/bsc/.ethereum/geth/triecache threads=1
INFO [04-11|08:09:29.966] Regenerated local transaction journal transactions=0 accounts=0
Hi there, what's your startup command? Which geth version? And which snapshot are you using?
Also, please share the eth.syncing, eth.blockNumber, and net.peerCount outputs. Thanks.
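For reference, all three can be pulled non-interactively over IPC; a minimal shell sketch, assuming the geth.ipc socket sits under your datadir (adjust the path to your setup):

# Query the running node over IPC; the geth.ipc path below is an example.
geth attach --exec 'eth.syncing' /bsc/.ethereum/geth.ipc
geth attach --exec 'eth.blockNumber' /bsc/.ethereum/geth.ipc
geth attach --exec 'net.peerCount' /bsc/.ethereum/geth.ipc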
Happened to me too. There seem to be some bad peers; I had to remove all the peers and start the node again.
geth attach --exec 'var peers = admin.peers; for (var i = 0; i < peers.length; i++) { admin.removePeer(peers[i].enode); }' data/geth.ipc
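To double-check it worked, the peer count should drop to 0 right afterwards (the IPC path is just an example):

geth attach --exec 'admin.peers.length' data/geth.ipc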
@deepcrazy We are using the ghcr.io/bnb-chain/bsc:1.1.21 image. The startup command args are:
"--networkid=56",
"--datadir=/bsc/.ethereum/",
"--config=/bsc/.ethereum/config.toml",
"--syncmode=full",
"--gcmode=archive",
"--cache=65536",
"--maxpeers=100",
"--http",
"--http.addr=0.0.0.0",
"--http.port=8545",
"--http.api=eth,net,web3,admin,debug,personal,txpool",
"--http.corsdomain=*",
"--http.vhosts=*",
"--ws",
"--ws.addr=0.0.0.0",
"--ws.port=8546",
"--ws.api=eth,net,web3,admin,debug,personal,miner,txpool",
"--ws.origins=*",
"--port=30311",
"--txlookuplimit=0",
"--rpc.allow-unprotected-txs",
"--cache.preimages",
"--bootnodes=enode://1cc4534b14cfe351ab740a1418ab944a234ca2f702915eadb7e558a02010cb7c5a8c295a3b56bcefa7701c07752acd5539cb13df2aab8ae2d98934d712611443@52.71.43.172:30311,enode://28b1d16562dac280dacaaf45d54516b85bc6c994252a9825c5cc4e080d3e53446d05f63ba495ea7d44d6c316b54cd92b245c5c328c37da24605c4a93a0d099c4@34.246.65.14:30311,enode://5a7b996048d1b0a07683a949662c87c09b55247ce774aeee10bb886892e586e3c604564393292e38ef43c023ee9981e1f8b335766ec4f0f256e57f8640b079d5@35.73.137.11:30311",
For eth.syncing, I'm not sure what the value was at the time, as we don't have a metric for that.
eth.blockNumber was stuck at 27,237,302.
net.peerCount was stuck at 33 (although before the issue it was always in the range of 34-36).
What exactly do you mean by "which snapshot we are using"?
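Since we lack a metric for this, here's the kind of minimal shell check we could have run; the IPC path is an assumption based on our datadir:

# Flag the node as stuck if the head block does not advance within a minute.
old=$(geth attach --exec 'eth.blockNumber' /bsc/.ethereum/geth.ipc)
sleep 60
new=$(geth attach --exec 'eth.blockNumber' /bsc/.ethereum/geth.ipc)
[ "$new" -gt "$old" ] || echo "node appears stuck at block $new"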
Hi there, in this case, please try setting maxpeers to 200, 300, or 500, and also append the static nodes from here: https://api.binance.org/v1/discovery/peers. Then restart the node and let it run for a while.
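For example, to fetch and eyeball that list before pasting entries into your config (jq . just pretty-prints whatever the endpoint returns, so no assumptions about its exact shape):

curl -s https://api.binance.org/v1/discovery/peers | jq .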
Try doing this:
1. Take the StaticPeers and BootNodes from the latest mainnet.zip config and replace those in your old config, or replace the whole config altogether.
2. Stop the node: systemctl stop bsc (or whatever script you're using).
3. Remove geth/nodes: cd /path/to/bsc_datadir/geth/; rm -rf nodes
4. Start it again: systemctl start bsc
It should be working correctly.
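The same steps as one copy-paste block (the unit name and datadir path are examples; adjust to your setup):

systemctl stop bsc
rm -rf /path/to/bsc_datadir/geth/nodes   # drop the cached peer/discovery database
systemctl start bsc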
I am having the same issue as everyone else. I have tried @gituser's recommendation of loading the StaticPeers and BootNodes out of the mainnet.zip linked above. I disabled peer discovery to weed out all of the bad nodes running older versions of BNB, and I have been able to connect to 2 out of the 32 static nodes I grabbed out of the config.toml in the mainnet.zip.
> admin.peers
[{
caps: ["diff/1", "eth/66", "eth/67", "les/2", "les/3", "les/4", "snap/1"],
enode: "enode://ace8e3b7e96290392a9c6238e005539dd1d1ca7c18aeedd979087789812e0656407e2c4baabcd83b50d583b583aa959ff42e5c95571d76b84995aad722a9a85c@44.198.55.182:30311",
id: "25b173c00d58eac53442faee5a67f1543e496bd7df818e062cee202b8ec5582e",
name: "Geth/v1.1.21-6e06ed10/linux-amd64/go1.19.7",
network: {
inbound: false,
localAddress: "172.20.0.15:60284",
remoteAddress: "44.198.55.182:30311",
static: true,
trusted: false
},
protocols: {
diff: {
diff_sync: false,
version: 1
},
eth: {
difficulty: 54357728,
head: "0xd79d41baf9ebac436b2fd3279dc1d764e1eafe88d839d6f4a1ce7842e2ff3ff9",
version: 67
},
snap: {
version: 1
}
}
}, {
caps: ["diff/1", "eth/66", "eth/67", "les/2", "les/3", "les/4", "snap/1"],
enode: "enode://bdbcb42ff17a52af7ac120f23ee86f525ffbd854ce76e54bad858cf741fcd524d0f810be399d437bea939682a919c5af5df31811c43ccc270d04485caf4ddaeb@52.206.226.126:30311",
id: "67950c087377c72d2ec8f38f3a5fc0683097756e994ee546ef5fee3b53400b2a",
name: "Geth/v1.1.21-6e06ed10/linux-amd64/go1.19.7",
network: {
inbound: false,
localAddress: "172.20.0.15:60570",
remoteAddress: "52.206.226.126:30311",
static: true,
trusted: false
},
protocols: {
diff: {
diff_sync: false,
version: 1
},
eth: {
difficulty: 54357856,
head: "0x9568f5c3fd2a2380e553c386bff7a3282061f7185827a31a0268743a10be17ed",
version: 67
},
snap: {
version: 1
}
}
}]
The node does appear to be working now, but only being able to connect to 2 peers is not ideal for a production environment. How would you recommend connecting to more than 2 of the static nodes I have listed in my config.toml?
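In the meantime, a quick way to see which of the static nodes actually connected (the IPC path is just an example):

geth attach --exec 'admin.peers.map(function (p) { return p.enode })' data/geth.ipc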
Okay, so maybe the answer is time. I let the node run over the weekend and eventually it connected to 10 peers and the node never locked up once.
> Happened to me too. There seem to be some bad peers; I had to remove all the peers and start the node again.
> geth attach --exec 'var peers = admin.peers; for (var i = 0; i < peers.length; i++) { admin.removePeer(peers[i].enode); }' data/geth.ipc
Any known fix? It seems impossible to keep my node synced because of this, it happens very frequently. Cleaning peers with the script gets it back alive for a while until it gets stuck again or starts importing blocks in batches instead of one by one every three seconds.
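Until there's a real fix, one option is to automate the peer reset above; a rough watchdog sketch, assuming the IPC socket lives at data/geth.ipc (run it from cron or a systemd timer):

#!/bin/sh
# If the head block has not advanced within 60s, drop all peers so geth
# redials a (hopefully healthier) set.
IPC=data/geth.ipc
before=$(geth attach --exec 'eth.blockNumber' "$IPC")
sleep 60
after=$(geth attach --exec 'eth.blockNumber' "$IPC")
if [ "$after" -le "$before" ]; then
  geth attach --exec 'var peers = admin.peers; for (var i = 0; i < peers.length; i++) { admin.removePeer(peers[i].enode); }' "$IPC"
fi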