anthonyoliai opened 2 years ago
Furthermore, I'm not sure if this is related, but could the following errors be occurring due to this sidechain ghost-state attack?
```
DEBUG[10-24|10:21:26.001] Discarded delivered header or block, too far away peer=d64565f54f769356073dbba162538f4b5d469209c29e5535fde022dbce540648 number=65812 hash=f73e9b..ff04bf distance=43496
DEBUG[10-24|10:21:26.001] Peer discarded announcement peer=27ee362ec1dedff967343917a753d2998e9af0d99879bdc400b3d6354fbacd31 number=65812 hash=f73e9b..ff04bf distance=43496
DEBUG[10-24|10:21:26.001] Discarded delivered header or block, too far away peer=cc9586812a41bf3a7a9ac1a5aea898f942acaf73f4d2d0692fe93c739c09c964 number=65812 hash=f73e9b..ff04bf distance=43496
```
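For reference, those `distance` lines come from geth's block fetcher, which drops announcements too far from the local head. A minimal sketch of that check, assuming the fetcher window constants as I remember them (`maxUncleDist`/`maxQueueDist`; they may differ between versions):

```go
package main

import "fmt"

// Toy version of the fetcher's distance check (not geth's actual code).
// The window constants mirror eth/fetcher's maxUncleDist/maxQueueDist as
// remembered; they may differ between geth versions.
const (
	maxUncleDist = 7  // how far below our head an announcement may be
	maxQueueDist = 32 // how far above our head an announcement may be
)

func tooFarAway(head, announced uint64) bool {
	dist := int64(announced) - int64(head)
	return dist < -maxUncleDist || dist > maxQueueDist
}

func main() {
	// Numbers from the log above: local head ~22316, announced block 65812,
	// giving the reported distance of 43496.
	fmt.Println(tooFarAway(22316, 65812)) // true: the announcement is discarded
}
```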
I think the problem is that you have a very long sidechain where nothing happens, so when we import the long sidechain but the state doesn't change, we stop the chain import. This is only a concern in PoW; in PoS it can happen that multiple blocks don't update the state, so it might be okay to just import the blocks here? Would be good to talk a bit about this on triage.
For a quick fix you can probably delete the datadir of the affected nodes, so that they sync the correct chain directly rather than as a sidechain.
> I think the problem is that you have a very long sidechain where nothing happens, so when we import the long sidechain but the state doesn't change, we stop the chain import. This is only a concern in PoW; in PoS it can happen that multiple blocks don't update the state, so it might be okay to just import the blocks here? Would be good to talk a bit about this on triage.
First of all, thanks @MariusVanDerWijden! That does make sense: the miners are often not incorporating state changes, since the txs come in bursts. Hence, a lot of empty blocks are imported. It could be, for example, that there are no state changes for 100 blocks.
I'm not well versed in what you mean by sidechain in this context. Just for clarification: do you simply mean any other "state" coming from peers other than the node itself?
Looking at my nodes, I do see that they are perfectly in sync at the start; e.g. every time a miner mines a new block it gets properly propagated and imported. So I assume that at some point, because there are no state changes for a longer period of time, this error occurs. (So far it happens around 24 hours in.)
I'm just trying to understand exactly what is happening here. Looking at https://github.com/ethereum/go-ethereum/blob/067bac3f2409aec16994163e7a635d36bdb9b956/core/blockchain.go#L1851.
I assume that if we have, for example, state S, which already exists on the canonical chain, and we import a new state N where N == S, then the state already exists, so we can't just proceed with importing these blocks.
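A minimal self-contained sketch of that situation (toy types and names, not geth's actual code): many empty blocks share the same state root, so a sidechain block can reference a root the node already has without that block ever being executed:

```go
package main

import "fmt"

// Toy illustration (not geth's actual code) of the scenario above: empty
// blocks reuse an existing state root, so a sidechain block can reference a
// root the node already has, and the node never executes that block.
type block struct {
	number    uint64
	stateRoot string
}

func main() {
	// State roots the node already has, e.g. from the canonical chain.
	knownState := map[string]bool{"root-S": true}

	sidechain := []block{
		{number: 100, stateRoot: "root-S"}, // N == S: root already present, body never executed
		{number: 101, stateRoot: "root-N"}, // genuinely new state
	}

	for _, b := range sidechain {
		if knownState[b.stateRoot] {
			fmt.Printf("block %d: state %s already exists, skipping execution\n", b.number, b.stateRoot)
			continue
		}
		fmt.Printf("block %d: state %s missing, would need full validation\n", b.number, b.stateRoot)
	}
}
```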
I do have to mention though that I am running all these nodes on AWS EKS, using proof of authority. They are running on separate pods. Whenever I notice this "ghost state attack" issue, I simply tear down the pod; the container which runs the node is then restarted and the datadir is deleted. Doing so, the node properly syncs back up with the already running nodes.
My initial thought was to write a shell script which listens to
```
> eth.syncing.currentBlock
22316
> eth.syncing.highestBlock
65604
```
and simply restarts the node if the gap between currentBlock and highestBlock is very big. However, I would like to avoid this, as scheduled txs might be lost.
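For reference, a rough sketch of such a watchdog in Go, polling `eth_syncing` over JSON-RPC rather than the console; the endpoint URL and the restart threshold are placeholder assumptions:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strconv"
)

const (
	endpoint  = "http://localhost:8545" // assumed geth HTTP-RPC endpoint
	threshold = 1000                    // arbitrary lag before restarting
)

type syncStatus struct {
	CurrentBlock string `json:"currentBlock"`
	HighestBlock string `json:"highestBlock"`
}

// hexToUint parses the "0x..." quantities eth_syncing returns.
func hexToUint(s string) uint64 {
	n, _ := strconv.ParseUint(s, 0, 64) // base 0 accepts the 0x prefix
	return n
}

func main() {
	req := []byte(`{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}`)
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(req))
	if err != nil {
		fmt.Println("rpc error:", err)
		return
	}
	defer resp.Body.Close()

	var out struct {
		Result json.RawMessage `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		fmt.Println("decode error:", err)
		return
	}
	if string(out.Result) == "false" { // eth_syncing returns false when not syncing
		fmt.Println("node reports not syncing")
		return
	}
	var st syncStatus
	if err := json.Unmarshal(out.Result, &st); err != nil {
		fmt.Println("unmarshal error:", err)
		return
	}
	gap := hexToUint(st.HighestBlock) - hexToUint(st.CurrentBlock)
	fmt.Printf("current=%s highest=%s gap=%d\n", st.CurrentBlock, st.HighestBlock, gap)
	if gap > threshold {
		fmt.Println("node looks stuck; this is where a restart would be triggered")
	}
}
```

In practice this would run on a timer and trigger the pod restart instead of printing, which is exactly what I'd like to avoid for the reason above.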
I have not yet found my old write-up, but I did find some shorter tldr;s about the issue.

So, the TLDR is: if we can create a side chain which is old enough that its ancestor is pruned, then

- We get blocks `[B..Bn]` inserted into the database, with only header validation.
- Create a block `Bx` which has the same stateroot as an existing state.
- And then Geth will switch out the canonical chain for the invalid sidechain, if it has higher TD, despite not having validated the block or state on the blocks.

The attack needs to

- start on a fork-point far enough back that the state is pruned, and
- be long enough to reach `head - 127`,
- and, of course, continue along in order to have higher TD than the chain to overtake.
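A toy restatement of those conditions in Go (illustrative only, not geth code; `triesInMemory` mirrors geth's default of keeping 128 recent state tries in memory, which is where the `head - 127` boundary comes from):

```go
package main

import "fmt"

// Illustrative restatement (not geth code) of the three conditions above.
// triesInMemory mirrors geth's default of keeping 128 recent state tries
// in memory, the source of the "head - 127" boundary.
const triesInMemory = 128

func attackFeasible(head, forkPoint, sideTip, sideTD, canonTD uint64) bool {
	ancestorPruned := forkPoint+triesInMemory <= head // fork point older than any kept trie
	longEnough := sideTip+127 >= head                 // sidechain reaches head - 127
	higherTD := sideTD > canonTD                      // sidechain must out-weigh the canonical chain
	return ancestorPruned && longEnough && higherTD
}

func main() {
	// Hypothetical numbers: a fork far in the past that has caught up to the head.
	fmt.Println(attackFeasible(66000, 100, 65900, 500000, 499999)) // true
}
```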
Thanks, and interesting; I think it's very odd then that this error is happening. To give some more context: all geth nodes run in a separate container. However, each container contains a geth image that was snapshotted, meaning they have a carbon copy of the same datadir from initialization. (They are now individual dirs, but they were copied over.) Thus, they all start at a specific block. But after that, they stay in sync, and I don't see any nodes falling behind to the point that your second point (`head - 127`) is reached. What do you mean by point 1, the fork point?
bump
System information

Geth version: v1.10.25
OS & Version: Ubuntu

Expected behaviour
I'm currently running a private POA network on AWS using Kubernetes EKS. I've successfully deployed all nodes, and the network is operational.
I'm currently running 3 miners, 1 RPC, and 1 full bootnode. Output from ethstats:
My network mostly produces empty blocks, with bursts of txs from time to time.
Actual behaviour
I notice that at some point, some of the nodes start to drop peers and fall out of sync. I made sure the hardware requirements are met, and I'm closely monitoring all my nodes through Prometheus/Grafana.
For example, yesterday, my RPC node stopped syncing at block 8000, and was therefore stuck at that block. Interestingly, statically adding the peers did not work either. I was forced to kill the node and have it restart from scratch.
The reason for the failure is quite interesting, however. This is example output:
I can't find much documentation on this!
What does `retrieved hash chain is invalid: sidechain ghost-state attack` mean? And how would I go about preventing this?

Steps to reproduce the behaviour
I can't share much information on how to reproduce this, but I have set up a POA network with a block time of 5 seconds, with 3 miners, 1 RPC node and 1 full bootnode.
All nodes connect to the bootnode as their entry point.
Backtrace
See above.