
Easy wrong forks on op-geth/v1.101304.2 X op-node/v1.4.0 on mainnet?! #468

Open juno-yu opened 10 months ago

juno-yu commented 10 months ago

**Describe the bug** Nodes easily end up on wrong forks with op-geth/v1.101304.2 X op-node/v1.4.0 on mainnet; hit 3 times in 48 hours (all at different heights).

**To Reproduce** Run op-geth/v1.101304.2 X op-node/v1.4.0 on mainnet as an archive node, syncing from L1.

**Expected behavior** Nodes should not finalize a wrong block.

**Screenshots** Nodes finalize and then halt at wrong blocks that don't match https://optimistic.etherscan.io/, and can't be rolled back via debug.setHead on op-geth (the node wants to roll back to much older heights, which looks like it would take a very long time).

**System Specs:**

**Additional context**

mslipper commented 10 months ago

Hi Juno,

What do you mean by "nodes finalize and stop at wrong blocks" - are you saying that the node diverges from what's on Etherscan, or that the node halts outright? Or both? Can you please post some logs for us to see?

Lastly, are you running with l1.trustrpc set to true?

juno-yu commented 10 months ago

> Hi Juno,
>
> What do you mean by "nodes finalize and stop at wrong blocks" - are you saying that the node diverges from what's on Etherscan, or that the node halts outright? Or both? Can you please post some logs for us to see?
>
> Lastly, are you running with l1.trustrpc set to true?

Yes

This was one of the nodes that went wrong:

```
t=2024-01-04T23:07:01+0000 lvl=info msg="no peers ready to handle block requests for more P2P requests for L2 block history" target=114,388,037 end=0x00008fbf237170d75421d65fad1bc435c91d5246aec4b4169b02fa5782f9c143:114393561 current=114,393,330
```
That node was stuck at 114,388,037.

It seems the node forked off somewhere; the hash doesn't match:

   "checkTime": "2024-01-04T23:45:58.456Z",
   "blockHeight": 114388037,
   "blockHash": "0xdf8af4ce7e93aa91b3e3f9a1e667155baab5b34a17b0de7fe16d8704fdf7ab46",
   "blockTime": "2024-01-04T13:27:31.000Z",

https://optimistic.etherscan.io/block/114388037 expects 0xd915e7daf532a72aa626207501b44727d95af206283237f01386ec084902bc95.
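
For anyone reproducing the comparison: a minimal sketch that checks a local node's header hash at the diverged height against a reference endpoint. The local URL is a placeholder; https://mainnet.optimism.io is the public OP Mainnet RPC.

```go
// Hypothetical sketch: compare a suspect block's hash on the local node
// against a trusted reference RPC.
package main

import (
	"context"
	"fmt"
	"log"
	"math/big"

	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	const height = 114388037 // the block that diverged in this report

	local, err := ethclient.Dial("http://localhost:8545") // placeholder: the diverged op-geth
	if err != nil {
		log.Fatal(err)
	}
	reference, err := ethclient.Dial("https://mainnet.optimism.io") // public reference endpoint
	if err != nil {
		log.Fatal(err)
	}

	n := big.NewInt(height)
	localHdr, err := local.HeaderByNumber(context.Background(), n)
	if err != nil {
		log.Fatal(err)
	}
	refHdr, err := reference.HeaderByNumber(context.Background(), n)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("local: ", localHdr.Hash())
	fmt.Println("remote:", refHdr.Hash())
	if localHdr.Hash() != refHdr.Hash() {
		fmt.Println("DIVERGED at block", height)
	}
}
```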

juno-yu commented 10 months ago

Merely rolling back by 1 block (the fork was only 1 block deep) did not seem to be the right way out.

Running debug.setHead 1 block back on op-geth on that node and then restarting op-node led to it "Walking back L1Block by hash" through multiple days of data older than the problematic block (with no way to tell how long or how far it would roll back), so I had to stop it and restore by alternative means, since catching up from scratch is slow nowadays.
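
For reference, the one-block rollback attempt looks roughly like this. A sketch only: the endpoint is a placeholder, and 0x6d16c44 is 114388036, one block before the diverged height.

```go
// Hypothetical sketch: roll op-geth back one block with debug_setHead,
// which takes a hex-encoded block number.
package main

import (
	"log"

	"github.com/ethereum/go-ethereum/rpc"
)

func main() {
	// Placeholder endpoint; the node must have the debug API enabled.
	client, err := rpc.Dial("http://localhost:8545")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// debug_setHead returns no payload, so a nil result is fine.
	if err := client.Call(nil, "debug_setHead", "0x6d16c44"); err != nil {
		log.Fatal(err)
	}
}
```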

juno-yu commented 10 months ago

> l1.trustrpc

We were not running with l1.trustrpc at that time. How does this flag affect node behavior on forks/rollbacks? We point at a private geth (L1), so it should be safe to enable too, but it was not enabled at the moment of the fork.

sebastianst commented 10 months ago

@juno-yu we've recently received reports that receipts fetching with l1.trustrpc == true could lead to missing receipts during block derivation, probably caused by a temporary problem with the L1 connection, in turn deriving a block with missing user deposit transactions. We are still investigating why receipts fetching could return fewer receipts for a block without an error. In the meantime, we've released an image op-node/v1.4.3-rc.3 that puts receipts validation back in place even when l1.trustrpc is enabled. We will finalize this release candidate this week. You can safely use this image with l1.trustrpc == true, or disable the flag in your existing setup.
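
The check being reinstated presumably boils down to recomputing the receipts root from the fetched receipts and comparing it to the root committed in the block header. A minimal sketch using go-ethereum primitives, not op-node's actual implementation:

```go
// Hypothetical sketch of receipts validation: a silently truncated
// receipts list will not hash to the root committed in the header.
package receiptcheck

import (
	"fmt"

	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/trie"
)

// validateReceipts returns an error if the receipts do not match the
// header's receipt root, e.g. when the RPC returned fewer receipts
// than the block actually contains.
func validateReceipts(header *types.Header, receipts types.Receipts) error {
	computed := types.DeriveSha(receipts, trie.NewStackTrie(nil))
	if computed != header.ReceiptHash {
		return fmt.Errorf("receipts root mismatch: computed %s, header has %s",
			computed, header.ReceiptHash)
	}
	return nil
}
```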

Can you confirm that block you mentioned above ("blockHeight": 114388037, "blockHash": "0xdf8af4ce7e93aa91b3e3f9a1e667155baab5b34a17b0de7fe16d8704fdf7ab46") was the first diverging block?

If yes, can you access this block's transactions/tx hashes? How many does it contain? Is it fewer than the correct block's transaction count? If it is missing some, can you confirm that the missing ones are user deposits?
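
One way to check, if the node's RPC is still reachable: list the transaction types in the suspect block and count deposits (type 0x7e on the OP Stack). The sketch below uses raw JSON-RPC so upstream go-ethereum doesn't need to decode the deposit transaction type; the endpoint URL is a placeholder.

```go
// Hypothetical sketch: count deposit transactions (type 0x7e) in the
// diverged block via raw JSON-RPC.
package main

import (
	"fmt"
	"log"

	"github.com/ethereum/go-ethereum/rpc"
)

func main() {
	client, err := rpc.Dial("http://localhost:8545") // placeholder: the diverged node
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	var block struct {
		Hash         string `json:"hash"`
		Transactions []struct {
			Hash string `json:"hash"`
			Type string `json:"type"`
		} `json:"transactions"`
	}
	// 0x6d16c45 == 114388037, the diverged height; true requests full tx objects.
	if err := client.Call(&block, "eth_getBlockByNumber", "0x6d16c45", true); err != nil {
		log.Fatal(err)
	}

	deposits := 0
	for _, tx := range block.Transactions {
		if tx.Type == "0x7e" { // OP Stack deposit transaction type
			deposits++
		}
	}
	fmt.Printf("block %s: %d txs, %d deposits\n",
		block.Hash, len(block.Transactions), deposits)
}
```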

jun0tpyrc commented 10 months ago

Thanks, good to know the cause is being addressed.

We didn't keep the broken datadir, as archive datadirs are 6TB+ nowadays and I needed to recover things for our production workload, so I can't check around those broken blocks' boundary again.

smartcontracts commented 9 months ago

@sebastianst what's the status of this?