ethereum-optimism / op-geth

GNU Lesser General Public License v3.0
287 stars 746 forks source link

Database corruption during restarts since switching to upstream v1.14.7 #360

Closed sebastianst closed 2 months ago

sebastianst commented 2 months ago

We've recently merged in upstream v1.14.7 and during a sequencer restart, the following errors popped up:

t=2024-08-13T15:18:31+0000 lvl=info msg="Loaded most recent local block" number=8539128 hash=0x3cef05fd474e7e7dc7eaff71e135604e47985a394a59b8dccd4fe408f3c2888b td=0 age=7s
t=2024-08-13T15:18:31+0000 lvl=info msg="Loaded most recent local finalized block" number=8538465 hash=0x810a9d113721b03303e0daffd8a6e6edb62e7baf6bb1481fee6c586959d4112a td=0 age=22m13s
t=2024-08-13T15:18:31+0000 lvl=warn msg="Head state missing, repairing" number=8539128 hash=0x3cef05fd474e7e7dc7eaff71e135604e47985a394a59b8dccd4fe408f3c2888b snaproot=0xc8380132dab17c437b74e60250d0e854e15af86a97bd3158d79caf6eee78540e
t=2024-08-13T15:18:36+0000 lvl=info msg="Rewound to block with state" number=8460000 hash=0xe2d7f61e7b3f63a8e57662e28a22a0092b03fd6b5ca135ca7e9322fa9b3ca589
t=2024-08-13T15:19:29+0000 lvl=error msg="Error in block freeze operation" err="canonical hash missing, can't freeze block 8534620"
t=2024-08-13T15:20:29+0000 lvl=error msg="Error in block freeze operation" err="canonical hash missing, can't freeze block 8530106"
t=2024-08-13T15:21:29+0000 lvl=error msg="Error in block freeze operation" err="canonical hash missing, can't freeze block 8525615"
t=2024-08-13T15:22:29+0000 lvl=error msg="Error in block freeze operation" err="canonical hash missing, can't freeze block 8521225"
t=2024-08-13T15:23:29+0000 lvl=error msg="Error in block freeze operation" err="canonical hash missing, can't freeze block 8516828"
t=2024-08-13T15:24:29+0000 lvl=error msg="Error in block freeze operation" err="canonical hash missing, can't freeze block 8512200"
t=2024-08-13T15:25:29+0000 lvl=error msg="Error in block freeze operation" err="canonical hash missing, can't freeze block 8507555"
t=2024-08-13T15:26:29+0000 lvl=error msg="Error in block freeze operation" err="canonical hash missing, can't freeze block 8503264"
... (repeating every minute)

So at startup, safe and finalized are reset to genesis. The op-node will then currently walk back to genesis. This was fixed locally by shutting down op-node, and using op-wheel engine rewind --set-head --to 8460000, but the source for this db corruption isn't clear yet.

sebastianst commented 2 months ago

It happened again a couple of times. That's a log excerpt from inactive sepolia seq-2.

t=2024-08-14T19:54:49+0000 lvl=info msg="Enabling metrics collection"
t=2024-08-14T19:54:49+0000 lvl=info msg="Enabling stand-alone metrics HTTP endpoint" address=0.0.0.0:6060
t=2024-08-14T19:54:49+0000 lvl=info msg="Starting metrics server" addr=http://0.0.0.0:6060/debug/metrics
t=2024-08-14T19:54:49+0000 lvl=info msg="Maximum peer count" ETH=30 total=30
t=2024-08-14T19:54:49+0000 lvl=info msg="Smartcard socket not found, disabling" err="stat /run/pcscd/pcscd.comm: no such file or directory"
t=2024-08-14T19:54:49+0000 lvl=info msg="Using pebble as db engine"
t=2024-08-14T19:54:49+0000 lvl=info msg="Set global gas cap" cap=50000000
t=2024-08-14T19:54:50+0000 lvl=info msg="Initializing the KZG library" backend=gokzg
t=2024-08-14T19:54:50+0000 lvl=info msg="Allocated trie memory caches" clean="1.20 GiB" dirty="2.00 GiB"
t=2024-08-14T19:54:50+0000 lvl=info msg="Using pebble as the backing database"
t=2024-08-14T19:54:50+0000 lvl=info msg="Allocated cache and file handles" database=/db/geth/chaindata cache="4.00 GiB" handles=524288
t=2024-08-14T19:54:51+0000 lvl=info msg="Opened ancient database" database=/db/geth/chaindata/ancient/chain readonly=false
t=2024-08-14T19:54:51+0000 lvl=info msg="State scheme set by user" scheme=path
t=2024-08-14T19:54:51+0000 lvl=warn msg="Sanitizing invalid node buffer size" provided="2.00 GiB" updated="256.00 MiB"
t=2024-08-14T19:54:51+0000 lvl=info msg="Failed to load journal, discard it" err="unexpected journal version want 1 got 0"
t=2024-08-14T19:54:51+0000 lvl=info msg="Opened ancient database" database=/db/geth/chaindata/ancient/state readonly=false
t=2024-08-14T19:54:51+0000 lvl=warn msg="Truncating freezer table" database=/db/geth/chaindata/ancient/state table=account.index items=3889669 limit=3887492
t=2024-08-14T19:54:51+0000 lvl=warn msg="Truncating freezer table" database=/db/geth/chaindata/ancient/state table=storage.index items=3889669 limit=3887492
t=2024-08-14T19:54:51+0000 lvl=warn msg="Truncating freezer table" database=/db/geth/chaindata/ancient/state table=account.data items=3889669 limit=3887492
t=2024-08-14T19:54:51+0000 lvl=warn msg="Truncating freezer table" database=/db/geth/chaindata/ancient/state table=storage.data items=3889669 limit=3887492
t=2024-08-14T19:54:51+0000 lvl=warn msg="Truncating freezer table" database=/db/geth/chaindata/ancient/state table=history.meta items=3889669 limit=3887492
t=2024-08-14T19:54:51+0000 lvl=warn msg="Truncated extra state histories" number=2177
t=2024-08-14T19:55:12+0000 lvl=info msg=""
t=2024-08-14T19:55:12+0000 lvl=info msg=---------------------------------------------------------------------------------------------------------------------------------------------------------
t=2024-08-14T19:55:12+0000 lvl=info msg="Chain ID:  11155420 (OP Sepolia Testnet)"
t=2024-08-14T19:55:12+0000 lvl=info msg="Consensus: Optimism"
t=2024-08-14T19:55:12+0000 lvl=info msg=""
t=2024-08-14T19:55:12+0000 lvl=info msg="Pre-Merge hard forks (block based):"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Homestead:                   #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/homestead.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Tangerine Whistle (EIP 150): #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/tangerine-whistle.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Spurious Dragon/1 (EIP 155): #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/spurious-dragon.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Spurious Dragon/2 (EIP 158): #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/spurious-dragon.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Byzantium:                   #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/byzantium.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Constantinople:              #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/constantinople.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Petersburg:                  #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/petersburg.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Istanbul:                    #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/istanbul.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Muir Glacier:                #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/muir-glacier.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Berlin:                      #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/berlin.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - London:                      #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/london.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Arrow Glacier:               #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/arrow-glacier.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Gray Glacier:                #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/gray-glacier.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=""
t=2024-08-14T19:55:12+0000 lvl=info msg="Merge configured:"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Hard-fork specification:    https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/paris.md"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Network known to be merged: true"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Total terminal difficulty:  0"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Merge netsplit block:       #0       "
t=2024-08-14T19:55:12+0000 lvl=info msg=""
t=2024-08-14T19:55:12+0000 lvl=info msg="Post-Merge hard forks (timestamp based):"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Shanghai:                    @1699981200 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/shanghai.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Cancun:                      @1708534800 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/cancun.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Regolith:                    @0         "
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Canyon:                      @1699981200"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Ecotone:                     @1708534800"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Fjord:                       @1716998400"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Granite:                     @1723478400"
t=2024-08-14T19:55:12+0000 lvl=info msg=""
t=2024-08-14T19:55:12+0000 lvl=info msg=---------------------------------------------------------------------------------------------------------------------------------------------------------
t=2024-08-14T19:55:12+0000 lvl=info msg=""
t=2024-08-14T19:55:13+0000 lvl=info msg="Loaded most recent local block" number=15931368 hash=0x3c42264d97297745ee9fdb8de3583e0abb86b9328b8e5ff1d0be5998e5dc7f9a td=0 age=37s
t=2024-08-14T19:55:13+0000 lvl=info msg="Loaded most recent local finalized block" number=15930652 hash=0xcacd6a6921079b6f87d05e81dbdf53616dc2ae92162817f56bf6ad1843b6f4fb td=0 age=24m29s
t=2024-08-14T19:55:13+0000 lvl=info msg="Loaded last snap-sync pivot marker" number=12041571
t=2024-08-14T19:55:13+0000 lvl=warn msg="Head state missing, repairing" number=15931368 hash=0x3c42264d97297745ee9fdb8de3583e0abb86b9328b8e5ff1d0be5998e5dc7f9a snaproot=0x3bcc4bbbdf638894f82d4dcf10e1654a357266c765344863b94f39b5b18f2574
t=2024-08-14T19:55:31+0000 lvl=info msg="Block state missing, rewinding further" number=15930447 hash=0x389279f3c431ee5b8e53ef16d560538b8df0bc6f83811c013ca5183d3cfed4fa elapsed=18.412s
t=2024-08-14T19:55:49+0000 lvl=info msg="Block state missing, rewinding further" number=15929815 hash=0xe23fda5a618c5a1631a3a5ce430bea581dbf6e93b49a252cfce3f209a6943b7c elapsed=36.009s
t=2024-08-14T19:55:49+0000 lvl=info msg="Rewound to block with state" number=15929063 hash=0xa574d6ddc91d920271ae3e15e61f6e43f61d70ff9a314cfcc9c21653e3a600fb
t=2024-08-14T19:55:51+0000 lvl=error msg="Error in block freeze operation" err="canonical hash missing, can't freeze block 15930535"
t=2024-08-14T19:56:10+0000 lvl=info msg="Loaded most recent local block" number=15929063 hash=0xa574d6ddc91d920271ae3e15e61f6e43f61d70ff9a314cfcc9c21653e3a600fb td=0 age=1h18m24s
t=2024-08-14T19:56:10+0000 lvl=info msg="Loaded most recent local finalized block" number=15930652 hash=0xcacd6a6921079b6f87d05e81dbdf53616dc2ae92162817f56bf6ad1843b6f4fb td=<nil> age=25m26s
t=2024-08-14T19:56:10+0000 lvl=info msg="Loaded last snap-sync pivot marker" number=12041571
t=2024-08-14T19:56:10+0000 lvl=warn msg="Enabling snapshot recovery" chainhead=15929063 diskbase=15930615
t=2024-08-14T19:56:10+0000 lvl=warn msg="Snapshot is not continuous with chain" snaproot=0x839974d8a9d9852588fd32eebfde41e3f07f2d0b4806c953db41a44c22137807 chainroot=0xaef4e98a80d75974e247db4e0a9bb355657d404955416f7652ae49333cf87c55
t=2024-08-14T19:56:10+0000 lvl=info msg="Initialized transaction indexer" range="last 2350000 blocks"
t=2024-08-14T19:56:10+0000 lvl=info msg="Initialising Ethereum protocol" network=11155420 dbversion=8
t=2024-08-14T19:56:10+0000 lvl=info msg="Loaded local transaction journal" transactions=951 dropped=0
t=2024-08-14T19:56:10+0000 lvl=info msg="Regenerated local transaction journal" transactions=811 accounts=145
t=2024-08-14T19:56:10+0000 lvl=warn msg="Switch sync mode from snap sync to full sync"
t=2024-08-14T19:56:10+0000 lvl=info msg="Gasprice oracle is ignoring threshold set" threshold=2
t=2024-08-14T19:56:10+0000 lvl=warn msg="Engine API enabled" protocol=eth
t=2024-08-14T19:56:10+0000 lvl=info msg="Starting peer-to-peer node" instance=Geth/v1.101407.0-rc.1-ee83530a-20240808/linux-amd64/go1.22.6
t=2024-08-14T19:56:10+0000 lvl=info msg="New local node record" seq=1719261383223 id=89b03f986538b49bad99202e4429aac50b3870c6f1e14c8e9fe27ac9d5536c71 ip=10.17.64.82 udp=30303 tcp=30304
t=2024-08-14T19:56:10+0000 lvl=info msg="Started P2P networking" self="enode://1a5d373dbc1cf14646d68dc6cd396fea57409e51088a0249ce96c5daeeb5453421f59c6dc3ce67cc60450ea8acc95cd8a44bcd4f4fdc25f6ed55901e7b4756ed@10.17.64.82:30304?discport=30303"
t=2024-08-14T19:56:10+0000 lvl=info msg="IPC endpoint opened" url=/db/geth.ipc
t=2024-08-14T19:56:10+0000 lvl=info msg="Loaded JWT secret file" path=/etc/op-geth/jwt-secret.txt crc32=0x815e9662
t=2024-08-14T19:56:10+0000 lvl=info msg="HTTP server started" endpoint=[::]:8545 auth=false prefix="" cors=* vhosts=*
t=2024-08-14T19:56:10+0000 lvl=info msg="WebSocket enabled" url=ws://[::]:8546
t=2024-08-14T19:56:10+0000 lvl=info msg="WebSocket enabled" url=ws://[::]:8551
t=2024-08-14T19:56:10+0000 lvl=info msg="HTTP server started" endpoint=[::]:8551 auth=true prefix="" cors=localhost vhosts=localhost
t=2024-08-14T19:56:11+0000 lvl=info msg="Indexed transactions" blocks=2305 txs=14023 tail=13579064 elapsed=887.910ms
t=2024-08-14T19:56:51+0000 lvl=error msg="Number of finalized block is missing"
t=2024-08-14T19:57:51+0000 lvl=error msg="Number of finalized block is missing"
t=2024-08-14T19:58:51+0000 lvl=error msg="Number of finalized block is missing"
t=2024-08-14T19:59:51+0000 lvl=error msg="Number of finalized block is missing"
t=2024-08-14T20:00:51+0000 lvl=error msg="Number of finalized block is missing"
t=2024-08-14T20:01:51+0000 lvl=error msg="Number of finalized block is missing"
t=2024-08-14T20:02:51+0000 lvl=error msg="Number of finalized block is missing"
sebastianst commented 2 months ago

Related: https://github.com/ethereum/go-ethereum/issues/30119

sebastianst commented 2 months ago

Fix implemented in #368 - adds journal v0 to v1 migration code. https://github.com/ethereum/go-ethereum/pull/28940 introduced a new journal format, but didn't add migration code.