eqlabs / pathfinder

A Starknet full node written in Rust
https://eqlabs.github.io/pathfinder/

Reorg on pruned databases not correctly updating system contract state root #2110

Closed kkovaacs closed 1 week ago

kkovaacs commented 2 weeks ago

With pathfinder 0.13.2 running in pruned mode, a reorg of one block (probably just a feeder gateway endpoint inconsistency) leads to the following error:

Jul 04 07:20:05 ns3002489 pathfinder[25729]: 2024-07-04T07:20:05  INFO Updated Starknet state with block 654827
Jul 04 07:22:20 ns3002489 pathfinder[25729]: 2024-07-04T07:22:20  INFO Updated Starknet state with block 654828
Jul 04 07:22:20 ns3002489 pathfinder[25729]: 2024-07-04T07:22:20  INFO L2 reorg occurred, new L2 head is block 654827
Jul 04 07:22:21 ns3002489 pathfinder[25729]: 2024-07-04T07:22:21 ERROR Sync consumer task terminated with an error reason=Update L2 state to 654828
Jul 04 07:22:21 ns3002489 pathfinder[25729]: Caused by:
Jul 04 07:22:21 ns3002489 pathfinder[25729]:     0: Updating Starknet state
Jul 04 07:22:21 ns3002489 pathfinder[25729]:     1: Update system contract state
Jul 04 07:22:21 ns3002489 pathfinder[25729]:     2: Update contract storage tree
Jul 04 07:22:21 ns3002489 pathfinder[25729]:     3: Node 5893880691 at height 0 is missing
Jul 04 07:22:21 ns3002489 pathfinder[25729]: 2024-07-04T07:22:21  INFO Channel closed, exiting latest poll task
Jul 04 07:22:21 ns3002489 pathfinder[25729]: 2024-07-04T07:22:21 ERROR Sync process ended unexpected with: Err(Sync process terminated)
Jul 04 07:22:22 ns3002489 pathfinder[25729]: Error: Unexpected shutdown
Jul 04 07:22:22 ns3002489 systemd[1]: starknetd.service: Main process exited, code=exited, status=1/FAILURE
Jul 04 07:22:22 ns3002489 systemd[1]: starknetd.service: Failed with result 'exit-code'.
Jul 04 07:22:22 ns3002489 systemd[1]: starknetd.service: Consumed 4month 1w 3d 10h 17min 14.786s CPU time, 18.2G memory peak, 0B memory swap peak.
Jul 04 07:22:23 ns3002489 systemd[1]: starknetd.service: Scheduled restart job, restart counter is at 1.
Jul 04 07:22:23 ns3002489 systemd[1]: Started starknetd.service - StarkNet.

Looks like the system contract state root node id is not updated properly during the reorg.

kkovaacs commented 2 weeks ago

Reproducible using the feeder_gateway tool to simulate a reorg. When reorging from block N to N-1 (i.e. just a single block), pathfinder fails iff --storage.state-tries is set to zero.

kkovaacs commented 2 weeks ago

More detailed logs:

2024-07-05T12:01:24 DEBUG State tree root node exists target_block=4999
2024-07-05T12:01:24 DEBUG Committed class trie target_block=4999 class_commitment=0x070891B394DE1D3F5B6DAD9BC6423CE034D47B81540CC80BADB5CB904C1D59B3
2024-07-05T12:01:24  INFO L2 reorg occurred, new L2 head is block 4999
2024-07-05T12:01:24 ERROR Sync consumer task terminated with an error reason=Update L2 state to 5000

Caused by:
    0: Updating Starknet state
    1: Update system contract state
    2: Update contract storage tree
    3: Node 228608 at height 0 is missing
2024-07-05T12:01:24 DEBUG Shutting down L1 and L2 sync producer tasks
2024-07-05T12:01:24 DEBUG L1 sync task cancelled successfully
2024-07-05T12:01:24  INFO Channel closed, exiting latest poll task
2024-07-05T12:01:24 DEBUG L2 sync task cancelled successfully
2024-07-05T12:01:24 DEBUG Latest polling task cancelled successfully
2024-07-05T12:01:24 ERROR Sync process ended unexpected with: Err(Sync process terminated)

That is, no revert was done because the state tree root node at the target block (4999) does exist. Unfortunately, the root node it refers to does not, because it was removed when updating the state trie for block 5000. It seems that we're not properly removing the root node(s) for old state when updating to block 5000.
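
To make the suspected failure mode concrete, here is a toy model of it. This is not pathfinder's actual storage code: `PrunedTrieStore`, `ReorgOutcome`, `reorg_then_update` and the `drop_stale_roots` switch are all hypothetical names made up for illustration, and the whole thing assumes a store that keeps zero historical tries. It only shows how a leftover block-to-root entry can make a reorg skip the revert and then hit a missing node.

```rust
use std::collections::HashMap;

/// Toy model of a pruned trie store (hypothetical, not pathfinder's schema).
#[derive(Default)]
struct PrunedTrieStore {
    /// Block number -> id of the trie root node for that block.
    roots: HashMap<u64, u64>,
    /// Node id -> node payload (a placeholder string in this sketch).
    nodes: HashMap<u64, String>,
}

#[derive(Debug, PartialEq)]
enum ReorgOutcome {
    /// A root entry for the target block was found and reused, no revert.
    ReusedExistingRoot,
    /// No root entry for the target block, so the state must be reverted.
    RevertRequired,
}

impl PrunedTrieStore {
    /// Store the root for `block` and prune the nodes of all older blocks.
    /// `drop_stale_roots` decides whether the block -> root entries of the
    /// pruned blocks are removed together with their nodes.
    fn commit(&mut self, block: u64, node_id: u64, drop_stale_roots: bool) {
        let stale: Vec<(u64, u64)> = self
            .roots
            .iter()
            .filter(|(b, _)| **b < block)
            .map(|(b, id)| (*b, *id))
            .collect();
        for (old_block, old_id) in stale {
            self.nodes.remove(&old_id);
            if drop_stale_roots {
                self.roots.remove(&old_block);
            }
        }
        self.nodes.insert(node_id, format!("root of block {block}"));
        self.roots.insert(block, node_id);
    }

    /// Stand-in for the check behind the "State tree root node exists" log line.
    fn root_exists(&self, block: u64) -> bool {
        self.roots.contains_key(&block)
    }

    /// Load the root node for `block`, failing if the node itself was pruned.
    fn load_root(&self, block: u64) -> Result<&String, String> {
        let id = self
            .roots
            .get(&block)
            .ok_or_else(|| format!("No root entry for block {block}"))?;
        self.nodes
            .get(id)
            .ok_or_else(|| format!("Node {id} at height 0 is missing"))
    }
}

/// Reorg to `target`, then load its root as the next state update would.
fn reorg_then_update(store: &PrunedTrieStore, target: u64) -> Result<ReorgOutcome, String> {
    if store.root_exists(target) {
        // The root entry exists, so the revert is skipped. If the node it
        // points to has been pruned, loading it fails.
        store.load_root(target)?;
        Ok(ReorgOutcome::ReusedExistingRoot)
    } else {
        Ok(ReorgOutcome::RevertRequired)
    }
}
```

Committing 4999 and then 5000 with `drop_stale_roots` set to `false` leaves the entry for 4999 pointing at a pruned node, so `reorg_then_update(&store, 4999)` returns the "Node ... at height 0 is missing" error. With `true`, the stale entry is gone and the function reports that a revert is required instead.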