ethereum / go-ethereum

Go implementation of the Ethereum protocol
https://geth.ethereum.org
GNU Lesser General Public License v3.0

TrieTimeLimit & debug.setTrieFlushInterval don't seem to flush with other intervals #26564

Closed anthonyoliai closed 1 year ago

anthonyoliai commented 1 year ago

System information

Geth version: see commit hash
OS & Version: Ubuntu 20.04
Commit hash: df52967ff6080a27243569020ff64cd956fb8362 (post 1.10.26-stable)

Expected behaviour

Using the newly introduced TrieFlushInterval in the debug namespace, I expect more frequent flushes of the dirty cache to disk. However, after further investigation, even setting the value to 0 does not seem to write any state changes to disk. (Setting the value to "0s" essentially means going into archive mode, if I understood correctly.)

I expected this to give better persistence when the node gets killed for some reason (e.g. a crash due to OOM).

I would like to use this functionality for a private chain which doesn't have too much traffic, hence I was playing around with the cache.

Actual behaviour

Persistence is not there. I started my miner node from block 88 or so and simulated a lot of on-chain traffic, then killed the node after some time had passed (an unclean shutdown).

I expected that, after starting the node up again, the state it resumes from would be further along than block 88; however, it started right from the beginning. This gives me the impression that no flushing from the dirty cache actually occurs.

Perhaps I'm just missing some context and misunderstanding how the cache operates.

Steps to reproduce the behaviour

  1. Boot up a miner node
  2. Geth attach data/geth.ipc
  3. debug.setTrieFlushInterval("0s") -> essentially setting the flush interval equal to a block processing time of 0
  4. Simulate any chain traffic, e.g:

    const sendEth = () => {
      for (let i = 0; i < 2000; i++) {
        eth.sendTransaction({
          from: '0x93ac780E5c9044b708218643462d58d575b4a293',
          to: '0x93ac780E5c9044b708218643462d58d575b4a293',
          value: web3.toWei(1, 'ether')
        });
      }
    };
    setInterval(sendEth, 5000);
  5. Run for a few minutes
  6. ps ax | grep geth
  7. Kill the geth process -> kill -9 <geth PID>
  8. Restart the node -> it starts back from the beginning and does not include any of the supposedly "flushed" changes

Backtrace

INFO [01-27|08:42:16.104] Loaded most recent local header       number=97 hash=a1b906..3c032a td=195 age=7s
INFO [01-27|08:42:16.104] Loaded most recent local full block      number=97 hash=a1b906..3c032a td=195 age=7s
INFO [01-27|08:42:16.104] Loaded most recent local fast block      number=97 hash=a1b906..3c032a td=195 age=7s
WARN [01-27|08:42:16.104] Head state missing, repairing            number=97 hash=a1b906..3c032a snaproot=0a3308..1c1018
INFO [01-27|08:42:16.134] Loaded most recent local headertje       number=97 hash=a1b906..3c032a td=195 age=7s
INFO [01-27|08:42:16.134] Loaded most recent local full block      number=88 hash=2dfa42..bae539 td=177 age=52s
INFO [01-27|08:42:16.134] Loaded most recent local fast block      number=97 hash=a1b906..3c032a td=195 age=7s
WARN [01-27|08:42:16.144] Enabling snapshot recovery               chainhead=88 diskbase=88
INFO [01-27|08:42:16.207] Setting new local account                address=0x93Ac780E5c9044B708218643462d58D575B4a293
INFO [01-27|08:42:21.634] Loaded local transaction journal         transactions=58002 dropped=0
INFO [01-27|08:42:21.826] Regenerated local transaction journal    transactions=58002 accounts=1
INFO [01-27|08:42:21.826] Gasprice oracle is ignoring threshold set threshold=2
WARN [01-27|08:42:21.826] Unclean shutdown detected                booted=2023-01-27T07:55:50-0800 age=46m31s
WARN [01-27|08:42:21.826] Unclean shutdown detected                booted=2023-01-27T07:59:57-0800 age=42m24s
WARN [01-27|08:42:21.826] Unclean shutdown detected                booted=2023-01-27T08:00:09-0800 age=42m12s
WARN [01-27|08:42:21.826] Unclean shutdown detected                booted=2023-01-27T08:10:13-0800 age=32m8s
WARN [01-27|08:42:21.826] Unclean shutdown detected                booted=2023-01-27T08:15:41-0800 age=26m40s
WARN [01-27|08:42:21.826] Unclean shutdown detected                booted=2023-01-27T08:17:16-0800 age=25m5s
WARN [01-27|08:42:21.826] Unclean shutdown detected                booted=2023-01-27T08:31:15-0800 age=11m6s
WARN [01-27|08:42:21.826] Unclean shutdown detected                booted=2023-01-27T08:41:03-0800 age=1m18s
WARN [01-27|08:42:21.826] Engine API enabled                       protocol=eth
WARN [01-27|08:42:21.826] Engine API started but chain not configured for merge yet 
INFO [01-27|08:42:21.826] Starting peer-to-peer node               instance=Geth/v1.11.0-unstable-df52967f-20230127/linux-amd64/go1.19.5
INFO [01-27|08:42:21.829] IPC endpoint opened                      url=/home/temp/go-ethereum/build/bin/data/geth.ipc

Updated backtrace:

Logs clean start:

./geth --datadir data --miner.gaslimit 32000000 --unlock "0x93ac780E5c9044b708218643462d58d575b4a293" --password password.txt --networkid 9090 --nodiscover --syncmode full
INFO [01-30|02:25:05.630] Maximum peer count                       ETH=50 LES=0 total=50
INFO [01-30|02:25:05.632] Smartcard socket not found, disabling    err="stat /run/pcscd/pcscd.comm: no such file or directory"
INFO [01-30|02:25:05.634] Set global gas cap                       cap=50,000,000
INFO [01-30|02:25:05.635] Allocated trie memory caches             clean=154.00MiB dirty=256.00MiB
INFO [01-30|02:25:05.635] Allocated cache and file handles         database=/home/temp/go-ethereum/build/bin/data/geth/chaindata cache=512.00MiB handles=524,288
INFO [01-30|02:25:05.643] Opened ancient database                  database=/home/temp/go-ethereum/build/bin/data/geth/chaindata/ancient/chain readonly=false
INFO [01-30|02:25:05.643] Initialising Ethereum protocol           network=9090 dbversion=8
INFO [01-30|02:25:05.644]  
INFO [01-30|02:25:05.644] --------------------------------------------------------------------------------------------------------------------------------------------------------- 
INFO [01-30|02:25:05.644] Chain ID:  9090 (unknown) 
INFO [01-30|02:25:05.644] Consensus: Clique (proof-of-authority) 
INFO [01-30|02:25:05.644]  
INFO [01-30|02:25:05.644] Pre-Merge hard forks (block based): 
INFO [01-30|02:25:05.644]  - Homestead:                   #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/homestead.md) 
INFO [01-30|02:25:05.645]  - Tangerine Whistle (EIP 150): #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/tangerine-whistle.md) 
INFO [01-30|02:25:05.645]  - Spurious Dragon/1 (EIP 155): #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/spurious-dragon.md) 
INFO [01-30|02:25:05.645]  - Spurious Dragon/2 (EIP 158): #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/spurious-dragon.md) 
INFO [01-30|02:25:05.645]  - Byzantium:                   #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/byzantium.md) 
INFO [01-30|02:25:05.645]  - Constantinople:              #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/constantinople.md) 
INFO [01-30|02:25:05.645]  - Petersburg:                  #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/petersburg.md) 
INFO [01-30|02:25:05.645]  - Istanbul:                    #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/istanbul.md) 
INFO [01-30|02:25:05.645]  - Berlin:                      #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/berlin.md) 
INFO [01-30|02:25:05.645]  - London:                      #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/london.md) 
INFO [01-30|02:25:05.645]  
INFO [01-30|02:25:05.645] The Merge is not yet available for this network! 
INFO [01-30|02:25:05.645]  - Hard-fork specification: https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/paris.md 
INFO [01-30|02:25:05.645]  
INFO [01-30|02:25:05.645] Post-Merge hard forks (timestamp based): 
INFO [01-30|02:25:05.645]  
INFO [01-30|02:25:05.645] --------------------------------------------------------------------------------------------------------------------------------------------------------- 
INFO [01-30|02:25:05.645]  
INFO [01-30|02:25:05.646] Loaded most recent local header          number=92 hash=cb7fe6..48d979 td=185 age=44s
INFO [01-30|02:25:05.646] Loaded most recent local full block      number=92 hash=cb7fe6..48d979 td=185 age=44s
INFO [01-30|02:25:05.646] Loaded most recent local fast block      number=92 hash=cb7fe6..48d979 td=185 age=44s
WARN [01-30|02:25:05.652] Sanitizing invalid trie cache journal time provided=1s updated=1m0s
INFO [01-30|02:25:05.705] Setting new local account                address=0x93Ac780E5c9044B708218643462d58D575B4a293
INFO [01-30|02:25:05.709] Loaded local transaction journal         transactions=676 dropped=0
INFO [01-30|02:25:05.711] Regenerated local transaction journal    transactions=676 accounts=1
INFO [01-30|02:25:05.711] Gasprice oracle is ignoring threshold set threshold=2
WARN [01-30|02:25:05.713] Unclean shutdown detected                booted=2023-01-27T07:55:50-0800 age=2d18h29m
WARN [01-30|02:25:05.713] Unclean shutdown detected                booted=2023-01-27T07:59:57-0800 age=2d18h25m
WARN [01-30|02:25:05.713] Unclean shutdown detected                booted=2023-01-27T08:00:09-0800 age=2d18h24m
WARN [01-30|02:25:05.713] Unclean shutdown detected                booted=2023-01-27T08:10:13-0800 age=2d18h14m
WARN [01-30|02:25:05.713] Unclean shutdown detected                booted=2023-01-27T08:15:41-0800 age=2d18h9m
WARN [01-30|02:25:05.713] Unclean shutdown detected                booted=2023-01-27T08:17:16-0800 age=2d18h7m
WARN [01-30|02:25:05.713] Unclean shutdown detected                booted=2023-01-27T08:31:15-0800 age=2d17h53m
WARN [01-30|02:25:05.713] Unclean shutdown detected                booted=2023-01-27T08:41:03-0800 age=2d17h44m
WARN [01-30|02:25:05.713] Unclean shutdown detected                booted=2023-01-30T02:21:24-0800 age=3m41s
WARN [01-30|02:25:05.713] Engine API enabled                       protocol=eth
WARN [01-30|02:25:05.713] Engine API started but chain not configured for merge yet 
INFO [01-30|02:25:05.713] Starting peer-to-peer node               instance=Geth/v1.11.0-unstable-df52967f-20230127/linux-amd64/go1.19.5
INFO [01-30|02:25:05.720] New local node record                    seq=1,674,834,777,172 id=d39dba6c0201db86 ip=127.0.0.1 udp=0 tcp=30303
INFO [01-30|02:25:05.720] Started P2P networking                   self="<enode hidden>"
INFO [01-30|02:25:05.721] IPC endpoint opened                      url=/home/temp/go-ethereum/build/bin/data/geth.ipc
INFO [01-30|02:25:05.721] Loaded JWT secret file                   path=/home/temp/go-ethereum/build/bin/data/geth/jwtsecret crc32=0x5702711b
INFO [01-30|02:25:05.721] WebSocket enabled                        url=ws://127.0.0.1:8551
INFO [01-30|02:25:05.721] HTTP server started                      endpoint=127.0.0.1:8551 auth=true prefix= cors=localhost vhosts=localhost
INFO [01-30|02:25:06.277] Unlocked account                         address=0x93Ac780E5c9044B708218643462d58D575B4a293
INFO [01-30|02:26:01.371] Updated mining threads                   threads=1
INFO [01-30|02:26:01.371] Transaction pool price threshold updated price=1,000,000,000
INFO [01-30|02:26:01.371] Commit new sealing work                  number=93 sealhash=33ea58..805301 uncles=0 txs=0 gas=0 fees=0 elapsed="320.155µs"
INFO [01-30|02:26:01.371] Commit new sealing work                  number=93 sealhash=33ea58..805301 uncles=0 txs=0 gas=0 fees=0 elapsed="553.201µs"
INFO [01-30|02:26:01.372] Successfully sealed new block            number=93 sealhash=33ea58..805301 hash=b12400..694b15 elapsed="803.401µs"
INFO [01-30|02:26:01.372] 🔨 mined potential block                  number=93 hash=b12400..694b15
INFO [01-30|02:26:01.372] Commit new sealing work                  number=94 sealhash=c28ae2..8fdf3d uncles=0 txs=0 gas=0 fees=0 elapsed="316.638µs"
INFO [01-30|02:26:01.373] Commit new sealing work                  number=94 sealhash=c28ae2..8fdf3d uncles=0 txs=0 gas=0 fees=0 elapsed="622.418µs"
INFO [01-30|02:26:05.654] Writing clean trie cache to disk         path=/home/temp/go-ethereum/build/bin/data/geth/triecache threads=1
INFO [01-30|02:26:05.659] Persisted the clean trie cache           path=/home/temp/go-ethereum/build/bin/data/geth/triecache elapsed=4.846ms
INFO [01-30|02:26:06.001] Successfully sealed new block            number=94 sealhash=c28ae2..8fdf3d hash=88d8f7..a061cb elapsed=4.628s
INFO [01-30|02:26:06.001] 🔨 mined potential block                  number=94 hash=88d8f7..a061cb
INFO [01-30|02:26:06.002] Commit new sealing work                  number=95 sealhash=3885fd..90b41b uncles=0 txs=0 gas=0 fees=0 elapsed="436.662µs"
INFO [01-30|02:26:06.002] Commit new sealing work                  number=95 sealhash=3885fd..90b41b uncles=0 txs=0 gas=0 fees=0 elapsed="948.368µs"

Letting the chain run for 200 blocks or so, then killing the node:

INFO [01-30|02:48:51.003] 🔨 mined potential block                  number=367 hash=9b0463..4b9d5e
INFO [01-30|02:48:51.004] Commit new sealing work                  number=368 sealhash=384555..4a0de3 uncles=0 txs=0 gas=0       fees=0        elapsed="887.779µs"
INFO [01-30|02:48:51.004] Commit new sealing work                  number=368 sealhash=384555..4a0de3 uncles=0 txs=0 gas=0       fees=0        elapsed=1.361ms
INFO [01-30|02:48:56.003] Successfully sealed new block            number=368 sealhash=384555..4a0de3 hash=01228e..bd7e59 elapsed=4.999s
INFO [01-30|02:48:56.003] 🔗 block reached canonical chain          number=361 hash=05b626..40f2e9
INFO [01-30|02:48:56.003] 🔨 mined potential block                  number=368 hash=01228e..bd7e59
INFO [01-30|02:48:56.004] Commit new sealing work                  number=369 sealhash=dfc190..49b707 uncles=0 txs=0 gas=0       fees=0        elapsed="781.666µs"
INFO [01-30|02:48:56.004] Commit new sealing work                  number=369 sealhash=dfc190..49b707 uncles=0 txs=0 gas=0       fees=0        elapsed=1.225ms
Killed

Logs after killing node and starting back up:

INFO [01-30|02:49:22.319] Maximum peer count                       ETH=50 LES=0 total=50
INFO [01-30|02:49:22.320] Smartcard socket not found, disabling    err="stat /run/pcscd/pcscd.comm: no such file or directory"
INFO [01-30|02:49:22.324] Set global gas cap                       cap=50,000,000
INFO [01-30|02:49:22.325] Allocated trie memory caches             clean=154.00MiB dirty=256.00MiB
INFO [01-30|02:49:22.325] Allocated cache and file handles         database=/home/temp/go-ethereum/build/bin/data/geth/chaindata cache=512.00MiB handles=524,288
INFO [01-30|02:49:22.337] Opened ancient database                  database=/home/temp/go-ethereum/build/bin/data/geth/chaindata/ancient/chain readonly=false
INFO [01-30|02:49:22.338] Initialising Ethereum protocol           network=9090 dbversion=8
INFO [01-30|02:49:22.340]  
INFO [01-30|02:49:22.340] --------------------------------------------------------------------------------------------------------------------------------------------------------- 
INFO [01-30|02:49:22.340] Chain ID:  9090 (unknown) 
INFO [01-30|02:49:22.340] Consensus: Clique (proof-of-authority) 
INFO [01-30|02:49:22.340]  
INFO [01-30|02:49:22.341] Pre-Merge hard forks (block based): 
INFO [01-30|02:49:22.341]  - Homestead:                   #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/homestead.md) 
INFO [01-30|02:49:22.341]  - Tangerine Whistle (EIP 150): #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/tangerine-whistle.md) 
INFO [01-30|02:49:22.341]  - Spurious Dragon/1 (EIP 155): #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/spurious-dragon.md) 
INFO [01-30|02:49:22.341]  - Spurious Dragon/2 (EIP 158): #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/spurious-dragon.md) 
INFO [01-30|02:49:22.341]  - Byzantium:                   #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/byzantium.md) 
INFO [01-30|02:49:22.341]  - Constantinople:              #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/constantinople.md) 
INFO [01-30|02:49:22.341]  - Petersburg:                  #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/petersburg.md) 
INFO [01-30|02:49:22.341]  - Istanbul:                    #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/istanbul.md) 
INFO [01-30|02:49:22.341]  - Berlin:                      #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/berlin.md) 
INFO [01-30|02:49:22.341]  - London:                      #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/london.md) 
INFO [01-30|02:49:22.341]  
INFO [01-30|02:49:22.341] The Merge is not yet available for this network! 
INFO [01-30|02:49:22.341]  - Hard-fork specification: https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/paris.md 
INFO [01-30|02:49:22.341]  
INFO [01-30|02:49:22.341] Post-Merge hard forks (timestamp based): 
INFO [01-30|02:49:22.341]  
INFO [01-30|02:49:22.341] --------------------------------------------------------------------------------------------------------------------------------------------------------- 
INFO [01-30|02:49:22.341]  
INFO [01-30|02:49:22.342] Loaded most recent local header          number=368 hash=01228e..bd7e59 td=737 age=26s
INFO [01-30|02:49:22.342] Loaded most recent local full block      number=368 hash=01228e..bd7e59 td=737 age=26s
INFO [01-30|02:49:22.344] Loaded most recent local fast block      number=368 hash=01228e..bd7e59 td=737 age=26s
WARN [01-30|02:49:22.344] Head state missing, repairing            number=368 hash=01228e..bd7e59 snaproot=0a3308..1c1018
INFO [01-30|02:49:22.359] Loaded most recent local headertje       number=368 hash=01228e..bd7e59 td=737 age=26s
INFO [01-30|02:49:22.359] Loaded most recent local full block      number=90  hash=be1765..195a87 td=181 age=25m11s
INFO [01-30|02:49:22.359] Loaded most recent local fast block      number=368 hash=01228e..bd7e59 td=737 age=26s
WARN [01-30|02:49:22.396] Enabling snapshot recovery               chainhead=90 diskbase=90
WARN [01-30|02:49:22.396] Snapshot is not continuous with chain    snaproot=d4a015..0f8f49 chainroot=0a3308..1c1018
WARN [01-30|02:49:22.396] Sanitizing invalid trie cache journal time provided=1s updated=1m0s
INFO [01-30|02:49:22.486] Setting new local account                address=0x93Ac780E5c9044B708218643462d58D575B4a293
INFO [01-30|02:49:22.493] Loaded local transaction journal         transactions=701 dropped=0
INFO [01-30|02:49:22.500] Regenerated local transaction journal    transactions=701 accounts=1
INFO [01-30|02:49:22.500] Gasprice oracle is ignoring threshold set threshold=2
WARN [01-30|02:49:22.500] Unclean shutdown detected                booted=2023-01-27T07:55:50-0800 age=2d18h53m
WARN [01-30|02:49:22.501] Unclean shutdown detected                booted=2023-01-27T07:59:57-0800 age=2d18h49m
WARN [01-30|02:49:22.501] Unclean shutdown detected                booted=2023-01-27T08:00:09-0800 age=2d18h49m
WARN [01-30|02:49:22.501] Unclean shutdown detected                booted=2023-01-27T08:10:13-0800 age=2d18h39m
WARN [01-30|02:49:22.501] Unclean shutdown detected                booted=2023-01-27T08:15:41-0800 age=2d18h33m
WARN [01-30|02:49:22.501] Unclean shutdown detected                booted=2023-01-27T08:17:16-0800 age=2d18h32m
WARN [01-30|02:49:22.501] Unclean shutdown detected                booted=2023-01-27T08:31:15-0800 age=2d18h18m
WARN [01-30|02:49:22.501] Unclean shutdown detected                booted=2023-01-27T08:41:03-0800 age=2d18h8m
WARN [01-30|02:49:22.501] Unclean shutdown detected                booted=2023-01-30T02:21:24-0800 age=27m58s
WARN [01-30|02:49:22.501] Unclean shutdown detected                booted=2023-01-30T02:45:05-0800 age=4m17s
WARN [01-30|02:49:22.501] Engine API enabled                       protocol=eth
WARN [01-30|02:49:22.501] Engine API started but chain not configured for merge yet 
INFO [01-30|02:49:22.501] Starting peer-to-peer node               instance=Geth/v1.11.0-unstable-df52967f-20230127/linux-amd64/go1.19.5
INFO [01-30|02:49:22.507] IPC endpoint opened                      url=/home/temp/go-ethereum/build/bin/data/geth.ipc
INFO [01-30|02:49:22.507] Loaded JWT secret file                   path=/home/temp/go-ethereum/build/bin/data/geth/jwtsecret crc32=0x5702711b
INFO [01-30|02:49:22.508] New local node record                    seq=1,674,834,777,173 id=d39dba6c0201db86 ip=127.0.0.1 udp=0 tcp=30303
INFO [01-30|02:49:22.508] Started P2P networking                   self="<enode hidden>"
INFO [01-30|02:49:22.508] WebSocket enabled                        url=ws://127.0.0.1:8551
INFO [01-30|02:49:22.508] HTTP server started                      endpoint=127.0.0.1:8551 auth=true prefix= cors=localhost vhosts=localhost
INFO [01-30|02:49:23.119] Unlocked account                         address=0x93Ac780E5c9044B708218643462d58D575B4a293
INFO [01-30|02:50:20.586] Updated mining threads                   threads=1
INFO [01-30|02:50:20.586] Transaction pool price threshold updated price=1,000,000,000
INFO [01-30|02:50:20.587] Commit new sealing work                  number=91  sealhash=f8acb2..d68b3c uncles=0 txs=0 gas=0 fees=0 elapsed="334.539µs"
INFO [01-30|02:50:20.587] Commit new sealing work                  number=91  sealhash=f8acb2..d68b3c uncles=0 txs=0 gas=0 fees=0 elapsed="870.516µs"
INFO [01-30|02:50:20.588] Successfully sealed new block            number=91  sealhash=f8acb2..d68b3c hash=f21587..ab2acf elapsed=1.050ms
INFO [01-30|02:50:20.588] 🔨 mined potential block                  number=91  hash=f21587..ab2acf
INFO [01-30|02:50:20.588] Commit new sealing work                  number=92  sealhash=076491..58c1a9 uncles=0 txs=0 gas=0 fees=0 elapsed="296.315µs"
INFO [01-30|02:50:20.589] Commit new sealing work                  number=92  sealhash=076491..58c1a9 uncles=0 txs=0 gas=0 fees=0 elapsed="568.722µs"
INFO [01-30|02:50:22.397] Writing clean trie cache to disk         path=/home/temp/go-ethereum/build/bin/data/geth/triecache threads=1
INFO [01-30|02:50:22.400] Persisted the clean trie cache           path=/home/temp/go-ethereum/build/bin/data/geth/triecache elapsed=2.821ms
INFO [01-30|02:50:25.000] Successfully sealed new block            number=92  sealhash=076491..58c1a9 hash=60ab9f..958f39 elapsed=4.412s
INFO [01-30|02:50:25.001] 🔨 mined potential block                  number=92  hash=60ab9f..958f39
INFO [01-30|02:50:25.001] Commit new sealing work                  number=93  sealhash=3c98bb..8dd224 uncles=0 txs=0 gas=0 fees=0 elapsed="498.189µs"
INFO [01-30|02:50:25.002] Commit new sealing work                  number=93  sealhash=3c98bb..8dd224 uncles=0 txs=0 gas=0 fees=0 elapsed=1.057ms
rjl493456442 commented 1 year ago

https://github.com/ethereum/go-ethereum/blob/df52967ff6080a27243569020ff64cd956fb8362/core/blockchain.go#L1354

// Flush limits are not considered for the first TriesInMemory(128) blocks.

Since your private chain is very short, it hits this special case.

holiman commented 1 year ago

Nice catch @rjl493456442 ! Closing this

anthonyoliai commented 1 year ago

https://github.com/ethereum/go-ethereum/blob/df52967ff6080a27243569020ff64cd956fb8362/core/blockchain.go#L1354

// Flush limits are not considered for the first TriesInMemory(128) blocks.

Since your private chain is very short, it hits this special case.

@holiman @rjl493456442 I'll submit some more logs today, but this behaviour also seems to be the case when the chain is longer than TriesInMemory (128) blocks. I let the chain run for around 1000 blocks before killing it.

holiman commented 1 year ago

Yeah please add logs of running it. IIUC this node is mining the blocks, not "importing" through the regular api. I am not quite sure how that affects/interacts with the trie cache.

anthonyoliai commented 1 year ago

Yeah please add logs of running it. IIUC this node is mining the blocks, not "importing" through the regular api. I am not quite sure how that affects/interacts with the trie cache.

@holiman Great, I added some additional logs that might prove helpful.

Interestingly enough, I do see that the full block is up to date, followed by a "Head state missing" error, which enables snapshot recovery.

anthonyoliai commented 1 year ago

Yeah please add logs of running it. IIUC this node is mining the blocks, not "importing" through the regular api. I am not quite sure how that affects/interacts with the trie cache.

@holiman I do see that the following condition: https://github.com/ethereum/go-ethereum/blob/df52967ff6080a27243569020ff64cd956fb8362/core/blockchain.go#L1369

does not get hit, because we are not importing any blocks, so a flush potentially does not occur because this condition is never met.

I've been digging a bit through the codebase, and your reply to my issue got me thinking. Is the block processing time (bc.gcproc) only incremented when blocks are imported? If that's the case, it makes sense why my cache isn't flushing. There's the "if bc.gcproc > flushInterval" check, meaning that if I have a node running which doesn't import blocks, bc.gcproc is always 0 (I think?), so even a flush interval of 0, which would technically make the node an archive node, doesn't fire off any flushing.
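For reference, the logic being discussed can be sketched roughly as follows. This is a paraphrased, self-contained illustration of the flush decision around core/blockchain.go, not the verbatim source; the names gcproc, flushInterval and TriesInMemory follow the real code, everything else is simplified:

    package main

    import (
        "fmt"
        "time"
    )

    // Paraphrased sketch of the flush decision discussed above; not the
    // verbatim go-ethereum code. gcproc accumulates block processing time,
    // which only happens on the block-import path, so a sealer that never
    // imports blocks never accumulates it.
    const TriesInMemory = 128

    type chain struct {
        gcproc        time.Duration // accumulated processing time since the last flush
        flushInterval time.Duration // the debug.setTrieFlushInterval value
    }

    func (c *chain) importBlock(number uint64, procTime time.Duration) {
        c.gcproc += procTime // only incremented when a block is imported

        // Flush limits are not considered for the first TriesInMemory blocks.
        if number <= TriesInMemory {
            return
        }
        if c.gcproc > c.flushInterval {
            fmt.Printf("flushing dirty trie nodes up to block %d\n", number-TriesInMemory)
            c.gcproc = 0
        }
    }

    func main() {
        c := &chain{flushInterval: 500 * time.Millisecond}
        for n := uint64(1); n <= 300; n++ {
            c.importBlock(n, 30*time.Millisecond) // pretend every block takes 30ms to process
        }
    }

With these illustrative numbers a flush fires every 17 blocks or so once the chain is past block 128; on a node that only seals and never runs the import path, gcproc never grows, so no flush ever fires regardless of the configured interval.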

s1na commented 1 year ago

Posting additional info from OP:

This is what happens when the node is started again:

[screenshot of the restart logs]

anthonyoliai commented 1 year ago

To give some context, my setup now consists of two sealer nodes which I peer up with each other. I've set the TrieTimeLimit to 500ms -> meaning that 500ms of accumulated block processing time should lead to a cache flush.

I logged the block processing time (bc.gcproc), as can be seen in the attached screenshot. (Sorry for the pics; these are old logs I only had stored as images.)

I've been running it just now once again, trying to get gcproc up, and got the following log:

INFO [02-02|01:40:57.036] Imported new chain segment               number=5444 hash=926acc..bc89c6 blocks=1   txs=412   mgas=8.652   elapsed=26.975ms    mgasps=320.740 dirty=30.00KiB
========================= NODE RELATED MEMORY INFORMATION ==================================
NODES SIZE 30.47 KiB, IMGS size 0.00 B
LIMIT: 256.00 MiB
=========================  ==================================
===============================GC PROCS = ============================================================================
498.501534ms
INFO [02-02|01:40:58.040] Imported new chain segment               number=5445 hash=172a0b..ea6116 blocks=1   txs=406   mgas=8.526   elapsed=34.413ms    mgasps=247.749 dirty=30.47KiB
========================= NODE RELATED MEMORY INFORMATION ==================================
NODES SIZE 30.94 KiB, IMGS size 0.00 B
LIMIT: 256.00 MiB
=========================  ==================================
===============================GC PROCS = ============================================================================
527.153921ms
INFO [02-02|01:40:59.045] Persisted trie from memory database      nodes=0 size=0.00B   time="3.228µs"   gcnodes=0   gcsize=0.00B    gctime="19.603µs"  livenodes=133 livesize=18.56KiB
LAST BLOCK FLUSHED TO DB:
5318

However, when I restart it doesn't start at 5318.

holiman commented 1 year ago

The problem seems to be that the unclean shutdown causes the in-memory snapshot layers to be lost, and we only have the disk-based layer left. This layer is at least 128 blocks old, but most likely a lot more than that.

What geth does is go back from that block to a point where it has state: this means it won't have to regenerate the snapshot layer; once it reaches that block again, it can just pick up the snapshot layer again.

But yeah, this breaks the intended archive-behaviour. You'd have to disable snapshots in order to get around this behaviour.
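To illustrate the "go back to a point where it has state" behaviour, here is a toy model of the idea (an illustration only, not go-ethereum's actual recovery code):

    package main

    import "fmt"

    // Toy model of the head repair described above: walk back from the
    // recorded head until a block whose state root is still on disk is
    // found, and resume from there.
    type block struct {
        number uint64
        root   string
        parent *block
    }

    func repairHead(head *block, hasState func(root string) bool) *block {
        for cur := head; cur != nil; cur = cur.parent {
            if hasState(cur.root) {
                return cur
            }
        }
        return nil // in practice this would fall back to genesis
    }

    func main() {
        // Build a 368-block toy chain; only the state of block 90 was ever flushed.
        var head *block
        for i := uint64(0); i <= 368; i++ {
            head = &block{number: i, root: fmt.Sprintf("root-%d", i), parent: head}
        }
        persisted := map[string]bool{"root-90": true}
        repaired := repairHead(head, func(r string) bool { return persisted[r] })
        fmt.Println("head rewound to block", repaired.number) // 90, matching the logs above
    }

The same pattern shows up in the logs above: the header chain is at 368, but the full block is rewound to 90, the last block whose state survived on disk.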

anthonyoliai commented 1 year ago

The problem seems to be that the unclean shutdown causes the in-memory snapshot layers to be lost, and we only have the diskbased layer left. This layer is at least 128 blocks old, but most likely a lot more than that.

What geth does, is go back from that block to a point where it has state: this means it won't have to regenerate the snapshot layer, once it reaches that block again, it can just pick up the snapshot layer again.

But yeah, this breaks the intended archive-behaviour. You'd have to disable snapshots in order to get around this behaviour.

Thank you! This is still all new to me. "This layer is at least 128 blocks old, but most likely a lot more than that." - which layer do you mean here? Do you mean the disk-based layer? So would that mean that we wind back from this last block in the disk-based layer to the last known "in-memory" snapshot layer?

Regarding turning snapshots off: is that done using --snapshot=false?

I assume turning snapshot to false is basically the same thing as just having the node run as an archive node? Hence defeating the purpose of trying to make a full node "more archivey" but not "a complete archive node".

holiman commented 1 year ago

This layer is at least 128 blocks old, but most likely a lot more than that.

which layer do you mean here?

The snapshots are in layers. At the bottom is the disk-based layer, representing the (flat) state at some block. Then comes the first memory layer, which is a merged layer representing N blocks and is flushed to disk from time to time. Then come 127 more memory layers, each representing one block. So, in memory, there are layers representing at least 128 blocks.

Regarding turning snapshot off, using --snapshot=false.

Yes

I assume turning snapshot to false is basically the same thing as just having the node run as an archive node?

That is not correct. The snapshots are "orthogonal" to trie gcmode, or, in other words: one thing has nothing to do with the other.

Hence defeating the purpose of trying to make a full node "more archivey" but not "a complete archive node"

Although, I don't see why you don't just run archive mode anyway.
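To picture the layer stack described here, a conceptual sketch (not go-ethereum's actual snapshot types; the size of the merged bottom layer is an arbitrary example value):

    package main

    import "fmt"

    // Conceptual illustration of the snapshot layering described above, not
    // go-ethereum's real data structures.
    type layer struct {
        kind   string
        blocks int // number of blocks of state changes the layer covers
    }

    func main() {
        stack := []layer{
            {"disk layer (flat state at some older block)", 1},
            {"bottom memory layer (merged, flushed to disk from time to time)", 40},
        }
        for i := 0; i < 127; i++ {
            stack = append(stack, layer{"diff layer (one block)", 1})
        }
        // Everything except the disk layer lives only in memory and is lost
        // on an unclean shutdown.
        fmt.Println("in-memory layers:", len(stack)-1) // 128
    }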

anthonyoliai commented 1 year ago

This layer is at least 128 blocks old, but most likely a lot more than that.

which layer do you mean here?

The snapshots are in layers. At the bottom is the disk-based layer, representing the (flat) state at some block. Then comes the first memory layer, which is a merged layer representing N blocks and is flushed to disk from time to time. Then come 127 more memory layers, each representing one block. So, in memory, there are layers representing at least 128 blocks.

Regarding turning snapshot off, using --snapshot=false.

Yes

I assume turning snapshot to false is basically the same thing as just having the node run as an archive node?

That is not correct. The snapshots are "orthogonal" to trie gcmode, or, in other words: one thing has nothing to do with the other.

Hence defeating the purpose of trying to make a full node "more archivey" but not "a complete archive node"

Although, I don't see why you don't just run archive mode anyway.

Great, thank you, it's much clearer now. The reason I don't want to just run in archive mode is that I didn't want the additional storage it brings compared to a regular full node. Hence I was looking for something "in between": more frequent persisting of states from memory to disk, but not for every block, which (if I understood correctly) is what an archive node does.

What I'll try to do is run two clean nodes, peer them up, leave snapshots off, and come back with some logs in case the issue still persists.

anthonyoliai commented 1 year ago

@holiman @s1na

I set up two new nodes, with a fresh data dir.

The command I ran for NODE 1 was:

    ./geth --datadir data --miner.gaslimit 32000000 --unlock 0x31dD68fad2F92e545d7bfDaCF69Af652b5feA95D --password password.txt --networkid 9090 --nodiscover --port 30304 --authrpc.port 8552 --snapshot=false --syncmode=full

and for NODE 2:

    ./geth --datadir data --miner.gaslimit 32000000 --unlock d7d5d4f41e648ff45bd11523a0e948c82069778b --password password.txt --networkid 9090 --nodiscover --snapshot=false --syncmode=full

I did an admin.addPeer() on node 2's console to peer up with node 1, then started submitting TXs at a rate of 10K per 5 s:

    const sendEth = () => {
      for (let i = 0; i < 10000; i++) {
        eth.sendTransaction({
          from: '0xd7d5d5F41E648Ff45bD11593A0E948D82069778B',
          to: '0xd7d5d5F41E648Ff45bD11593A0E948D82069778B',
          value: web3.toWei(1, 'ether')
        });
      }
    };

Once the block number passes 128, we start taking gcproc into consideration for flushing:

INFO [02-02|03:07:04.037] Imported new chain segment               number=129 hash=effd7a..2ca779 blocks=1 txs=346 mgas=7.266 elapsed=28.964ms    mgasps=250.858 dirty=27.32KiB
========================= NODE RELATED MEMORY INFORMATION ==================================
NODES SIZE 27.58 KiB, IMGS size 0.00 B
LIMIT: 256.00 MiB
=========================  ==================================
===============================GC PROCS = ============================================================================
26.22769ms
INFO [02-02|03:07:05.055] Imported new chain segment               number=130 hash=ae19cf..113713 blocks=1 txs=302 mgas=6.342 elapsed=32.726ms    mgasps=193.790 dirty=27.58KiB
========================= NODE RELATED MEMORY INFORMATION ==================================
NODES SIZE 27.84 KiB, IMGS size 0.00 B
LIMIT: 256.00 MiB

I allowed some more blocks and flushes to come in, and at block 282 I decided to kill the node with kill -9 <geth PID>.

===============================GC PROCS = ============================================================================
498.296044ms
INFO [02-02|03:09:37.072] Imported new chain segment               number=282 hash=958410..3b7a92 blocks=1 txs=369 mgas=7.749 elapsed=38.114ms    mgasps=203.309 dirty=30.51KiB
========================= NODE RELATED MEMORY INFORMATION ==================================
NODES SIZE 30.78 KiB, IMGS size 0.00 B
LIMIT: 256.00 MiB
=========================  ==================================
===============================GC PROCS = ============================================================================
525.140719ms
INFO [02-02|03:09:38.037] Persisted trie from memory database      nodes=1 size=173.00B time="135.776µs"  gcnodes=19 gcsize=3.21KiB gctime="55.365µs"  livenodes=117 livesize=19.64KiB
LAST BLOCK FLUSHED TO DB:
155

Logs show that the last block flushed to the db is 155 ^

Finally, I restart the node after killing it.

INFO [02-02|03:11:01.993]  
INFO [02-02|03:11:01.993] Loaded most recent local header          number=286 hash=23f6a5..6f7273 td=573 age=1m20s
INFO [02-02|03:11:01.993] Loaded most recent local full block      number=286 hash=23f6a5..6f7273 td=573 age=1m20s
INFO [02-02|03:11:01.993] Loaded most recent local fast block      number=286 hash=23f6a5..6f7273 td=573 age=1m20s
WARN [02-02|03:11:01.994] Head state missing, repairing            number=286 hash=23f6a5..6f7273
INFO [02-02|03:11:02.072] Loaded most recent local header          number=286 hash=23f6a5..6f7273 td=573 age=1m21s
INFO [02-02|03:11:02.072] Loaded most recent local full block      number=155 hash=93d037..6b3a98 td=311 age=3m32s
INFO [02-02|03:11:02.072] Loaded most recent local fast block      number=286 hash=23f6a5..6f7273 td=573 age=1m21s
INFO [02-02|03:11:02.092] Loaded local transaction journal         transactions=0 dropped=0
INFO [02-02|03:11:02.092] Regenerated local transaction journal    transactions=0 accounts=0
INFO [02-02|03:11:02.092] Gasprice oracle is ignoring threshold set threshold=2
WARN [02-02|03:11:02.092] Unclean shutdown detected                booted=2023-02-02T03:08:07-0800 age=2m55s
WARN [02-02|03:11:02.092] Engine API enabled                       protocol=eth
WARN [02-02|03:11:02.092] Engine API started but chain not configured for merge yet 
INFO [02-02|03:11:02.093] Starting peer-to-peer node               instance=Geth/v1.11.0-unstable-df52967f-20230127/linux-amd64/go1.19.5
INFO [02-02|03:11:02.100] IPC endpoint opened                      url=/home/temp/clean_gethv2/build/bin/data/geth.ipc

And behold, the node restarts at block 155. Now I assume that this is intended behaviour, as in, if we are at block N and a flush occurs, we flush the state up to N-128.

Now my question is... is this a good "middle ground" for running a node that is not fully an archive node? I simply just don't want my storage to get too full by having it run as an archive node.

anthonyoliai commented 1 year ago

I did a final run in which I let the node restart at block 155 and sync up with the other node that was still running. It synced all the way up to block 435 and did a cache flush on block 225. However, when I then killed the node, it restarted back at 155 instead of the newly flushed block 225. My speculation is that it's because blocks 155 to 225 were imported; somehow this didn't get persisted to storage.

Here it's supposed to show the most recent local full block at 225, not 155. Somehow syncing and then flushing messes it up:

INFO [02-02|03:19:19.363] Loaded most recent local header          number=436 hash=5bab28..c11aa5 td=873 age=20s
INFO [02-02|03:19:19.363] Loaded most recent local full block      number=436 hash=5bab28..c11aa5 td=873 age=20s
INFO [02-02|03:19:19.363] Loaded most recent local fast block      number=436 hash=5bab28..c11aa5 td=873 age=20s
WARN [02-02|03:19:19.363] Head state missing, repairing            number=436 hash=5bab28..c11aa5
INFO [02-02|03:19:19.482] Loaded most recent local header          number=436 hash=5bab28..c11aa5 td=873 age=20s
INFO [02-02|03:19:19.482] Loaded most recent local full block      number=155 hash=93d037..6b3a98 td=311 age=11m49s
INFO [02-02|03:19:19.482] Loaded most recent local fast block      number=436 hash=5bab28..c11aa5 td=873 age=20s
rjl493456442 commented 1 year ago

As a summary, if you explicitly disable snapshot, the in-memory states will only be flushed out if:

  • It has at least 128 blocks confirmation on top
  • The accumulated processing time reaches the limit you configured

There are only these two conditions to determine if a flush should happen.

is this a good "middle ground" for running a node that is not fully an archive node?

Unfortunately we don't have a better solution right now. Setting a reasonable flush interval should be a good approach at this point; in the future we will build another mode of archive node which is smaller than the existing one and makes it easier to retain historical states, but it needs time to roll out.

Re your case ("did a cache flush on block 225", "somehow this didn't get persisted in storage"): are you sure the state at block 225 was really flushed out? You can search the logs for "Persisted trie from memory database" to see if the flush indeed happened.

Closing this, as I believe the behaviour is correct. Feel free to reopen if it still doesn't work.

anthonyoliai commented 1 year ago

As a summary, if you explicitly disable snapshot, the in-memory states will only be flushed out if

  • It has at least 128 blocks confirmation on top
  • The accumulated processing time reaches the limit you configured

There are only these two conditions to determine if a flush should happen.

is this a good "middle ground" for running a node that is not fully an archive node?

Unfortunately we don't have a better solution right now. Setting a reasonable flush interval should be a good approach at this point, in the future we will build another mode of archive node which is smaller than the existing one and easier to retain historical states, but it needs time to rollout.

Re your case ("did a cache flush on block 225", "somehow this didn't get persisted in storage"): are you sure the state at block 225 was really flushed out? You can search the logs for "Persisted trie from memory database" to see if the flush indeed happened.

Closing this, as I believe the behaviour is correct. Feel free to reopen if it still doesn't work.

So if I understand correctly, we will flush regardless of what I set as an interval if snapshot is disabled? My only question then would be how this differs storage-size-wise compared to something like an archive node. Thanks!

rjl493456442 commented 1 year ago

So if I understand correctly we will flush regardless of what I set as an interval if snapshot is disabled?

No, snapshot is another thing. Disabling snapshot just simplifies the situation.

My only question would be then how this differs storage-size wise compared to something like an archive node.

An archive node persists all the states block by block. The flush frequency depends on your setting. It depends on two things:

  • the average time to process a block
  • the interval configured by you

If your network has lots of traffic (blocks are very full), the processing time will be longer and the flush frequency will be higher.

You have to run it and get some numbers yourself.
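As a rough worked example (illustrative numbers, not measurements from this issue): with debug.setTrieFlushInterval("500ms") and an average of about 25ms of processing per block, the accumulated processing time crosses the threshold roughly every 20 blocks, so state would be flushed about once every 20 blocks once the chain is past the first 128.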

anthonyoliai commented 1 year ago

So if I understand correctly we will flush regardless of what I set as an interval if snapshot is disabled?

No, snapshot is another thing. Disable snapshot just simplifies the situation.

My only question would be then how this differs storage-size wise compared to something like an archive node.

Archive node persists all the states block by block. The flush frequency depends on your setting. It depends on two things:

  • the average time to process a block
  • the interval configured by you

If your network has lots of traffic(block is very full), the process time will be longer. The frequency will be higher.

You have to run it and get some numbers by yourselves,

Gotcha! So, I guess on a PoA network which doesn't have many txs per block, it would take quite some time for the accumulated block processing time to reach the flush interval, as the per-block processing time is very low.

So to reiterate: running as a full node with snapshot disabled and a relatively low flush interval means more flushes; however, it is a much more attractive solution storage-growth-wise compared to running a full archive node.

Thus creating a "more archivey" full node which is more crash-resistant than a regular full node with the default flush interval of, say, 1h.