// Flush limits are not considered for the first TriesInMemory(128) blocks.
Since your private chain is very short, it hits this special case.
Nice catch @rjl493456442! Closing this.
@holiman @rjl493456442 I'll submit some more logs today, but this behaviour also seems to occur when the chain is longer than TriesInMemory (128) blocks. I let the chain run for around 1000 blocks before killing it.
Yeah please add logs of running it. IIUC this node is mining the blocks, not "importing" through the regular api. I am not quite sure how that affects/interacts with the trie cache.
@holiman Great, I added some additional logs that might be helpful.
Interestingly enough, I do see that the full block is up to date, followed by a head state missing error, which enables snapshot recovery.
@holiman I do see that the following condition does not get hit: https://github.com/ethereum/go-ethereum/blob/df52967ff6080a27243569020ff64cd956fb8362/core/blockchain.go#L1369
Because we are not importing any blocks, a flush potentially does not occur since this condition is never met.
I've been digging a bit in the codebase, and your reply to my issue got me thinking. Is the block processing time (bc.gcproc) only incremented when blocks are imported? Because if that's the case, it makes sense why my cache isn't flushing. There's this "if bc.gcproc > flushInterval" check, meaning that if I have a node running which doesn't import blocks, bc.gcproc is always 0 (I think?), so even a flush interval of 0, which would technically make the node an archive node, doesn't fire off any flushing.
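For context, here is a minimal sketch of that decision as I read it; shouldFlush, triesInMemory and the exact signature are simplifications for illustration, not geth's actual code:

package main

import (
	"fmt"
	"time"
)

const triesInMemory = 128

// shouldFlush is an illustrative stand-in for the decision discussed above:
// the dirty trie cache is only flushed once the block is deeper than
// triesInMemory AND the accumulated processing time exceeds the interval.
func shouldFlush(blockNumber uint64, gcproc, flushInterval time.Duration) bool {
	if blockNumber <= triesInMemory {
		return false // flush limits are not considered for the first 128 blocks
	}
	return gcproc > flushInterval // strict ">": a gcproc stuck at 0 never triggers a flush
}

func main() {
	// A node that never processes blocks locally never accumulates gcproc,
	// so even a flush interval of 0 does not fire (0 > 0 is false).
	fmt.Println(shouldFlush(5000, 0, 0)) // false
	// Once ~527ms of processing time has accumulated against a 500ms interval:
	fmt.Println(shouldFlush(282, 527*time.Millisecond, 500*time.Millisecond)) // true
}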
Posting additional info from OP:
This is what happens when the node is started again:
To give some context, my setup now consists of two sealer nodes which I peer up with each other. I've set the TrieTimeLimit to 500ms, meaning that 500ms of accumulated block processing time should lead to a flushed cache.
I logged the block processing time (gcproc), as can be seen here. (Sorry for the pics, these are old logs I only had stored as pics.)
I've been running it just now once again, trying to get gcproc up, and got the following log:
INFO [02-02|01:40:57.036] Imported new chain segment number=5444 hash=926acc..bc89c6 blocks=1 txs=412 mgas=8.652 elapsed=26.975ms mgasps=320.740 dirty=30.00KiB
========================= NODE RELATED MEMORY INFORMATION ==================================
NODES SIZE 30.47 KiB, IMGS size 0.00 B
LIMIT: 256.00 MiB
========================= ==================================
===============================GC PROCS = ============================================================================
498.501534ms
INFO [02-02|01:40:58.040] Imported new chain segment number=5445 hash=172a0b..ea6116 blocks=1 txs=406 mgas=8.526 elapsed=34.413ms mgasps=247.749 dirty=30.47KiB
========================= NODE RELATED MEMORY INFORMATION ==================================
NODES SIZE 30.94 KiB, IMGS size 0.00 B
LIMIT: 256.00 MiB
========================= ==================================
===============================GC PROCS = ============================================================================
527.153921ms
INFO [02-02|01:40:59.045] Persisted trie from memory database nodes=0 size=0.00B time="3.228µs" gcnodes=0 gcsize=0.00B gctime="19.603µs" livenodes=133 livesize=18.56KiB
LAST BLOCK FLUSHED TO DB:
5318
However, when I restart it doesn't start at 5318.
The problem seems to be that the unclean shutdown causes the in-memory snapshot layers to be lost, so we only have the disk-based layer left. This layer is at least 128 blocks old, but most likely a lot more than that.
What geth does is go back from that block to a point where it has state: this means it won't have to regenerate the snapshot layer; once it reaches that block again, it can just pick up the snapshot layer again.
But yeah, this breaks the intended archive behaviour. You'd have to disable snapshots in order to get around this behaviour.
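A minimal sketch of that rewind behaviour as described above; rewindToAvailableState and hasState are hypothetical names for illustration, not geth's API:

package main

import "fmt"

// rewindToAvailableState walks back from the head until it finds a block
// whose state is still on disk, mirroring the "repair" step described above.
func rewindToAvailableState(head uint64, hasState func(uint64) bool) uint64 {
	for n := head; n > 0; n-- {
		if hasState(n) {
			return n // newest block whose state survived the unclean shutdown
		}
	}
	return 0 // worst case: all the way back to genesis
}

func main() {
	// Example with numbers similar to the logs later in this thread: the head
	// is 286 but only block 155's state was ever persisted.
	fmt.Println(rewindToAvailableState(286, func(n uint64) bool { return n <= 155 })) // 155
}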
Thank you! This is still all new to me.
This layer is at least 128 blocks old, but most likely a lot more than that.
Which layer do you mean here? Do you mean the disk-based layer? So would that mean that we wind back from this last block in the disk-based layer to the last known "in-memory" snapshot layer?
Regarding turning snapshots off, using --snapshot=false:
I assume turning snapshot to false is basically the same thing as just having the node run as an archive node? Hence defeating the purpose of trying to make a full node "more archivey" but not "a complete archive node".
This layer is at least 128 blocks old, but most likely a lot more than that.
which layer do you mean here?
The snapshots are in layers. At the bottom is the disk-based layer, representing the (flat) state at some block. Then comes the first memory layer, which is a merged layer representing N blocks and is flushed to disk from time to time. Then come 127 more memory layers, each representing one block. So, in memory, there are layers representing at least 128 blocks.
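A rough model of that layer stack, purely for illustration (the types below are made up, not geth's snapshot implementation):

package main

import "fmt"

type snapshotLayer struct {
	blocks int  // number of blocks of state changes this layer represents
	onDisk bool // only the bottom layer is persistent
}

func main() {
	stack := []snapshotLayer{
		{blocks: 1, onDisk: true}, // disk layer: flat state at some (old) block
		{blocks: 64},              // first memory layer: merged diffs of N blocks
	}
	for i := 0; i < 127; i++ { // then 127 layers of one block each
		stack = append(stack, snapshotLayer{blocks: 1})
	}
	// Everything except stack[0] lives only in memory, so an unclean shutdown
	// loses at least 128 blocks worth of snapshot layers.
	fmt.Println("in-memory layers:", len(stack)-1) // 128
}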
Regarding turning snapshot off, using --snapshot=false.
Yes
I assume turning snapshot to false is basically the same thing as just having the node run as an archive node?
That is not correct. The snapshots are "orthogonal" to trie gcmode, or, in other words: one thing has nothing to do with the other.
Hence defeating the purpose of trying to make a full node "more archivey" but not "a complete archive node"
Although, I don't see why you don't just run archive mode anyway.
Great, thank you, it's much clearer now. The reason I don't want to just run in archive mode is that I don't want the additional storage it brings compared to a regular full node. Hence I was looking for something "in between": more frequent persisting of state from memory to disk, but not for every block, which is what an archive node does if I understood correctly(?)
What I'll try to do is run two clean nodes, peer them up, leave snapshots off, and come back with some logs in case the issue still persists.
@holiman @s1na
I set up two new nodes, with a fresh data dir.
The commands I ran were:
NODE 1: ./geth --datadir data --miner.gaslimit 32000000 --unlock 0x31dD68fad2F92e545d7bfDaCF69Af652b5feA95D --password password.txt --networkid 9090 --nodiscover --port 30304 --authrpc.port 8552 --snapshot=false --syncmode=full
and
NODE 2: ./geth --datadir data --miner.gaslimit 32000000 --unlock d7d5d4f41e648ff45bd11523a0e948c82069778b --password password.txt --networkid 9090 --nodiscover --snapshot=false --syncmode=full
I did an admin.addPeer() on node 2's console to peer up with node 1
Started submitting TXs at a rate of 10K per 5s:
const sendEth = () => {
  for (let i = 0; i < 10000; i++) {
    eth.sendTransaction({
      from: '0xd7d5d5F41E648Ff45bD11593A0E948D82069778B',
      to: '0xd7d5d5F41E648Ff45bD11593A0E948D82069778B',
      value: web3.toWei(1, 'ether')
    });
  }
}
Once the block number passes 128, the flush limits start being considered and gcproc is taken into account:
INFO [02-02|03:07:04.037] Imported new chain segment number=129 hash=effd7a..2ca779 blocks=1 txs=346 mgas=7.266 elapsed=28.964ms mgasps=250.858 dirty=27.32KiB
========================= NODE RELATED MEMORY INFORMATION ==================================
NODES SIZE 27.58 KiB, IMGS size 0.00 B
LIMIT: 256.00 MiB
========================= ==================================
===============================GC PROCS = ============================================================================
26.22769ms
INFO [02-02|03:07:05.055] Imported new chain segment number=130 hash=ae19cf..113713 blocks=1 txs=302 mgas=6.342 elapsed=32.726ms mgasps=193.790 dirty=27.58KiB
========================= NODE RELATED MEMORY INFORMATION ==================================
NODES SIZE 27.84 KiB, IMGS size 0.00 B
LIMIT: 256.00 MiB
I allow some more blocks and flushes to come in, and at block 282 I decided to kill the node with kill -9 <geth PID>.
===============================GC PROCS = ============================================================================
498.296044ms
INFO [02-02|03:09:37.072] Imported new chain segment number=282 hash=958410..3b7a92 blocks=1 txs=369 mgas=7.749 elapsed=38.114ms mgasps=203.309 dirty=30.51KiB
========================= NODE RELATED MEMORY INFORMATION ==================================
NODES SIZE 30.78 KiB, IMGS size 0.00 B
LIMIT: 256.00 MiB
========================= ==================================
===============================GC PROCS = ============================================================================
525.140719ms
INFO [02-02|03:09:38.037] Persisted trie from memory database nodes=1 size=173.00B time="135.776µs" gcnodes=19 gcsize=3.21KiB gctime="55.365µs" livenodes=117 livesize=19.64KiB
LAST BLOCK FLUSHED TO DB:
155
Logs show that the last block flushed to the db is 155 ^
Finally, I restart the node after killing it.
INFO [02-02|03:11:01.993]
INFO [02-02|03:11:01.993] Loaded most recent local header number=286 hash=23f6a5..6f7273 td=573 age=1m20s
INFO [02-02|03:11:01.993] Loaded most recent local full block number=286 hash=23f6a5..6f7273 td=573 age=1m20s
INFO [02-02|03:11:01.993] Loaded most recent local fast block number=286 hash=23f6a5..6f7273 td=573 age=1m20s
WARN [02-02|03:11:01.994] Head state missing, repairing number=286 hash=23f6a5..6f7273
INFO [02-02|03:11:02.072] Loaded most recent local header number=286 hash=23f6a5..6f7273 td=573 age=1m21s
INFO [02-02|03:11:02.072] Loaded most recent local full block number=155 hash=93d037..6b3a98 td=311 age=3m32s
INFO [02-02|03:11:02.072] Loaded most recent local fast block number=286 hash=23f6a5..6f7273 td=573 age=1m21s
INFO [02-02|03:11:02.092] Loaded local transaction journal transactions=0 dropped=0
INFO [02-02|03:11:02.092] Regenerated local transaction journal transactions=0 accounts=0
INFO [02-02|03:11:02.092] Gasprice oracle is ignoring threshold set threshold=2
WARN [02-02|03:11:02.092] Unclean shutdown detected booted=2023-02-02T03:08:07-0800 age=2m55s
WARN [02-02|03:11:02.092] Engine API enabled protocol=eth
WARN [02-02|03:11:02.092] Engine API started but chain not configured for merge yet
INFO [02-02|03:11:02.093] Starting peer-to-peer node instance=Geth/v1.11.0-unstable-df52967f-20230127/linux-amd64/go1.19.5
INFO [02-02|03:11:02.100] IPC endpoint opened url=/home/temp/clean_gethv2/build/bin/data/geth.ipc
And behold, the node restarts at block 155. Now I assume that this is intended behaviour, as in: if we are at block N and a flush occurs, we flush the state up to N-128. That matches the logs above, where the persisted block (155) is roughly 128 behind the head at the time of the flush.
Now my question is... is this a good "middle ground" for running a node that is not fully an archive node? I simply just don't want my storage to get too full by having it run as an archive node.
I did a final run in which I let the node restart at block 155 and sync up with the other node that was still running. It synced all the way up to block 435 and did a cache flush at block 225. However, when I then killed the node it restarted back at 155, instead of at the newly flushed block 225. My speculation is that it's because blocks 155 to 225 were imported; somehow this didn't get persisted to storage.
Here it's supposed to show the most recent local full block at 225, not 155. Somehow syncing and then flushing messes it up:
INFO [02-02|03:19:19.363] Loaded most recent local header number=436 hash=5bab28..c11aa5 td=873 age=20s
INFO [02-02|03:19:19.363] Loaded most recent local full block number=436 hash=5bab28..c11aa5 td=873 age=20s
INFO [02-02|03:19:19.363] Loaded most recent local fast block number=436 hash=5bab28..c11aa5 td=873 age=20s
WARN [02-02|03:19:19.363] Head state missing, repairing number=436 hash=5bab28..c11aa5
INFO [02-02|03:19:19.482] Loaded most recent local header number=436 hash=5bab28..c11aa5 td=873 age=20s
INFO [02-02|03:19:19.482] Loaded most recent local full block number=155 hash=93d037..6b3a98 td=311 age=11m49s
INFO [02-02|03:19:19.482] Loaded most recent local fast block number=436 hash=5bab28..c11aa5 td=873 age=20s
As a summary, if you explicitly disable snapshots, the in-memory states will only be flushed out if:
- they have at least 128 blocks of confirmation on top, and
- the accumulated processing time reaches the limit you configured.
There are only these two conditions that determine whether a flush should happen.
is this a good "middle ground" for running a node that is not fully an archive node?
Unfortunately we don't have a better solution right now. Setting a reasonable flush interval should be a good approach at this point. In the future we will build another mode of archive node which is smaller than the existing one and makes it easier to retain historical states, but it needs time to roll out.
Re your case:
and did a cache flush on block 225 ... somehow this didn't get persisted in storage
Are you sure the state (225) was really flushed out? You can search the log for "Persisted trie from memory database" to see if the flush indeed happened.
Closing this, as I believe the behavior is correct. Feel free to reopen if it still doesn't work.
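To check that in practice, one can simply scan the captured node output for that message. A small helper, purely my own illustration and not part of geth (the geth.log path is an assumption):

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	// Assumed path to the captured geth output; adjust as needed.
	f, err := os.Open("geth.log")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Print every flush the node actually performed.
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		if strings.Contains(scanner.Text(), "Persisted trie from memory database") {
			fmt.Println(scanner.Text())
		}
	}
}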
So if I understand correctly, we will flush regardless of what I set as an interval if snapshot is disabled? My only question then would be how this differs storage-size-wise compared to something like an archive node. Thanks!
So if I understand correctly we will flush regardless of what I set as an interval if snapshot is disabled?
No, snapshot is another thing. Disabling snapshots just simplifies the situation.
My only question would be then how this differs storage-size wise compared to something like an archive node.
An archive node persists all the states block by block. For a full node, the flush frequency depends on your setting; it depends on two things:
- the average time to process a block
- the interval configured by you
If your network has a lot of traffic (blocks are very full), the processing time will be longer and the flush frequency will be higher. You have to run it and get some numbers yourself.
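As a back-of-the-envelope example (my own numbers, roughly matching the ~30ms elapsed values in the logs above): blocks between flushes ≈ flush interval / average block processing time.

package main

import (
	"fmt"
	"time"
)

func main() {
	avgProcessing := 30 * time.Millisecond // rough per-block processing time from the logs above
	for _, interval := range []time.Duration{500 * time.Millisecond, time.Hour} {
		fmt.Printf("interval %v -> roughly one flush every %d blocks\n",
			interval, int(interval/avgProcessing))
	}
}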
Gotcha! So I guess on a PoA network which doesn't have many txs per block, it would take quite some time for the accumulated block processing time to reach the flush interval, as the per-block processing time is very low.
So to reiterate: running as a full node with snapshots disabled and a relatively low flush interval means more flushes, but it is a much more attractive solution storage-growth-wise compared to running a full archive node.
Thus, creating a "more archivey" full node which is more crash resistant than a regular full node with the default flush interval of, say, 1h.
System information
Geth version: see commit hash
OS & Version: Ubuntu 20.04
Commit hash: df52967ff6080a27243569020ff64cd956fb8362 (post 1.10.26-stable)
Expected behaviour
Using the newly introduced TrieFlushInterval in the debug namespace, I expect more frequent flushes from the dirty cache to disk. However, after further investigation, even setting the value to 0 does not seem to write any state changes to disk. (Setting the value to "0s" essentially means going into archive mode if I understood correctly).
I expected this to lead to more persistency when the node gets killed for some reason (e.g. a crash due to OOM).
I would like to use this functionality for a private chain which doesn't have too much traffic, hence I was playing around with the cache.
Actual behaviour
Persistency is not there. I started my miner node from block 88 or so and simulated a lot of on-chain traffic. This was followed by killing the node after some time had passed (an unclean shutdown).
I expected that after starting the node up again, the state it starts on would be further along than block 80; however, it started right from the beginning. This gives me the impression that no flushing actually occurs from the dirty cache.
Perhaps I'm just missing some context and misunderstanding the way the cache operates.
Steps to reproduce the behaviour
debug.setTrieFlushInterval("0s")
-> essentially setting the flush interval to be equal to a block processing time of 0.
Simulate any chain traffic, e.g.:
ps ax | grep geth
Backtrace
Updated backtrace:
Logs clean start:
Letting the chain run for 200 blocks or so, then killing the node:
Logs after killing node and starting back up: