ethereum / go-ethereum

Go implementation of the Ethereum protocol
https://geth.ethereum.org
GNU Lesser General Public License v3.0
46.81k stars 19.76k forks

Feature request: ability to prune the old ancient blockchain data #26596

Closed jsvisa closed 1 year ago

jsvisa commented 1 year ago

Rationale

I'm running a new snap-sync node. After syncing finished, I found that the local chaindata consumes 800+ GB, and about half of that is used to store the ancient data:

$ du --max-depth=1 -h data/geth/chaindata
422G    data/geth/chaindata/ancient
830G    data/geth/chaindata

The old ancient data is useless in most cases, so if we supported pruning it, nodes could run with less disk space.

Implementation

It seems Binance Smart Chain already supports this feature (merged in #543); maybe we can backport it into go-ethereum.

$ ./bin/bsc snapshot prune-block --help
prune-block [command options]

geth offline prune-block for block data in ancientdb.
The amount of blocks expected for remaining after prune can be specified via block-amount-reserved in this command,
will prune and only remain the specified amount of old block data in ancientdb.
the brief workflow is to backup the the number of this specified amount blocks backward in original ancientdb
into new ancient_backup, then delete the original ancientdb dir and rename the ancient_backup to original one for replacement,
finally assemble the statedb and new ancientDb together.
The purpose of doing it is because the block data will be moved into the ancient store when it
becomes old enough(exceed the Threshold 90000), the disk usage will be very large over time, and is occupied mainly by ancientDb,
so it's very necessary to do block data prune, this feature will handle it.

ETHEREUM OPTIONS:
  --datadir value                Data directory for the databases and keystore (default: "/home/amber/.ethereum")
  --datadir.ancient value        Data directory for ancient chain segments (default = inside chaindata, '${datadir}/geth/chaindata/ancient/')
  --block-amount-reserved value  Sets the expected remained amount of blocks for offline block prune (default: 0)
  --triesInMemory value          The layer of tries trees that keep in memory (default: 128)
  --check-snapshot-with-mpt      Enable checking between snapshot and MPT
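
For illustration, an invocation built only from the flags in the help text above might look like the sketch below. The helper name `build_prune_cmd`, the `DATADIR` and `RESERVED` variables, and the choice of 90000 as the reserved amount are assumptions made for this example, not recommendations from the bsc project. The script only prints the command (a dry run), since the help text describes the prune as an offline operation.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints the prune command instead of running it (dry run).
build_prune_cmd() {
  local datadir="$1"    # node data directory (--datadir)
  local reserved="$2"   # number of blocks to keep in ancientdb (--block-amount-reserved)
  echo "./bin/bsc snapshot prune-block --datadir ${datadir} --block-amount-reserved ${reserved}"
}

# Defaults are assumptions; override them through the environment.
build_prune_cmd "${DATADIR:-$HOME/.ethereum}" "${RESERVED:-90000}"
```

Review the printed command and run it by hand only after the node has been stopped.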

rjl493456442 commented 1 year ago

There is an EIP for this: https://eips.ethereum.org/EIPS/eip-4444. The challenge is how to give a strong guarantee that the dropped historical chain data is still retrievable. I believe this challenge has not been resolved yet.

As a short-term solution, you can point the ancient directory at an HDD-based location. It is still performant enough (our freezer design has O(1) read/write complexity), and HDD storage is cheaper.
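
A minimal sketch of that relocation, assuming the node is stopped first and that an HDD is mounted at `/mnt/hdd` (the mount point, the destination directory name, and the helper name `relocate_ancient` are assumptions for this example). The `--datadir.ancient` flag is the one shown in the help output earlier in this thread.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: move the ancient (freezer) directory to a cheaper disk.
# Run this only while geth is stopped.
relocate_ancient() {
  local src="$1"   # e.g. data/geth/chaindata/ancient
  local dst="$2"   # e.g. /mnt/hdd/geth-ancient (assumed HDD mount point)
  if [ ! -d "$src" ]; then
    echo "source ancient directory not found: $src" >&2
    return 1
  fi
  if [ -e "$dst" ]; then
    echo "destination already exists: $dst" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dst")"
  # Across filesystems, mv copies the data and then removes the source.
  mv "$src" "$dst"
}

# Example usage (paths are assumptions):
#   relocate_ancient data/geth/chaindata/ancient /mnt/hdd/geth-ancient
#   geth --datadir data --datadir.ancient /mnt/hdd/geth-ancient
```

The helper refuses to overwrite an existing destination, so a repeated run cannot clobber data that has already been moved.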