ethereum / go-ethereum

Go implementation of the Ethereum protocol
https://geth.ethereum.org
GNU Lesser General Public License v3.0

leveldb filesystem structure with no subdirectories will become a problem over time #25122

Open c0deright opened 2 years ago

c0deright commented 2 years ago

geth uses leveldb to store the blockchain in $datadir/geth/chaindata.

Right now I'm running geth with --syncmode full, and it is at block 4,834,322 (the current chain head is 14,979,825). With approximately 32% synced so far, the directory $datadir/geth/chaindata already holds ~76,000 files:

find /data/geth/chaindata/ -mindepth 1 -maxdepth 1 -type f | wc -l
76470

Most of these leveldb files are only 2.1MB in size.

As the number of inodes in the chaindata directory keeps growing, some filesystems will struggle even to list its contents. It would be much better to use a directory structure that spreads the files across one or two levels of subdirectories, so that no single directory ever ends up holding a million small files.

Bitcoin Core, for example, stores raw block data in files of ~128MB each and uses LevelDB only for an index.

Running geth with --gcmode archive will most definitely make the chaindata directory unmanageable (think of backups, rsync, ...). Every process that opens the directory to read its listing will take ages.

c0deright commented 2 years ago
echo 3 > /proc/sys/vm/drop_caches
time ls -l /data/geth/geth/chaindata >/dev/null

real    0m2.248s
user    0m0.229s
sys     0m0.622s

Over 2.2 seconds to generate the directory listing for a little over 70,000 files on an AWS EBS volume (SSD-backed).

karalabe commented 2 years ago

We're aware of this issue.

Using files larger than 2MB blows up disk I/O, as compaction becomes exponentially heavier. It would have been nice to split the files across multiple folders, but LevelDB does not support that, and I'm not confident enough to start implementing a new storage engine, especially as the upstream project doesn't really accept contributions anymore.
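The other side of that trade-off is simple arithmetic: for a fixed database size, the number of table files scales inversely with the file size. The sketch below uses 1 TiB purely as an example size (not a figure from this thread) and assumes every table is full, so real counts will be somewhat higher. The function name tableCount is hypothetical.

```go
package main

import "fmt"

// tableCount returns how many table files of tableBytes are needed to hold
// dbBytes of on-disk data, rounding up. It assumes every table is completely
// full, so it is a rough lower bound on the real file count.
func tableCount(dbBytes, tableBytes int64) int64 {
	return (dbBytes + tableBytes - 1) / tableBytes
}

func main() {
	const (
		MiB = int64(1) << 20
		TiB = int64(1) << 40
	)
	fmt.Println(tableCount(TiB, 2*MiB))   // 524288 files at 2 MiB each
	fmt.Println(tableCount(TiB, 128*MiB)) // 8192 files at 128 MiB each
}
```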

We're currently experimenting with Pebble, aiming to eventually switch over to it fully. I'm unsure whether it supports nested dbs, but it might make more sense to try to get that into Pebble. Raising the level sizes still causes excessive writes in Pebble too.

g2px1 commented 1 year ago

Also, among LevelDB-based databases, real development now happens only in RocksDB. At least they've added column families...