ethereum / go-ethereum

Go implementation of the Ethereum protocol
https://geth.ethereum.org
GNU Lesser General Public License v3.0
47.6k stars 20.15k forks source link

Use finalized block as the difflayer/dislayer indicator in pathdb #29822

Closed fynnss closed 1 month ago

fynnss commented 5 months ago

Rationale

Why should this feature exist?

In the existing implementation, pathdb's difflayer stores 128 layers by default, which can facilitate fast memory rollback. However, the more diff layers, the deeper the call depth and the worse performance of getting Node will be.

Use finalized block as the difflayer/dislayer indicator in pathdb can reduce the number of difflayers in memory and improve efficiency of getting node.

pprof:

image

Implementation

Do you have ideas regarding the implementation of this feature?

func (db *Database) Update(root common.Hash, parentRoot common.Hash, block uint64, nodes *trienode.MergedNodeSet, states *triestate.Set) error {
    // Hold the lock to prevent concurrent mutations.
    db.lock.Lock()
    defer db.lock.Unlock()

    // Short circuit if the mutation is not allowed.
    if err := db.modifyAllowed(); err != nil {
        return err
    }
    if err := db.tree.add(root, parentRoot, block, nodes, states); err != nil {
        return err
    }
    // Keep 128 diff layers in the memory, persistent layer is 129th.
    // - head layer is paired with HEAD state
    // - head-1 layer is paired with HEAD-1 state
    // - head-127 layer(bottom-most diff layer) is paired with HEAD-127 state
    // - head-128 layer(disk layer) is paired with HEAD-128 state
    return db.tree.cap(root, maxDiffLayers)  // use head<->finalizedblock as the maxDifflayers
}

Are you willing to implement this feature?

rjl493456442 commented 5 months ago

This idea sounds reasonable, but unfortunately we have to keep the latest 128 states available for serving snap sync.

fynnss commented 5 months ago

This idea sounds reasonable, but unfortunately we have to keep the latest 128 states available for serving snap sync.

AFAIK, snapshot data serves snap-sync. Will pathdb also be used? If it will be, maybe the history state can serve it.

rjl493456442 commented 5 months ago

Will pathdb also be used?

Yes, we need to construct the range proof for the specific state range, via trie

maybe the history state can serve it

The state history only records the flat state diffs, the historic trie node are not accessible unless we do the state rollback with the state diffs.

fynnss commented 5 months ago

Yes, we need to construct the range proof for the specific state range, via trie

Thank you so much. I got it.

The snapshot use bloomfliter to reduce the call path of getAccount/Storage. But bloomfilter for path key seems not very suitable. Have you tried it before?

rjl493456442 commented 5 months ago

Yeah, we also thought about using a Bloom filter in a trie database, but we haven't tried it yet.

The reason is bloom filters need to be rebuilt for each layer once the dirty cache in "diskLayer" is flushed, we need to measure the cost and gain.

Given that the number of trie nodes contained in the trie database far exceeds the number of states contained in the state snapshot. We assume the cost for bloom filter construction is non-trivial.

Besides, we could foresee that the frequency of disk layer flushing is relatively higher than state snapshot (as the dirty cache size is limited). So the overhead for maintaining bloom filter could be even higher.

However, I think it would be worthwhile to experiment it a bit.

will-2012 commented 5 months ago

In some test scenarios, pbss 128difflayer may indeed be a potential bottleneck. Adding hashcache to difflayer may be an option. For details, see PR: https://github.com/ethereum/go-ethereum/pull/29991

fynnss commented 5 months ago

@will-2012 Sounds more reasonable than bloomfilter with less overhead and better overall performance.