ethereumjs / ethereumjs-monorepo

Monorepo for the Ethereum VM TypeScript Implementation

Trie: batch() Performance Optimizations #3293

Open holgerd77 opened 4 months ago

holgerd77 commented 4 months ago

This issue is an extract from #3290, which analyzes the different performance-relevant parts of block execution on Holesky (but is not Holesky-specific).

This comment https://github.com/ethereumjs/ethereumjs-monorepo/issues/3290#issuecomment-1958993840 points out that tx trie validation, which requires generating the tx trie, takes a significant amount of time (100-300 ms for ~1000 entries (txs)).

This guided my attention to the trie.batch() functionality. Currently, tx trie generation uses single put() calls. This could easily be switched to use batch() though (possibly with some small modifications).

That's the backstory.

Trie batch() currently has no optimizations at all: it simply iterates over the given operations and calls into single put()s.
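For reference, the current behavior is roughly the following (paraphrased sketch, not the verbatim source; import paths from memory):

```ts
import { Trie } from '@ethereumjs/trie'
import type { BatchDBOp } from '@ethereumjs/util'

// Paraphrased sketch: batch() today is just a sequential loop over the
// operations, each one delegated to a single put()/del() call.
async function naiveBatch(trie: Trie, ops: BatchDBOp[]): Promise<void> {
  for (const op of ops) {
    if (op.type === 'put') {
      await trie.put(op.key, op.value)
    } else if (op.type === 'del') {
      await trie.del(op.key)
    }
  }
}
```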

I am pretty convinced that there are significant gains to be made here, and likely relatively easily so.

The following is an extract of how long trie put operations take (taken as a snapshot during one of these tx trie validation executions), split up into the two main performance consumers within trie.put(): the first-round path finding and the subsequent node update. (Side note: _updateNode() actually seems to be a really bad name, so this should minimally be renamed to _updateNodes() to be less confusing, right? 🤔)

[Screenshot: profiler snapshot of trie put() timings, split into path finding and node update]

No conclusions here yet, just to get some feeling for the thing.

Two things I can think of worth exploring (there might be more though):

batch() Key Sorting Option

It might give a significant insertion speed-up if the keys (so: the applied ones, of course) get pre-sorted in a way that eases insertion and lets as few nodes as possible be shifted around in between.

I think key sorting should likely (?) just be an option and not the default case for batch() though (for tx trie generation, e.g., this should have no side effects, right?).
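A minimal sketch of what an opt-in sorting pass could look like (the helper names and the idea of wiring this up as an option are assumptions, not existing API; import paths from memory):

```ts
import { Trie } from '@ethereumjs/trie'
import type { BatchDBOp } from '@ethereumjs/util'

// Lexicographic byte-wise key comparison
function compareKeys(a: Uint8Array, b: Uint8Array): number {
  const len = Math.min(a.length, b.length)
  for (let i = 0; i < len; i++) {
    if (a[i] !== b[i]) return a[i] - b[i]
  }
  return a.length - b.length
}

// Hypothetical opt-in variant: sort the operations by key first so that
// successive puts land on neighboring paths and fewer nodes have to be
// shifted around between insertions.
async function batchWithSortedKeys(trie: Trie, ops: BatchDBOp[]): Promise<void> {
  const sorted = [...ops].sort((a, b) => compareKeys(a.key, b.key))
  for (const op of sorted) {
    if (op.type === 'put') {
      await trie.put(op.key, op.value)
    } else if (op.type === 'del') {
      await trie.del(op.key)
    }
  }
}
```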

Analyze batch() Caching Opportunities

Especially in the context of tx trie generation (where the keys are just the unhashed, sorted numbers 1, 2, 3, 4, ... for the single txs), it seems that the same paths need to be retrieved again and again on put() operations; something similar might apply to the insertion part as well.

Here is another screenshot showing the keys (already in the high range), just going up one by one:

[Screenshot: profiler snapshot showing consecutive keys (in the high range) incrementing one by one]

I do not have a fully clear picture yet, but it should be possible to leverage this with some caching functionality, e.g. by caching (parts of?) the path to be found, at least within the scope of a batch() operation (it might turn out, though, that caching possibilities are found which "hold" beyond the scope of a single batch() operation).


General side note: both things should be easy to test with a small script that adds a couple of ten thousand key/value pairs to the trie directly, no need for a complex test setup including the client!
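A minimal sketch of such a standalone test (assuming the @ethereumjs/trie and @ethereumjs/rlp APIs; key count and key scheme are only illustrative):

```ts
import { Trie } from '@ethereumjs/trie'
import { RLP } from '@ethereumjs/rlp'
import type { BatchDBOp } from '@ethereumjs/util'

const N = 50_000
const value = new Uint8Array(128).fill(1) // dummy value standing in for a tx payload

async function main() {
  // Baseline: N sequential put() calls with RLP-encoded index keys,
  // mimicking how the tx trie keys look (0, 1, 2, 3, ...).
  const single = new Trie()
  let start = performance.now()
  for (let i = 0; i < N; i++) {
    await single.put(RLP.encode(i), value)
  }
  console.log(`single put(): ${(performance.now() - start).toFixed(0)} ms`)

  // Same data via batch(), as a baseline to compare future optimizations against.
  const batched = new Trie()
  const ops: BatchDBOp[] = []
  for (let i = 0; i < N; i++) {
    ops.push({ type: 'put', key: RLP.encode(i), value })
  }
  start = performance.now()
  await batched.batch(ops)
  console.log(`batch():      ${(performance.now() - start).toFixed(0)} ms`)
}

void main()
```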

holgerd77 commented 4 months ago

Additional side note: I find the numbers from above generally somewhat high for "memory only" operations, since this tx trie generation operates on a MapDB. 🤔 That's another direction, but it might be worth a look whether something is generally off here.

holgerd77 commented 4 months ago

One (rough) concrete idea for a cache that I have had for some time (it could be useful for various other situations as well):

The resulting path, or rather node stack (in the sense of what findPath() delivers, namely a stack of instantiated node objects (extension, leaf, branch)), from the await this._updateNode(appliedKey, value, remaining, stack) call within the put() operation gets saved in a cache (batch()-scoped, or even a cache with general scope).

Then, on the next findPath() call (likely also within put()), parts of the path finding can be omitted by using (and maybe passing over to findPath()) these kinds of cached results.

So, practical example:

Something is added for key 0x8202fc (from the screenshot above) and then for 0x8202fd.

Let's say a findPath() call delivers as a stack:

8 -> Branch Node 202 -> Extension Node f -> Branch Node

And c as the remaining nibble.

The updateNode call would then add a leaf node for c.

Maybe one could then save this stack in a cache in such a way that in the next round (the put() call for 0x8202fd) one could look up whether there is a cache entry which at least partly matches the path.

In this case there would be a stack for 8202f returned (the one from above).

And these objects could then already be passed to the next findPath() call for 0x8202fd so that they do not have to be retrieved again.

This is not completely easy to achieve or trivial to integrate, but it could have an enormous impact.
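A very rough sketch of what such a batch-scoped cache could look like (names and structure are hypothetical, nothing here exists in the trie package yet):

```ts
import type { TrieNode } from '@ethereumjs/trie'

type Nibbles = number[]

// Hypothetical batch-scoped cache: store the node stack that findPath()
// produced for a given nibble path, and for the next key return the stack
// of the longest cached prefix so the walk can resume from there instead
// of starting at the root again.
class PathStackCache {
  private cache = new Map<string, TrieNode[]>()

  set(path: Nibbles, stack: TrieNode[]): void {
    this.cache.set(path.join(','), stack)
  }

  longestPrefix(path: Nibbles): { consumed: number; stack: TrieNode[] } {
    for (let len = path.length; len > 0; len--) {
      const hit = this.cache.get(path.slice(0, len).join(','))
      if (hit !== undefined) return { consumed: len, stack: hit }
    }
    return { consumed: 0, stack: [] }
  }
}
```

For the example above, the stack stored under 8202f would be returned (with consumed = 5) when looking up the path for 0x8202fd, so only the last nibble d would still need to be walked.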

ScottyPoi commented 4 months ago

If I understand correctly, this would be an added optional parameter for findPath() in cases where a partial path is already known? That would indeed be helpful.

I think your solution is smart and could likely be applied in many places, both in our client and in Ultralight. I'll give this some thought and tinker with it today.

ScottyPoi commented 3 months ago

I started on this issue today by refactoring findPath to accept an optional partialPath parameter. This allows it to jump ahead in the walk, while returning the same path as a full walk.
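If the parameter ends up carrying something like the node stack from a previous walk, usage could look roughly like this (only the partialPath name comes from the comment above; the option shape and argument placement are assumptions, not the actual patch):

```ts
// Hypothetical usage sketch, not the actual patch:
const prev = await trie.findPath(keyN) // full walk for key n
// resume the walk for key n+1 from the previously found stack
const next = await trie.findPath(keyNplus1, false, { stack: prev.stack })
```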

holgerd77 commented 3 months ago

👍 Cool that you are experimenting with this a bit! 🙂