ergoplatform / ergo

Ergo protocol description & reference client implementation
https://ergoplatform.org/
Creative Commons Zero v1.0 Universal
499 stars 169 forks source link

Storing AVL+ tree in RAM partially #1498

Open kushti opened 2 years ago

kushti commented 2 years ago

I made some experiments which show that over 10 blocks (so 20 minutes on avg.), from 50% to 90% of sub-trees can be not touched by blocks validation (around block #524K , subtrees below depth 12 considered, 2,048 such subtrees under the tree).

As with time storing the whole tree in RAM would not be feasible eventually, we may consider storing only presumably active branches in RAM (storing only ProxyInternalNode instances where a branch starts) and push others to secondary storage. We need to gather more information on usage patterns in the first place likely.

knizhnik commented 2 years ago

We have discussed this question several times, last one was: https://discord.com/channels/@me/672419182964506624/855326445034143814 Right now we are lazily load AVL tree nodes into memory and never evict them. It seems to be strange to hold the whole tree in memory, but load it in lazy way. If it fits in memory, then it is more efficient to load it at open time and avoid proxy nodes at all. But if this tree can be very large and doesn't fit in memory, then B-Tree will be much more efficient than AVL (and any other binary tree).

But if we do not want to perform such radical refactoring, then there are three possible solutions:

  1. Implement some kind of node pool and evict least recently accessed nodes
  2. Do not pin in memory nodes at height greater than N and load them on demand (when N=1, nodes ae always loaded from KV storage).
kushti commented 2 years ago

Actually, currently we lazy-loading the tree during rollback, to do that quickly, and lazy-loading is about restoring top node just. Maybe worth to start with lazy-loading a subtree during node start and rollbacks, and then consider eviction strategy.

knizhnik commented 2 years ago

Actually we are using lazy loading of AVL tree nodes (ProxyInternalProverNode) right now:

  override def left: ProverNodes[D] = {
    if (l == null) l = VersionedLDBAVLStorage.fetch[D](lkey)
    l
  }

So on node start only requested branches of tree are loaded from the storage.