Closed · TianyueLi1227 closed this issue 4 months ago
The original use of `FlushAtDepth` was to serialize mutated nodes in the tree and save them to disk after block processing. That worked but was inefficient, since each node serialization was done in the `flushFunc` passed as a parameter. The serialization is CPU-bound, and the recursive nature meant we were using a single core. For that reason, and since node serialization was the most important use case, `BatchSerialize` was created, which is more optimized for the task.
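To make the single-core vs. multi-core distinction concrete, here is a minimal Go sketch. This is not go-verkle's actual code: the `node` type and the hash-based `serialize` function are stand-ins for the library's real, more expensive per-node serialization work. It only illustrates why fanning a batch of independent serializations out over a worker pool (the kind of structure `BatchSerialize` can exploit) beats doing the same work inside a single-threaded recursive flush callback:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"runtime"
	"sync"
)

// node is a stand-in for a mutated tree node; the real go-verkle
// types are more involved.
type node struct {
	path  []byte
	value []byte
}

// serialize is a stand-in for the CPU-bound per-node work
// (commitment computation and encoding in the real library).
func serialize(n node) []byte {
	h := sha256.Sum256(append(n.path, n.value...))
	return h[:]
}

// serializeBatch fans the nodes out over all cores. Each result
// slot is written by exactly one worker, so no extra locking is
// needed around `out`.
func serializeBatch(nodes []node) [][]byte {
	out := make([][]byte, len(nodes))
	var wg sync.WaitGroup
	ch := make(chan int)
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range ch {
				out[i] = serialize(nodes[i])
			}
		}()
	}
	for i := range nodes {
		ch <- i
	}
	close(ch)
	wg.Wait()
	return out
}

func main() {
	nodes := []node{{[]byte{0}, []byte("a")}, {[]byte{1}, []byte("b")}}
	fmt.Println(len(serializeBatch(nodes)))
}
```

The key property is that the batch is a flat list of independent work items, whereas a recursive flush serializes one node at a time in tree order.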
Thank you very much for your answer! This is a great update for serializing in-memory nodes into `HashedNode`s on disk. I have been using go-verkle for my own project recently, but I am troubled by its huge memory footprint: I store a huge amount of data in the verkle tree, and it takes around 60GB of memory to run my program properly. Therefore, I am really keen on this serialization method as a way to save memory.

I have a couple of questions about how to use the `BatchSerialize` method.
I notice that there is a limit of 1024 nodes to be serialized at once. This is ideal when the verkle tree is small, but what should I do if I am trying to serialize a huge verkle tree, where the number of serialized nodes is hard to predict?
I notice that the `HashedNode` type is not fully implemented. Do you intend to replace `HashedNode` with `SerializedNode` when inserting/getting values from the verkle tree so that it could read from disk? Does that mean I need to construct a way of relating each verkle node flushed to disk with its location in the original verkle tree, so that it can be recovered when needed?

Thank you very much! Looking forward to your response!
`HashedNode` is implemented, but that doesn't mean it can resolve all the expected methods of `Node`. It is a type of node that only has the "hash" information, and nothing else. It's used to represent a node that we have stored on disk. The idea is that when serializing a tree, we skip `HashedNode`s, since that means those didn't change. You can see in the tree-mutating methods that whenever we find a `HashedNode`, we use a provided `resolver` function to pull that node from disk.

Thank you very much for the explanation. I am getting a better understanding of `HashedNode`. Recently, I have been implementing LRU-like functionality on top of the verkle tree, which decides when to put some nodes onto disk and when to put them back into the verkle tree. I did not find an implementation that transforms serialized nodes into `HashedNode`s in the verkle tree after `BatchSerialize`. Does that mean the go-verkle library expects its users to determine how they want to build the `NodeResolveFn` and decide where to put the `SerializedNode`s output by `BatchSerialize`? If not, could you please point me to any useful function I might have missed?
Also, in my current design, given a `path` to some leaf node, I will flush nodes after a certain predetermined depth (say 10), which is similar to `FlushAtDepth`, except that a path is provided up front and only nodes past that depth are flushed. My question here is whether this kind of flush is feasible. Personally, I find this flush mechanism more suitable for `BatchSerialize`, because they both go down the branch to the leaf nodes. But I am worried there could be some details I missed.
> Does that mean the go-verkle library expects its users to determine how they want to build the `NodeResolveFn` and decide where to put the `SerializedNode`s output by `BatchSerialize`?
Correct. The library doesn't impose any decision on that, so that's for the library client to decide.
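Since the storage strategy is left to the client, here is a hedged sketch of the usual pattern: persist each serialized node keyed by its commitment, and hand the tree a resolver callback that reads those bytes back on demand. The `nodeResolverFn` type below only mirrors the general shape of go-verkle's resolver callback (bytes in, serialized node bytes out); treat the exact signature, and the idea of keying by commitment, as assumptions to check against the library. The in-memory map stands in for a real disk-backed key-value store:

```go
package main

import (
	"errors"
	"fmt"
)

// nodeResolverFn mirrors the rough shape of go-verkle's resolver
// callback: given a key identifying a HashedNode, return its
// serialized bytes. Check the library for the exact signature.
type nodeResolverFn func(key []byte) ([]byte, error)

// store is a stand-in for your disk backend (e.g. a KV database).
type store struct {
	m map[string][]byte
}

// put would be called for each node produced by BatchSerialize,
// keyed by something stable such as the node's commitment.
func (s *store) put(key, blob []byte) { s.m[string(key)] = blob }

// resolver returns the callback the tree's read/mutate paths would
// use to reload a HashedNode's contents on demand.
func (s *store) resolver() nodeResolverFn {
	return func(key []byte) ([]byte, error) {
		blob, ok := s.m[string(key)]
		if !ok {
			return nil, errors.New("node not found in store")
		}
		return blob, nil
	}
}

func main() {
	db := &store{m: map[string][]byte{}}
	db.put([]byte("commitment-1"), []byte("serialized node bytes"))
	blob, err := db.resolver()([]byte("commitment-1"))
	fmt.Println(string(blob), err)
}
```

The point is simply that the library asks "give me the bytes for this node" and everything else (the store, eviction, when to write) is your policy.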
> My question here is whether this kind of flush is feasible. Personally, I find this flush mechanism more suitable for `BatchSerialize`, because they both go down the branch to the leaf nodes. But I am worried there could be some details I missed.
If depth is important to you, then `FlushAtDepth` sounds reasonable. It might not be the fastest way, but it should be fine. If not, you could use `BatchSerialize` to get the perf benefits, and filter by depth after you get the result. You might end up doing CPU work that gets discarded, but it might still be faster than `FlushAtDepth`; it depends on how many things are in memory.
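The "serialize everything, then filter" idea can be sketched as follows. The `serializedNode` struct here is a stand-in, not go-verkle's actual output type: it assumes you can recover each node's path (and hence its depth) while walking the batch result, which is something to verify against the library:

```go
package main

import "fmt"

// serializedNode is a hypothetical stand-in for a batch-serialized
// node; len(path) is taken as the node's depth from the root.
type serializedNode struct {
	path []byte
	blob []byte
}

// filterByDepth keeps only nodes at or below minDepth, discarding
// the shallower ones after serialization. The discarded work is the
// trade-off mentioned above.
func filterByDepth(nodes []serializedNode, minDepth int) []serializedNode {
	var out []serializedNode
	for _, n := range nodes {
		if len(n.path) >= minDepth {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	nodes := []serializedNode{
		{path: []byte{1}, blob: []byte("shallow")},
		{path: []byte{1, 2, 3}, blob: []byte("deep")},
	}
	fmt.Println(len(filterByDepth(nodes, 2)))
}
```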
Let me know when it is a good moment to close this issue as resolved.
I'd assume the original question was answered.
Hello, I noticed that in tree.go there is a function named `FlushAtDepth`. However, in the test part there are also TODOs about it. I am curious about the current issue with `FlushAtDepth`: is there anything wrong with its functionality? What kind of context is related to this API? Thank you very much for the help!