Open Bhargavasomu opened 5 years ago
@pipermerriam would you like to add something more on top of this, in terms of specs?
This generally looks good and you're :+1: to start on this.
Adding Benchmarks to compare the performance (benchmark architecture similar to py-evm)
We have been talking about a benchmarking refactor in py-evm, see https://github.com/ethereum/py-evm/issues/1306
This could be a good time to try that out. Using the new proposed structure should be an improvement, but if it becomes a sticking point or a drag, just go back to the current way. No need to tie the two issues together.
@carver thankyou for the reference, will take a look.
@pipermerriam I have the basic prototype ready (cleanly designed), which just takes the key as a tuple of Nibbles
and the value is bytes. I still understand that I would have to encode
and hash
a lot of stuff. Could you please guide me on this, in regards to tell me where all we need to encode
and hash
. Also I need guidance on when to push to the DB. It's not clear from the code. Thankyou.
I have the basic prototype ready (cleanly designed)
It'd be helpful to see some code if you have it.
@pipermerriam @carver After going through the yellow paper and a few examples, I have came up with the following neater version of the algorithm/design
which is in par with the current live version too (Need to verify this).
Firstly, I have 3 classes LeafNode
, ExtensionNode
and the BranchNode
. Each of these nodes has 3 main functionalities which are insert_child(key, value)
, get_value(key)
, delete_value(key)
. (The names maybe weird right now, but will make them better later). Each of the functionalities are explained as follows.
The main idea here is that, when we want to insert a key and it’s corresponding value in the trie, we call the insert_child
function on the rootNode
. The rootNode
in turn calls the insert_child
function on its childNode (for BranchNode
and ExtensionNode
mainly). This recursive calls continue till the LeafNode is reached, where the base case of the recursion is executed and the recursion ends. Note that at each step of the node calling its child node, the key gets consumed and the remainder key is sent to the child nodes to be inserted. Further specific details for each node are mentioned below.
common prefix
between the current node's (which is a LeafNode
) key and the current key.BranchNode
. Now use the functionality of the insert_child
of BranchNode
class to insert the current new (key, value)
pair and the existing LeafNode's
(partial_key, value) pair.common prefix
exists, then create a new ExtensionNode
whose key
is the common prefix
and the value is the the BranchNode
created in the above step.common prefix
exists, then return the ExtensionNode
created, else return the BranchNode
.
child_node
exists with key[0]
as the starting key nibble
, then create a
node = LeafNode(key[1:], value)
and set it to the BranchNode
in the key[0]
index.
I.e, BranchNode.child_nodes[key[0]] = LeafNode(key[1:], value)
.child_node
already exists with key[0]
as the starting key nibble
, then recursively call insert_child
on that child. i.e., BranchNode.child_nodes[key[0]].insert_child(key[1:], value)
.
trie_key
refers to the sequence of nibbles (part of key), which is stored in the Node
as part of the trie
. This convention is followed throughout the whole code.trie_key
, then the following steps are carried out.
BranchNode
and insert the new (key, value) pair into the BranchNode
using the BranchNode.insert_child
functionality.ExtensionNode
as the child of the BranchNode
, corresponding to the first nibble of the trie_key
in the ExtensionNode
.
trie_key[1:]
is empty, then there is no point in storing the extension node, and hence we directly store it’s child. Now we delete the current ExtensionNode
i.e, BranchNode.child_nodes[trie_key[0]] = self.child_node
self.value
if self.trie_key == key
key not found
empty
(full consumed), then return self.value
key[0]
key not found
BranchNode.child_nodes[key[1:]].get_value(key[1:])
self.trie_key
, then raise a exception key not found
.ExtensionNode.child_node.get_value(remainder_key)
None
if self.trie_key == key
key not found
first nibble of the key
exists in the child_nodes of the BranchNode
.delete_value
function on the child node.
i.e., BranchNode.child_nodes[key[0]] = BranchNode.child_nodes[key[0]].delete_value(key[1:])
BranchNode
is None
and if this BranchNode
doesn’t point to any other child_nodes
, then return None
(As this node can be deleted)BranchNode
has no child_nodes
but it has some value associated with it, then convert this into a LeafNode
and return this newly created LeafNode
.BranchNode
itself.
self.trie_key
, then raise a exception key not found
.ExtensionNode.child_node.delete_value(remainder_key)
and set this node returned by that operation as the value of ExtensionNode.child_node
.
ExtensionNode.child_node
is None
, after the above update, then there is no point in having this ExtensionNode
and hence returns None
.ExtensionNode
@pipermerriam @carver I have tried my level best to be clear with the implementation followed. Please let me know if any part of it is unclear (or) if there is something that I'm missing out.
I believe that the above approach eliminates the need of using a nibble terminator
as we have a seperate class for LeafNode
. I understand that the machine needs bytes and hence to make the number of nibbles even, so as to form bytes; we are using HP encoding
. But I don't understand why we need to explicitly do that, as python should be taking care of that. I don't see how an Extension Node
or a Leaf Node
storing the nibbles (1, 2, 3)
(odd number of nibbles as an example) is problematic. Could someone explain me about this in detail. Thankyou!
@Bhargavasomu my memory of the algorithm is sufficiently foggy that I don't recall if the nibbles terminator is a protocol thing or an implementation thing. If it's just an implementation bit, I have no problem removing it. If it is protocol, I don't believe we can remove it.
In general, your plan/structure sounds fine, but I'll only really know once there is code to review. :+1: to move forward with this.
I just saw this: https://github.com/ethereum/py-trie/pull/79
:eyes:
What was wrong?
The whole code base is a bit hard to read and could be structured better.
How can it be fixed?
The code base can be refactored with the following improvements
Leaf Nodes
,Branch Nodes
,Extension Nodes
which in turn inherit a common classBaseNode
. The classBaseNode
should contain the followingLeaf
orBranch
orExtension
)Extension
andLeaf
nodes. List of size 16 forBranch
nodes).value
contained in the node (If has nothing, thenNull
)get_all_children
methodEncode
andDecode
methodsparse_node
methodBaseTrie
class should be implemented from which theBinaryTrie
and theHexaryTrie
should be implemented.py-evm
)