ethereum / py-trie

Python library which implements the Ethereum Trie structure.
MIT License
104 stars 52 forks source link

Code Refactoring #78

Open Bhargavasomu opened 5 years ago

Bhargavasomu commented 5 years ago

What was wrong?

The whole code base is a bit hard to read and could be structured better.

How can it be fixed?

The code base can be refactored with the following improvements

Bhargavasomu commented 5 years ago

@pipermerriam would you like to add something more on top of this, in terms of specs?

pipermerriam commented 5 years ago

This generally looks good and you're :+1: to start on this.

carver commented 5 years ago

Adding Benchmarks to compare the performance (benchmark architecture similar to py-evm)

We have been talking about a benchmarking refactor in py-evm, see https://github.com/ethereum/py-evm/issues/1306

This could be a good time to try that out. Using the new proposed structure should be an improvement, but if it becomes a sticking point or a drag, just go back to the current way. No need to tie the two issues together.

Bhargavasomu commented 5 years ago

@carver thankyou for the reference, will take a look.

Bhargavasomu commented 5 years ago

@pipermerriam I have the basic prototype ready (cleanly designed), which just takes the key as a tuple of Nibbles and the value is bytes. I still understand that I would have to encode and hash a lot of stuff. Could you please guide me on this, in regards to tell me where all we need to encode and hash. Also I need guidance on when to push to the DB. It's not clear from the code. Thankyou.

pipermerriam commented 5 years ago

I have the basic prototype ready (cleanly designed)

It'd be helpful to see some code if you have it.

Bhargavasomu commented 5 years ago

@pipermerriam @carver After going through the yellow paper and a few examples, I have came up with the following neater version of the algorithm/design which is in par with the current live version too (Need to verify this). Firstly, I have 3 classes LeafNode, ExtensionNode and the BranchNode. Each of these nodes has 3 main functionalities which are insert_child(key, value), get_value(key), delete_value(key). (The names maybe weird right now, but will make them better later). Each of the functionalities are explained as follows.

insert_child(key, value)

The main idea here is that, when we want to insert a key and it’s corresponding value in the trie, we call the insert_child function on the rootNode. The rootNode in turn calls the insert_child function on its childNode (for BranchNode and ExtensionNode mainly). This recursive calls continue till the LeafNode is reached, where the base case of the recursion is executed and the recursion ends. Note that at each step of the node calling its child node, the key gets consumed and the remainder key is sent to the child nodes to be inserted. Further specific details for each node are mentioned below.

LeafNode

Bhargavasomu commented 5 years ago

@pipermerriam @carver I have tried my level best to be clear with the implementation followed. Please let me know if any part of it is unclear (or) if there is something that I'm missing out.

Bhargavasomu commented 5 years ago

I believe that the above approach eliminates the need of using a nibble terminator as we have a seperate class for LeafNode. I understand that the machine needs bytes and hence to make the number of nibbles even, so as to form bytes; we are using HP encoding. But I don't understand why we need to explicitly do that, as python should be taking care of that. I don't see how an Extension Node or a Leaf Node storing the nibbles (1, 2, 3) (odd number of nibbles as an example) is problematic. Could someone explain me about this in detail. Thankyou!

pipermerriam commented 5 years ago

@Bhargavasomu my memory of the algorithm is sufficiently foggy that I don't recall if the nibbles terminator is a protocol thing or an implementation thing. If it's just an implementation bit, I have no problem removing it. If it is protocol, I don't believe we can remove it.

In general, your plan/structure sounds fine, but I'll only really know once there is code to review. :+1: to move forward with this.

pipermerriam commented 5 years ago

I just saw this: https://github.com/ethereum/py-trie/pull/79

:eyes: