gballet / multiproof-rs

A rust implementation of Alexey Akhunov's multiproof algorithm
Apache License 2.0
32 stars 8 forks source link

Add failing test case for hashing embedded nodes #31

Closed s1na closed 4 years ago

s1na commented 4 years ago

If a tree includes embedded nodes (those with encoding < 32), the computed hash() is wrong. This is due to the fact that rlp.encode([[1, 2], [3, 4]]) != rlp.encode([[1, 2], rlp.encode([3, 4]]). I.e. the parent node shouldn't serialize the encoding of the embedded node, but the embedded node in raw form (array).

Edit: the expected root was generated by the js merkle-patricia-tree library.

gballet commented 4 years ago

It's true, geth also has the same behavior. This is in direct conflict with what my understanding of equation 194 in the yellow paper says.

The confusing part is that geth's fullNode and shortNode don't correspond exactly to (respectively) the concept of branch node and extension node. Which means that to be completely correct, I need to rename FullNode with Branch.

gballet commented 4 years ago

So in direct contradiction to what I said yesterday, equation 194 is valid and the RLP of the subnode is indeed calculated. It seems that geth and other clients only calculate it when it should be inserted, whereas this code calculates it at the node level itself. The code was a bit confusing, but the conclusion is straightforward: nodes whose RLP is shorter than 32 should see their RLP included.

The problem with my code was that the RLP was calculated twice. This is fixed. Also there was an issue with the boundary, I would calculate the hash of the RLP if it was strictly greater than 32 bytes, and it should be calculated if it's greater or equal than 32 bytes.