Tests for merkle patricia trie proofs

MaksymZavershynskyi commented 4 years ago

We need exhaustive tests for various corner cases of merkle patricia trie proofs to exercise this code here: https://github.com/near/rainbow-bridge/blob/master/libs-rs/eth-prover/src/lib.rs#L177-L315

ailisp commented 4 years ago

As discussed in meeting, @Kouprin is going to help on this issue since having similar test in nearcore. @Kouprin please to drop my assignment if you start work on it, i'll also drop you if i have time to start

Kouprin commented 4 years ago

I see the following ways to prepare lots of tests:

Generate locally lots of fake trees locally and check proofs for them. The main concern here is our inner way of tree generating and proving may differ from Etherium one. This way of testing seems to be completely useless because it doesn't include ETH feedback.
Take lots of real ETH records and check proofs for them. This is tricky because: a) we can check only valid records easily and need to check invalid as well - we cannot just naively replace proof with random bytes; b) we can't guarantee we can test all corner cases because we can't build trees arbitrary. We may even don't know how those trees look like. It seems we already have some tests built in such way.
Run ETH node from scratch then build arbitrary tree and then prove it. This seems to be extremely complex as it implies lots of knowledge how ETH stuff works and then develop a technique to build arbitrary trees in ETH.

Any ideas?

ailisp commented 4 years ago

Good analysis! I suggest we do 1 since 2 is hard to test corner case and 3 is really hard. Fabricate corner cases to cover all code paths and verify result with other reference implementations. If result same it's very likely correct. If not same, there could be our or reference impl incorrect, we can do further verification there. At the meantime, i can run 2 for a wide range of blocks, if it can correctly verify 1 year of block it's probably correct enough. But these are test for "valid blocks", we don't know if it reject "invalid blocks", maybe do 2a)? Or use a fuzzy tester?

@nearmax WDYT?

Kouprin commented 4 years ago

@ailisp I'm against 1 and here's why. It sounds like solid way to run thousand of tests full of corner cases and check validity on them. We can do that. However, it will test only what we're building on our inner infrastructure and completely not related of what is actually happening on ETH side. Solution 1 will create feeling safe and well-tested wrongly.

I might be wrong if there is a Standard which describes how the trees built and Etherium follows it clearly. For now I see lots of weird conditions like if dec.iter().count() == 17 { // branch node and else if prefix == 3 { // odd leaf node. It seems those conditions are found by reverse engineering, not because the Standard says: "branch node is the node when its data size is 17 elements".

I'm not convinced that we should rely on our ability to reproduce ETH way for trees building. We must use trees built on Etherium instead.

ailisp commented 4 years ago

I see, I agree it's not much useful to manually construct info. How does other implementation test, would it possible to borrow their test suites or test tools to "build trees on Ethereium"?

MaksymZavershynskyi commented 4 years ago

Let's finish this discussion and set estimate.

Kouprin commented 4 years ago

@nearmax I suggest to divide it into the following tasks:

Imitate Eth trees - moderate benefits, high risk of being useless, moderate to implement.
Download all possible Eth trees - high benefits, will be useful for forever, moderate to implement (I expect we already have some tools to download, need to sync with @ailisp)
Run real Eth node and build arbitrary trees on it - high benefits (covering corner cases), moderate usefulness, hard to implement. Needs lots of investigation.

I'd suggest to do it in order 2-1-3. I expect doing 2 will cover most of the cases we want to cover.

ailisp commented 4 years ago

Download all possible Eth trees - high benefits, will be useful for forever, moderate to implement (I expect we already have some tools to download, need to sync with @ailisp)

Yes we have the tool, but download all receipt logs of a block is a time consuming op (proof is extract from receipt logs), as there's usually hundreds of them in one block, each of them request as separate HTTP request and soon hit infura rate limit. Downloading at slower rate cost several minutes to download all of logs of one block. I'll run our own eth full node, and parallelize hundred of eth-dumper to make this faster. Test all proofs is also slow, verify-near-proof take 20 minute to verify 10 blocks of proofs. We can also parallelize this and wait days till it finish

Kouprin commented 4 years ago

@ailisp Yep, that's why it's important to start downloading earlier. Can we analyze downloaded trees for their structure? If yes, we can get lots of trees and then build a subset of tests which covers most of corner cases.

ailisp commented 4 years ago

@ailisp Yep, that's why it's important to start downloading earlier. Can we analyze downloaded trees for their structure? If yes, we can get lots of trees and then build a subset of tests which covers most of corner cases.

There's a sample in https://s3-us-west-1.amazonaws.com/rainbow-bridge.nearprotocol.com/test-data/eth-proofs.tar.gz you can take a look first. Sorry i haven't start downloading, let me see if i can start instances download today

MaksymZavershynskyi commented 4 years ago

This is quite challenging. Setting estimate to 5.

alexauroradev commented 3 years ago

Connected with #488

Near-One / rainbow-bridge

Tests for merkle patricia trie proofs #240