EIP-1186: RPC-Method to get Merkle Proofs - eth_getProof

simon-jentzsch commented 6 years ago

eip: 1186
title: eth_getProof
author: Simon Jentzsch simon.jentzsch@slock.it, Christoph Jentzsch christoph.jentzsch@slock.it
discussions-to: simon.jentzsch@slock.it
status: Draft
type: Standards Track (Core, Networking, Interface, ERC)
category: Interface
created: 2018-06-24

Simple Summary

One of the great features of Ethereum is that all state data can be verified. But in order to allow verification of accounts outside the client, we need an additional function that delivers the required proof. These proofs are important for securing Layer 2 technologies.

Abstract

Ethereum uses MerkleTrees to store the state of accounts and their storage. This allows verification of each value by simply creating a MerkleProof. But currently, the eth-Module in the RPC-Interface does not give you access to these proofs. This EIP suggests an additional RPC-Method, which creates MerkleProofs for Accounts and Storage-Values.

Combined with a stateRoot (from the block header), it enables offline verification of any account or storage value. In particular, this allows IoT devices or even mobile apps that are not able to run a light client to verify responses from an untrusted source, given only a trusted block hash.

Motivation

In order to create a MerkleProof, access to the full state db is required. The current RPC-Methods allow an application to access single values (eth_getBalance, eth_getTransactionCount, eth_getStorageAt, eth_getCode), but it is impossible to read the data needed for a MerkleProof through the standard RPC-Interface. (There are implementations using leveldb and accessing the data via the filesystem, but this cannot be used for production systems since it requires the client to be stopped first. See https://github.com/zmitton/eth-proof)

Today MerkleProofs are already used internally. For example, the Light Client Protocol supports a function that creates MerkleProofs, which is used to verify the requested account or storage data.

Offering this already existing function through the RPC-Interface as well would enable applications to store and send these proofs to devices which are not directly connected to the p2p network but are still able to verify the data. This could be used to verify data in mobile applications or IoT devices, which currently only use a remote client.

Specification

As Part of the eth-Module, an additional Method called eth_getProof should be defined as follows:

eth_getProof

Returns the account- and storage-values of the specified account including the Merkle-proof.

Parameters

DATA, 20 Bytes - address of the account.
ARRAY, 32 Bytes - array of storage-keys which should be proved and included. See eth_getStorageAt
QUANTITY|TAG - integer block number, or the string "latest" or "earliest", see the default block parameter

Returns

Object - An account object:

balance: QUANTITY - the balance of the account. See eth_getBalance
codeHash: DATA, 32 Bytes - hash of the code of the account. For a simple Account without code it will return "0xc5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470"
nonce: QUANTITY - nonce of the account. See eth_getTransactionCount
storageHash: DATA, 32 Bytes - SHA3 of the StorageRoot. All storage will deliver a MerkleProof starting with this rootHash.
accountProof: ARRAY - Array of rlp-serialized MerkleTree-Nodes, starting with the stateRoot-Node, following the path of the SHA3 (address) as key.
storageProof: ARRAY - Array of storage-entries as requested. Each entry is an object with these properties:
- key: QUANTITY - the requested storage key
- value: QUANTITY - the storage value
- proof: ARRAY - Array of rlp-serialized MerkleTree-Nodes, starting with the storageHash-Node, following the path of the SHA3 (key) as path.

Example

{
  "id": 1,
  "jsonrpc": "2.0",
  "method": "eth_getProof",
  "params": [
    "0x7F0d15C7FAae65896648C8273B6d7E43f58Fa842",
    [  "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421" ],
    "latest"
  ]
}

The result will look like this:

{
  "id": 1,
  "jsonrpc": "2.0",
  "result": {
    "accountProof": [
      "0xf90211a...0701bc80",
      "0xf90211a...0d832380",
      "0xf90211a...5fb20c80",
      "0xf90211a...0675b80",
      "0xf90151a0...ca08080"
    ],
    "balance": "0x0",
    "codeHash": "0xc5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470",
    "nonce": "0x0",
    "storageHash": "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
    "storageProof": [
      {
        "key": "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
        "proof": [
          "0xf90211a...0701bc80",
          "0xf90211a...0d832380"
        ],
        "value": "0x1"
      }
    ]
  }
}

Rationale

This one Method actually returns 3 different important data points:

The 4 fields of an account-object as specified in the yellow paper [nonce, balance, storageHash, codeHash ], which allows storing a hash of the account-object in order to keep track of changes.
The MerkleProof for the account starting with a stateRoot from the specified block.
The MerkleProof for each requested storage entry starting with a storageHash from the account.

Combining these in one Method allows the client to work very efficiently, since the required data are already fetched from the db.

Proofs for non-existent values

In case an address or storage-value does not exist, the proof needs to provide enough data to verify this fact. This means the client needs to follow the path from the root node and deliver all nodes up to the last matching node. If the last matching node is a branch, the proof value in the node must be an empty one. In case of a leaf-type node, it must point to a different relative path in order to prove that the requested path does not exist.
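The leaf-type case can be illustrated with a minimal sketch. It assumes the compact ("hex-prefix") path encoding described in the yellow paper, where the high nibble of the first byte flags leaf vs. extension and odd vs. even path length. The names `decodeCompactPath` and `leafProvesAbsence` are hypothetical and not part of any client or library API, and a real verifier must additionally check that every node hashes to the reference held by its parent.

```javascript
// Hypothetical helpers, not part of any client API.
// Decode a compact (hex-prefix) encoded path taken from a leaf or extension node.
function decodeCompactPath(bytes) {
  if (bytes.length === 0) throw new Error('empty path');
  const flag = bytes[0] >> 4;        // high nibble carries the two flags
  const isLeaf = (flag & 2) !== 0;   // bit 1: terminator (leaf) flag
  const isOdd = (flag & 1) !== 0;    // bit 0: odd number of path nibbles
  const nibbles = [];
  if (isOdd) nibbles.push(bytes[0] & 0x0f); // low nibble is the first path nibble
  for (let i = 1; i < bytes.length; i++) {
    nibbles.push(bytes[i] >> 4, bytes[i] & 0x0f);
  }
  return { isLeaf, nibbles };
}

// True if the node is a leaf whose remaining path differs from the remaining
// nibbles of the requested key, i.e. the requested key is not in the trie.
function leafProvesAbsence(compactPath, remainingKeyNibbles) {
  const { isLeaf, nibbles } = decodeCompactPath(compactPath);
  if (!isLeaf) return false;
  if (nibbles.length !== remainingKeyNibbles.length) return true;
  return nibbles.some((n, i) => n !== remainingKeyNibbles[i]);
}
```

Here `remainingKeyNibbles` stands for the part of SHA3(key) that has not yet been consumed by the branch and extension nodes earlier in the proof.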

Possible changes to be discussed:

Instead of providing the block number, maybe the block hash would be better, since it would allow proofs of uncle states.
In order to reduce data, the account object may provide only the accountProof and storageProof. The fields balance, nonce, storageHash and codeHash could be taken from the last node in the proof by deserializing it.

Backwards Compatibility

Since this only adds a new Method there are no issues with Backwards Compatibility.

Test Cases

Tests still need to be implemented, but the core function creating the proof already exists inside the clients and is well tested.

Implementation

We implemented this function for:

- [x] [parity](https://github.com/paritytech/parity/pull/9001) (Status: pending pull request) - [Docker](https://hub.docker.com/r/slockit/parity-in3/tags/)
- [x] [geth](https://github.com/ethereum/go-ethereum/pull/17737) (Status: pending pull request) - [Docker](https://hub.docker.com/r/slockit/geth-in3/tags/)

Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).

MicahZoltu commented 6 years ago

It feels like this should be paired with an eth_verifyProof method. Getting a proof with no easy way to verify it feels significantly less useful.

simon-jentzsch commented 6 years ago

An eth_verifyProof might be helpful, but it is not required, since you can easily verify this inside your dapp (or even outside). You just call eth_getBlockBy... and take the stateRoot, and then verify the proof.

import * as Trie from 'merkle-patricia-tree'
import * as util from 'ethereumjs-util'

const [block, account] = await Promise.all([
  // we need the block header to get the stateRoot
  web3.eth.getBlock('latest', false),

  // and we need the proof
  web3.eth.getProof(address, [], 'latest')
])

// Verify the proof by starting with the stateRoot from the header; it should
// end at the leaf node containing the rlp-serialized value of the account.
Trie.verifyProof(block.stateRoot, util.sha3(address), account.accountProof, (err, value) => {
  // rlp.encode takes a single argument, so the four account fields go into one array
  const expected = util.rlp.encode([account.nonce, account.balance, account.storageHash, account.codeHash])
  if (err || !value.equals(expected))
    console.log('proof failed:', err)
  else
    console.log('verified!')
})

But I think it would be a good idea to offer a function in the web3-library like:

web3.eth.verify.account( account,  blockHash )
web3.eth.verify.storage( account.storageProof, account.stateRoot )

5chdn commented 6 years ago

Please create a PR, not an issue.

simon-jentzsch commented 5 years ago

thanks, just created the PR

simon-jentzsch commented 5 years ago

I have now also created a reference implementation for geth:

PR: https://github.com/ethereum/go-ethereum/pull/17737 (Status: pending)
Docker: https://hub.docker.com/r/slockit/geth-in3/tags/

zmitton commented 5 years ago

@simon-jentzsch is it (inconveniently) true that the intermediary nodes to the other trees (i.e. transactions & receipts) are not stored in levelDB (geth/parity)? In my library I currently do multiple RPC calls for all the transactions of the particular block and re-create the tree locally.

zmitton commented 5 years ago

Seems like this is the case, so my library is still useful in building those proofs (and of course checking them).

simon-jentzsch commented 5 years ago

@zmitton yes, the other tries, like transactions and receipts, are only created temporarily since all the data is available in the block. But I agree that getting a Merkle proof for a transaction receipt means running at least a bulk request to get each receipt and then constructing the trie (like here: https://github.com/slockit/in3-server/blob/master/src/chains/proof.ts#L205). But at least this information is available. (I'm also thinking about caching these tries to optimize performance.)

juan794 commented 5 years ago

I am not sure if it is an error or if I am the only one experiencing it, but I think it is better to comment on it.

I am testing an implementation that requires offline existence verification of accounts. When RLP-decoding the value in Trie.verifyProof, following the example above, the account's balance (a smart contract's in this case) is taken as a string data type when the value is 0x0, which makes the verification fail. When I deposit some Ether, the smart contract's balance is taken as an integer data type and the verification works. I am using Geth 1.18, NodeJS 8.10, and the Rinkeby testnet.

zmitton commented 5 years ago

@juan794 this doesn't sound like an issue with the EIP. From the above code I don't see RLP needing to be decoded, but you might want to bring this up with the rlp repo or 'merkle-patricia-tree' (depending on your code, which I haven't seen).

juan794 commented 5 years ago

Thanks @zmitton. I used RLP to understand why the verification was not working. I thought it was linked to this EIP because it is the contract's balance that determines whether the verification works as straightforwardly as in the example above, but you are right, it is closer to an RLP problem itself.

zmitton commented 5 years ago

@juan794 I've seen this issue before. The problem is that the number 0 is represented in Ethereum as bytes<> and not bytes<00>, so its RLP encoding is bytes<80> and not bytes<00> (which would be the RLP of bytes<00>, because the RLP of any single byte below 0x80 is the byte itself).
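The difference between the two encodings can be reproduced with a small RLP encoder. This is a minimal sketch that only handles byte strings (`Uint8Array`) and lists (plain arrays); `rlpEncode`, `encodeLength` and `toHex` are illustrative names, not the API of the rlp package used in the example above.

```javascript
// Minimal RLP encoder sketch: byte strings are Uint8Arrays, lists are plain arrays.
// Returns the encoding as a plain array of byte values.
function encodeLength(len, offset) {
  if (len < 56) return [offset + len];
  const lenBytes = [];
  for (let n = len; n > 0; n = Math.floor(n / 256)) lenBytes.unshift(n % 256);
  return [offset + 55 + lenBytes.length, ...lenBytes];
}

function rlpEncode(item) {
  if (Array.isArray(item)) {
    const payload = item.flatMap((x) => rlpEncode(x));
    return [...encodeLength(payload.length, 0xc0), ...payload];
  }
  // A single byte below 0x80 is its own encoding.
  if (item.length === 1 && item[0] < 0x80) return [item[0]];
  return [...encodeLength(item.length, 0x80), ...item];
}

const toHex = (bytes) => bytes.map((b) => b.toString(16).padStart(2, '0')).join('');

console.log(toHex(rlpEncode(new Uint8Array([]))));     // 80 (the number 0, i.e. empty bytes)
console.log(toHex(rlpEncode(new Uint8Array([0x00])))); // 00 (a single zero byte)
```

A verifier that turns the "0x0" balance from the RPC response into a single zero byte before encoding would therefore compute a different account encoding than the one stored in the trie.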

So find out where the software could use updating, but I bet it's not here. The return value for 0 or empty from the RPC has generally been the string "0x0", I believe, so it should probably keep this behavior.

davidmurdoch commented 5 years ago

So find out where the software could use updating, but I bet it's not here. The return value for 0 or empty from the RPC has generally been the string "0x0", I believe, so it should probably keep this behavior.

It depends on the data type returned by the RPC for the field in question. 0x0 is always a QUANTITY type and represents the number 0. 0x00 is an invalid QUANTITY. The DATA type allows for 0x, which represents an empty set, bytes<>. 0x00 is bytes<00>, an array with a single byte: 0. 0x0000 is valid and different from 0x00; it represents bytes<00, 00>, an array of two bytes: 00 and 00.
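These rules can be written down as two small checks. The sketch below assumes the usual JSON-RPC hex conventions described in this comment; `isQuantity`, `isData` and `quantityToBytes` are hypothetical helper names, not functions of web3 or any client.

```javascript
// QUANTITY: "0x" followed by hex digits without leading zeros; zero is "0x0".
const isQuantity = (s) => /^0x(0|[1-9a-fA-F][0-9a-fA-F]*)$/.test(s);

// DATA: "0x" followed by an even number of hex digits (two per byte); "0x" is empty.
const isData = (s) => /^0x([0-9a-fA-F]{2})*$/.test(s);

// Convert a QUANTITY to the minimal big-endian byte string that RLP expects.
// The number 0 becomes the empty byte string, not a single zero byte.
function quantityToBytes(q) {
  if (!isQuantity(q)) throw new Error('not a QUANTITY: ' + q);
  let hex = q.slice(2);
  if (hex === '0') return new Uint8Array([]);
  if (hex.length % 2 === 1) hex = '0' + hex;
  return Uint8Array.from(hex.match(/../g), (h) => parseInt(h, 16));
}
```

Converting fields such as balance and nonce with something like `quantityToBytes` before RLP-encoding the account avoids the zero-balance mismatch reported above.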

MicahZoltu commented 4 years ago

Any reason not to move this to final? It is implemented in Geth, Parity, and Nethermind but Geth is currently behind --jsonrpc-experimental flag. Conversation has been dead for quite some time.

MicahZoltu commented 4 years ago

I just realized that this "simple example" requires a library for Merkle proof validation that may not be readily available in all environments. This example also doesn't appear to validate an actual storage proof, only the account proof.

So, I would like to re-assert my request that there be a JSON-RPC method for validating account and storage proofs. I don't think it should hold up this from becoming final.

MicahZoltu commented 4 years ago

When trying to actually use this, I found it was missing a couple of pieces of information: the state root hash and the block number/hash. If I run a query with latest, I am not provided with enough information to actually validate a proof against Ethereum, and because a block number is not returned I cannot look up the block to get that data.

While I could guess-and-check and probably get the data pretty quickly, this process is error prone (uncles) and unnecessary work since that information is available to the node at the time of generating the proof.
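One way to work around this with the method as drafted is to fetch the block header first and then request the proof for that explicit block number instead of "latest". The following is only a sketch of building such a request; `buildProofRequest` is a hypothetical helper, and `block` is assumed to be the object returned by eth_getBlockByNumber (with `number` as a QUANTITY string).

```javascript
// Hypothetical helper: build an eth_getProof request pinned to a block that
// was fetched beforehand, so the proof can be checked against block.stateRoot.
function buildProofRequest(block, address, storageKeys, id = 1) {
  return {
    jsonrpc: '2.0',
    id,
    method: 'eth_getProof',
    // block.number is reused as-is, so header and proof refer to the same height
    params: [address, storageKeys, block.number],
  };
}

// Example with made-up header values:
const request = buildProofRequest(
  { number: '0x10', stateRoot: '0x…' },
  '0x7F0d15C7FAae65896648C8273B6d7E43f58Fa842',
  []
);
console.log(JSON.stringify(request));
```

This narrows the problem but does not remove it: a reorganization can still replace the block at that number between the two calls, which is the uncle concern raised above.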

p4u commented 3 years ago

Just in case someone needs to use EIP1186, we have implemented a Golang library for creating and verifying EIP1186 proofs: https://github.com/vocdoni/eth-storage-proof

Thanks @simon-jentzsch and everyone else for your work. This is an amazing technology that opens the door for many off-chain use cases of Ethereum.

jochem-brouwer commented 3 years ago

Just to verify here; the proof array always starts with the root hash. So it is not possible to create a proof where the proof has 0 items. (Except maybe if the trie is empty).

What happens if I try to create a proof on an empty trie?

saurik commented 3 years ago

@jochem-brouwer If you create a proof of an empty trie--which is easy to do if you create a new contract and have it not store anything and then ask for a proof of anything in that storage--what you seem to get back is an empty proof array. For the algorithm I use to verify the proofs, this is actually very natural: I maintain the current hash being proved and then walk through each proof, replacing the hash being proved with the hash of the node I was provided in the proof. If I run out of path on a proof, then I can return the value stored at that entry. If I am forced down an incompatible path, I return 0. If I run out of proofs--which can happen at the root of the trie, or anywhere below--I verify that the hash is the hash of an empty trie node--an RLP encoded vector of length 0--and return 0.

saurik commented 3 years ago

So, it isn't clear to me what the argument of "storage keys" is actually supposed to be: DATA, or QUANTITY? The document says "32 byte" (which maybe feels a bit like an implicit shout out to DATA); but then says "see eth_getStorageAt", which defines this argument as QUANTITY. The values in the reference implementation are passed to common.HexToHash, which doesn't particularly care. Sadly, I care, because I'm seeing implementations in the wild that have opinions narrower than what geth is accepting, and it would be nice to have something definitive with respect to the intended format of the argument. Should others treat that as a true "reference implementation", or was that just some loose wording and that was merely an "example implementation" that happens to be liberal in what it accepts?

p4u commented 3 years ago

Hey @saurik, not sure if I can help you but let me try.

Each contract has its own Merkle tree. The "storage key" is the index/key where the DATA (value/leaf) is stored inside this Merkle tree. It is a Keccak256 hash (so 32 bytes) and depends on the Solidity compiler. For a map of balances (of an ERC20 smart contract), the position in the trie is equal to Keccak256(holderAddress, indexSlot). The index slot (or position slot) usually depends on the position where the map is declared in the ERC20 smart contract's Solidity source code.

Here you can see an example of how the storage key is computed for an ERC20-like balance map: https://github.com/vocdoni/storage-proofs-eth-go/blob/master/token/helpers.go#L15

Maybe you might find this document interesting too (go to "Storage Proofs" section) https://www.notion.so/aragonorg/Introducing-Vocdoni-Bridge-cf7e73d38c4a45788358e9a1497cdf19

saurik commented 3 years ago

@p4u So, I understood all of that. I'm asking a question about how the API expects these arguments to be encoded. Let's say, for example, that I want to access the value that is at storage slot 0; this is the slot you would be accessing if you have a contract with a single field of type uint256 and then like, SSTORE into it. Is the argument supposed to be "0x0", or "0x0000000000000000000000000000000000000000000000000000000000000000", or even simply "0"? Ethereum JSON/RPC APIs typically are defined to have arguments that are of either type QUANTITY or DATA. This API is not really specifying the format... it is just "ARRAY (of ?!?!?)", and I'm running into discrepancies with third-party implementations from groups like xDai. What it says here is "ARRAY 32-byte"... is that implying ARRAY of DATA 32-byte? It says, though, "see eth_getStorageAt", which firmly defines a storage key to be of type QUANTITY, NOT DATA. If it is type DATA (as maybe implied by the "32-byte"), then it would be "0x0000000000000000000000000000000000000000000000000000000000000000", but if it is type QUANTITY (as defined by eth_getStorageAt) then it would be "0x0". The "reference implementation" (a term which may have been thrown around loosely, but carries weight; the current implementation in geth is essentially identical) takes this string and passes it to common.HexToHash, which (near as I can tell, by implication of "what works") accepts all three of these formats, including "0". Is the intention that other third-party implementations accept all three of these formats? xDai's implementation only works correctly if you pass the DATA format; if you pass the QUANTITY format, it sometimes (as they have different versions--and I think even fundamentally different implementations--of their service behind a load balancer) rejects the value with "invalid length 1, expected a 0x-prefixed hex string with length of 64"... 
and it sometimes (this is "epic" ;P) succeeds but returns a proof of the wrong storage slot! So, I'm trying to figure out what the intention of the format for this argument was to be, so I can determine things like whether xDai's implementation--or that of any other chain I end up running into (RSK for example has their own implementation of a lot of these APIs in Java)--is compliant and what I should be making sure to generate when calling the API.

p4u commented 3 years ago

ARRAY, 32 Bytes - array of storage-keys which should be proved and included. See eth_getStorageAt

I would say that ARRAY expects a list of 32-byte Keccak-256 keys, since they are indexes of the storage Merkle Patricia trie. So it should be 0x0000000000000000000000000000000000000000000000000000000000000000, not 0x0 (which is NULL in RLP).
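For callers who want to send the 32-byte DATA form regardless of how the slot was written down, the padding is a one-liner. This is a sketch only: `slotToData` is a hypothetical helper, and whether the specification intends DATA or QUANTITY here is exactly the open question in this thread. The padded form is simply the one reported above to work with both geth and xDai.

```javascript
// Hypothetical helper: left-pad a storage slot index to the 32-byte DATA form.
// Accepts a number, a BigInt, or a decimal / 0x-prefixed hex string.
function slotToData(slot) {
  const n = BigInt(slot);
  if (n < 0n || n >= 2n ** 256n) throw new RangeError('slot out of range');
  return '0x' + n.toString(16).padStart(64, '0');
}

console.log(slotToData(0));
console.log(slotToData('0x1'));
```

Hashed keys such as mapping slots are already 32 bytes long and pass through unchanged.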

saurik commented 3 years ago

@p4u OK, and just to like, 100% verify: this is then unlike eth_getStorageAt, which (of course) also takes "indexes of the storage merkle patricia trie" but is firmly defined--in both EIP1474 and on eth.wiki--to take an argument of type "quantity"... which, in turn, "MUST be expressed using the fewest possible hex digits per byte", thereby making 0x000000... "invalid" when used with that API (at least by a compliant implementation: geth once again notably accepts this argument to eth_getStorageAt if passed in any format ;P).

p4u commented 3 years ago

I cannot 100% verify, but I'm talking from my experience using eth_getProof and web3.

jochem-brouwer commented 3 years ago

We might want to create a general EIP where we define these data types, and let other EIPs reference to this EIP so all EIPs point to the same terminology. (Same idea as the RFC which explains what MUST/SHOULD etc. means)

fedejinich commented 3 years ago

I just realized that this "simple example" requires library for merkle proof validation that may not be readily available in all environments. This example also doesn't appear to validate an actual storage proof, only the account proof.

So, I would like to re-assert my request that there be a JSON-RPC method for validating account and storage proofs. I don't think it should hold up this from becoming final.

Don't you think that's kind of controversial? I mean, there might be some cases where you are using eth_getProof because you don't trust the source, so why would you trust the response of eth_verifyProof? Maybe I'm missing something.

p4u commented 3 years ago

I'm trying to add support for proofs of non existence in our implementation here: https://github.com/vocdoni/storage-proofs-eth-go/blob/master/ethstorageproof/ethstorageproof.go

According to the original text from @simon-jentzsch:

Proofs for non-existent values: In case an address or storage-value does not exist, the proof needs to provide enough data to verify this fact. This means the client needs to follow the path from the root node and deliver all nodes up to the last matching node. If the last matching node is a branch, the proof value in the node must be an empty one. In case of a leaf-type node, it must point to a different relative path in order to prove that the requested path does not exist.

There are two possibilities. The first one (branch) is clear, I added it here: https://github.com/vocdoni/storage-proofs-eth-go/blob/master/ethstorageproof/ethstorageproof.go#L150

But I don't understand the leaf-type case, can someone help?

plasticalligator commented 2 years ago

This is still "todo"? Ethereum has had so many years to get its act together that I can't comprehend why people are putting any money into a team that can't even cover the most basic functionality needed to develop anything meaningfully useful.

MicahZoltu commented 2 years ago

API specs are now maintained at https://github.com/ethereum/execution-apis, not in the EIPs repository. I believe all clients have implemented eth_getProof, but it doesn't appear anyone has added it to the API spec yet. It would be great for someone to do so!

I'm going to close this issue since this repository isn't the right place for it anymore.

jochem-brouwer commented 2 years ago

@MicahZoltu we are implementing this at EthereumJS. The EIP does not state that there should also be a field of the address which holds whatever address is being searched for. Geth returns this field in the proofs.

MicahZoltu commented 2 years ago

@jochem-brouwer It would not surprise me in the slightest to find that this idea (not even an EIP, just a proposal for an idea for an EIP) is not in line with actual implementations. It would be great if someone could figure out what Geth, Besu, Erigon, and Nethermind do and then add the intersection of those to execution-apis.

yjhmelody commented 6 months ago

The current geth implementation does not seem to be aligned with the RFC. When an account does not exist, its storage_hash/code_hash now just return all zeros, but they should return the empty hash according to the RFC.

plasticalligator commented 6 months ago

Somebody please just cover me in horse manure and then light me on fire.
