OpenZeppelin / openzeppelin-contracts

OpenZeppelin Contracts is a library for secure smart contract development.
https://openzeppelin.com/contracts
MIT License

Enhancing Security in MerkleProof.sol with safeVerify() Function for Double Hashing 64-Byte Leaves #4818

Open Erengonen opened 8 months ago

Erengonen commented 8 months ago

🧐 Motivation The current implementation of MerkleProof.sol in the OpenZeppelin library exhibits a vulnerability when handling leaf data that is exactly 64 bytes in size. This issue, detailed in Issue #278 of the sherlock-audit repository, enables an attacker to bypass Merkle proof verification, leading to significant security risks. The vulnerability is also acknowledged with a warning in the MerkleProof.sol contract (https://github.com/OpenZeppelin/openzeppelin-contracts/issues/3091). To address this, I propose introducing a safeVerify() function in the MerkleProof contract, which applies double hashing to leaves regardless of their size.

📝 Details This enhancement involves extending the existing MerkleProof.sol contract by adding a safeVerify() function. This function will apply double hashing to the provided leaf, irrespective of its size, thereby ensuring enhanced security and mitigating the identified vulnerability.

The proposed safeVerify() function could be implemented as follows:

    function safeVerify(bytes32[] memory proof, bytes32 root, bytes32 leaf) internal pure returns (bool) {
        return MerkleProof.verify(proof, root, keccak256(bytes.concat(leaf)));
    }

I welcome feedback and further suggestions from the OpenZeppelin community on this proposal.

Amxx commented 8 months ago

Hello @Erengonen

The MerkleProof.sol library is designed to support as many merkle trees as possible. In fact, the only requirement is that internal hashes are produced using the keccak256 hash of the sorted child values.

It is indeed true that this includes support for trees that are poorly designed. If the leaves are unhashed 32-byte values, then internal nodes can be proven even though they are not leaves. Similarly, if the leaves are produced by hashing 64 bytes, then you may prove non-existing leaves that match some internal nodes.

We believe this is out of scope for the Solidity library that verifies the trees, and it should be addressed by the tooling that builds the trees. That is why we maintain @openzeppelin/merkle-tree. That library encourages you to double hash your leaves in a way that is safe from the issues you describe.

SteMak commented 8 months ago

I support the idea.

Double hashing of leaves is considered a best practice when generating any Merkle tree and is implemented by default in the @openzeppelin/merkle-tree generator.

@Amxx I don't think that introducing a safe way to validate whether a leaf is included in the tree, without a backward-compatibility break, will harm the library. The function can be named verifyLeaf and be referenced as the preferable way to validate Merkle trees generated by the mentioned library.

Having keccak256(bytes.concat(keccak256(abi.encodePacked(params...)))) in a smart contract looks a bit confusing.

Adding that function will make the notation a bit clearer and will prevent both the “unhashed 32-byte leaves” and “hashed 64-byte leaves” issues.

SteMak commented 8 months ago

The only argument against it that I see is that, for consistency, it would be reasonable to apply the same pattern to the multiProofVerify function as well, which may be a bit more resource consuming.

Erengonen commented 8 months ago

Hi @Amxx, applying additional hashing will solve both problems.

Problem 1: “If the leaves are unhashed 32-byte values, then internal nodes can be proven even though they are not leaves.”

Let’s examine this Merkle tree structure:

              Root
             /    \
            /      \
           /        \
    [Hash_AB]      [Hash_CD]
     /    \          /    \
    /      \        /      \
LeafA    LeafB  LeafC     LeafD
  |         |     |          |
   \        |     |         /
    \ --->(Not hashed) <---/

Typically, in a Merkle tree, all leaves are hashed. However, in this example, the leaves are intentionally not hashed, illustrating a specific issue: they represent 32 bytes of actual data. [Hash_AB] and [Hash_CD] are the hashes of (32 bytes of actual data + 32 bytes of actual data). Imagine a claim() function that calls MerkleProof.verify() without hashing the leaf parameter. For example:

function claim(bytes32[] memory proof, bytes32 leaf) external view returns (bool) {
…
   return MerkleProof.verify(proof, root, leaf);
…
}

In this scenario, it’s possible to pass unhashed values, since the OpenZeppelin MerkleProof library does not apply hashing internally. Because the leaves are 32 bytes of unhashed data, a leaf is indistinguishable from an intermediate node, which leads to incorrect verification: in this Merkle tree, [Hash_AB] can be passed off as a leaf (which is not accurate, as [Hash_AB] is an intermediate node), with [Hash_CD] serving as the proof.

Concretely, the following proof can be passed to the MerkleProof.verify() function as the proof parameter:

[Hash_CD]

together with the following value as the leaf parameter:

[Hash_AB]

[Hash_AB] is not a leaf! It is an intermediate node. This allows proof verification to succeed with an intermediate node rather than one of the actual leaves (LeafA, LeafB, LeafC, LeafD). Implementing a safeVerify() function that applies additional hashing to the leaf fixes this: when leaves are unhashed 32-byte values, hashing the leaf parameter internally prevents internal nodes from being misinterpreted as leaves.
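The Problem 1 attack can be reproduced with a short, self-contained sketch. This is illustrative only: sha256 stands in for keccak256, and verify/safe_verify are simplified Python models of MerkleProof.verify and the proposed safeVerify, not the actual library code.

```python
import hashlib

def h(data: bytes) -> bytes:
    # sha256 stands in for keccak256 (illustration only)
    return hashlib.sha256(data).digest()

def commutative_hash(a: bytes, b: bytes) -> bytes:
    # Like OZ's internal pair hashing: hash the sorted pair of siblings
    return h(a + b) if a < b else h(b + a)

def verify(proof, root, leaf):
    # Simplified model of MerkleProof.verify
    node = leaf
    for sibling in proof:
        node = commutative_hash(node, sibling)
    return node == root

# Unsafe tree: leaves are raw, UNHASHED 32-byte values
leaf_a, leaf_b = b"\x01" * 32, b"\x02" * 32
leaf_c, leaf_d = b"\x03" * 32, b"\x04" * 32
hash_ab = commutative_hash(leaf_a, leaf_b)
hash_cd = commutative_hash(leaf_c, leaf_d)
root = commutative_hash(hash_ab, hash_cd)

assert verify([leaf_b, hash_cd], root, leaf_a)  # legitimate proof works
assert verify([hash_cd], root, hash_ab)         # ATTACK: intermediate node "proven"

# Mitigation modeled on the proposed safeVerify: hash the caller's input
def safe_verify(proof, root, data):
    return verify(proof, root, h(data))

# Safe tree: leaves enter the tree pre-hashed
la, lb, lc, ld = h(leaf_a), h(leaf_b), h(leaf_c), h(leaf_d)
hab = commutative_hash(la, lb)
hcd = commutative_hash(lc, ld)
root2 = commutative_hash(hab, hcd)

assert safe_verify([lb, hcd], root2, leaf_a)  # legitimate claim still works
assert not safe_verify([hcd], root2, hab)     # intermediate node now rejected
```

The second half of the sketch shows why hashing helps: once the tree commits to hashed leaves and the verifier re-hashes its input, no 32-byte value an attacker supplies can land on an internal node.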

Problem 2: “if the leaves are produced by hashing 64 bytes, then you may prove non-existing leaves that match some internal nodes”

              Root
             /    \
            /      \
           /        \
    [Hash_AB]      [Hash_CD]
     /    \          /    \
    /      \        /      \
 LeafA    LeafB  LeafC     LeafD

In this example, all leaves are hashed. Each leaf is formed by hashing the concatenation of two integer values. For instance, LeafA is derived as follows:

keccak256(abi.encodePacked(uint256(1), uint256(2)))

Assume the Merkle Tree is defined by:

const pairs = [
  [1, 2],
  [3, 4],
  [5, 6],
  [7, 8]
];

Here, LeafA equates to [1,2], and similarly, LeafB, LeafC, and LeafD correspond to elements in the pairs list.

Envision a claim() function that incorporates the MerkleProof.verify() function. This function creates a leaf by hashing the provided ‘one’ and ‘two’ parameters, exemplified as follows:

function claim(bytes32[] memory proof, uint256 one, uint256 two) external view returns (bool) {
…
   return MerkleProof.verify(proof, root, keccak256(abi.encodePacked(one, two)));
…
}

Here the preimage abi.encodePacked(one, two) is 64 bytes long (uint256 + uint256 = 32 bytes + 32 bytes = 64 bytes), exactly the size of an internal node’s preimage. Consequently, an intermediate node may be used to circumvent the verification process, as if it were a leaf.

To have the claim function accept the intermediate node [Hash_AB], the following proof parameter is passed:

[Hash_CD]

To complete the verification, the claim function builds the leaf by hashing the provided one and two parameters. For the attack, the child hashes themselves are supplied:

uint256 one = LeafB
uint256 two = LeafA

(passed in whichever order matches the sorted concatenation that produced [Hash_AB])

Inside claim(), these parameters are hashed with keccak256 to form the leaf parameter. Hashing them, however, recreates an intermediate node:

[Hash_AB] = keccak256(abi.encodePacked(one, two))

In this instance, [Hash_AB] is provided to MerkleProof.verify() as the leaf parameter. [Hash_AB] is not a leaf but an intermediate node, and its preimage is likewise 64 bytes:

(bytes32 + bytes32 = 32 bytes + 32 bytes = 64 bytes)

Because its preimage matches the 64-byte size of an internal node’s preimage, the verify function can erroneously succeed with the supplied intermediate node instead of an actual leaf. To counteract this, implementing a safeVerify() function is advisable. The extra hashing step ensures that internal nodes cannot be misinterpreted as leaves: even if the intermediate node [Hash_AB] is sent as the leaf parameter, safeVerify() hashes it again, and keccak256([Hash_AB]) does not correspond to any node in the tree.
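Problem 2 can be sketched the same way. Again this is illustrative only: sha256 stands in for keccak256, and claim is a hypothetical Python model of the Solidity function above.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # stand-in for keccak256

def commutative_hash(a: bytes, b: bytes) -> bytes:
    return h(a + b) if a < b else h(b + a)  # sorted-pair hashing, as in OZ

def verify(proof, root, leaf):
    node = leaf
    for sibling in proof:
        node = commutative_hash(node, sibling)
    return node == root

def pack_uint256(x: int) -> bytes:
    return x.to_bytes(32, "big")  # models abi.encodePacked for a uint256

# Each leaf hashes a 64-byte preimage: two packed uint256 values
pairs = [(1, 2), (3, 4), (5, 6), (7, 8)]
leaves = [h(pack_uint256(a) + pack_uint256(b)) for a, b in pairs]
hash_ab = commutative_hash(leaves[0], leaves[1])
hash_cd = commutative_hash(leaves[2], leaves[3])
root = commutative_hash(hash_ab, hash_cd)

def claim(proof, one: int, two: int) -> bool:
    # The contract hashes its two uint256 arguments itself
    leaf = h(pack_uint256(one) + pack_uint256(two))
    return verify(proof, root, leaf)

assert claim([leaves[1], hash_cd], 1, 2)  # legitimate claim for the pair (1, 2)

# ATTACK: pass the two child hashes, in sorted order, as "one" and "two";
# claim() then reconstructs the intermediate node hash_ab as its "leaf"
lo, hi = sorted([leaves[0], leaves[1]])
assert claim([hash_cd], int.from_bytes(lo, "big"), int.from_bytes(hi, "big"))
```

The final assertion passes because the attacker's 64-byte preimage (two leaf hashes) is indistinguishable from the 64-byte preimage of a genuine leaf; double hashing the computed leaf inside claim() would break that symmetry.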

ernestognw commented 8 months ago

> Hi @Amxx, applying additional hashing will solve both problems.

We never said it doesn't solve the problem, and we do agree it's a problem. However, we think it's reasonable to evaluate the severity and likelihood of the issue. Note that not every contract that skips double hashing of its leaves is vulnerable. In some cases, the design of the contract makes it impossible to obtain a valid intermediate preimage. Consider a function like this:

 function claim(address account, uint256 amount, bytes32[] memory proof) external {
      ...
      bytes32 leaf = keccak256(abi.encodePacked(account, amount));
      ...
  }

I'd say this is a common construction for airdrops. Note that since the leaf is constructed by concatenating account and amount, each leaf's preimage is 52 bytes long (20 + 32), so you would need an intermediate value in the tree with 12 trailing zero bytes to make it work (and only if keccak256 padded its input, which I don't think it does). Even if you were able to verify an intermediate value, you would only do so with a random account, restricting your ability to pick an arbitrary receiver.
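The length argument above can be sanity-checked with a short sketch. The account and amount values here are hypothetical placeholders; the point is only the byte counts.

```python
# Hypothetical values, chosen only to demonstrate the lengths involved
account = b"\x11" * 20               # a 20-byte address
amount = (1000).to_bytes(32, "big")  # a uint256 amount

leaf_preimage = account + amount     # models abi.encodePacked(account, amount)
internal_preimage_len = 32 + 32      # internal nodes hash two 32-byte child hashes

assert len(leaf_preimage) == 52
assert len(leaf_preimage) != internal_preimage_len
```

Since keccak256 hashes the raw bytes without padding them to a common length, a 52-byte leaf preimage can never be byte-for-byte identical to a 64-byte internal-node preimage, which is why this particular construction resists the confusion attack.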

While safeVerify is indeed the safest option for verifying trees, there are trees out there working fine with single-hashed leaves, and they aren't bugged, so they should still be supported by the regular verify function. Also, we've never liked providing multiple options for the same thing, so keeping both safeVerify and verify wouldn't be our preferred option.

> That is why we maintain @openzeppelin/merkle-tree. That library encourages you to double hash your leaves in a way that is safe from the issues you describe.

I support this stance. Developers need a utility to produce Merkle trees, and it's best for them if the asymmetry is built into the tree itself, not the verifying library.

Amxx commented 8 months ago

My feeling is also that it is very easy for a user to write

MerkleProof.verify(proof, root, keccak256(leaf))

instead of

MerkleProof.verify(proof, root, leaf)

if that is how their tree is constructed.

The added value of having both functions at the same time is unclear to me, while it feels like an unnecessary risk of confusion for users.

I stand by my opinion that the important part is making sure the trees are constructed in a safe way. Once that is the case, I feel the current Solidity tools are sufficient to verify any tree (including, but not limited to, the safe ones).

SteMak commented 8 months ago

It looks like two sides of the same coin:

Both approaches provide the same level of security and roughly the same gas usage. Covering both sides in order to provide a fully consistent solution looks reasonable to me. On the other hand, keeping the code minimalistic also makes sense.

Example to check Gas usage:

pragma solidity ^0.8.20;

import "@openzeppelin/contracts/utils/cryptography/MerkleProof.sol";

contract Target32 {
    bytes32 dataA = 0x0101010101010101010101010101010101010101010101010101010101010101;
    bytes32 dataB = 0x0202020202020202020202020202020202020202020202020202020202020202;

    bytes32 leafA = keccak256(bytes.concat(dataA));
    bytes32 leafB = keccak256(bytes.concat(dataB));

    bytes32 root = keccak256(abi.encodePacked(leafA, leafB));

    function tryDirect32() public view returns (bool) {
        bytes32 leaf = keccak256(bytes.concat(dataA));

        bytes32[] memory proof = new bytes32[](1);
        proof[0] = leafB;

        return MerkleProof.verify(proof, root, leaf);
    }

    function tryExtension32() public view returns (bool) {
        bytes32 leaf = dataA;

        bytes32[] memory proof = new bytes32[](1);
        proof[0] = leafB;

        return MerkleProofExtension.verifyLeaf(proof, root, leaf);
    }
}

contract Target64 {
    bytes32 halfDataA = 0x0101010101010101010101010101010101010101010101010101010101010101;
    bytes32 halfDataB = 0x0202020202020202020202020202020202020202020202020202020202020202;

    bytes32 leafA = keccak256(abi.encodePacked(halfDataA, halfDataA));
    bytes32 leafB = keccak256(abi.encodePacked(halfDataB, halfDataB));

    bytes32 root = keccak256(abi.encodePacked(leafA, leafB));

    function tryDirect64() public view returns (bool) {
        bytes32 leaf = keccak256(bytes.concat(keccak256(abi.encodePacked(halfDataA, halfDataA))));

        bytes32[] memory proof = new bytes32[](1);
        proof[0] = leafB;

        return MerkleProof.verify(proof, root, leaf);
    }

    function tryExtension64() public view returns (bool) {
        bytes32 leaf = keccak256(abi.encodePacked(halfDataA, halfDataA));

        bytes32[] memory proof = new bytes32[](1);
        proof[0] = leafB;

        return MerkleProofExtension.verifyLeaf(proof, root, leaf);
    }
}

library MerkleProofExtension {
    function verifyLeaf(bytes32[] memory proof, bytes32 root, bytes32 leaf) internal pure returns (bool) {
        return MerkleProof.verify(proof, root, keccak256(bytes.concat(leaf)));
    }
}