FuelLabs / fuel-vm

Fuel v2 interpreter in Rust

Finding the hash of a program in Fuel VM: hashing bytecode or use MAST root? #794

Open partylikeits1983 opened 4 months ago

partylikeits1983 commented 4 months ago

I am curious how the hash of a Fuel predicate is computed. As I understand it, the predicate hash is computed here: https://github.com/FuelLabs/fuel-vm/blob/2604237c9ff4a755e48b40b2c006711d22cff19f/fuel-tx/src/contract.rs#L72

using the root_from_code function:

    pub fn root_from_code<B>(bytes: B) -> Bytes32
    where
        B: AsRef<[u8]>,
    {
        let mut tree = BinaryMerkleTree::new();
        bytes.as_ref().chunks(LEAF_SIZE).for_each(|leaf| {
            // If the bytecode is not a multiple of LEAF_SIZE, the final leaf
            // should be zero-padded rounding up to the nearest multiple of 8
            // bytes.
            let len = leaf.len();
            if len == LEAF_SIZE || len % MULTIPLE == 0 {
                tree.push(leaf);
            } else {
                let padding_size = len.next_multiple_of(MULTIPLE);
                let mut padded_leaf = [PADDING_BYTE; LEAF_SIZE];
                padded_leaf[0..len].clone_from_slice(leaf);
                tree.push(padded_leaf[..padding_size].as_ref());
            }
        });

        tree.root().into()
    }

Looking at this implementation, however, I wonder why a Merkle tree is used to compute the hash of the predicate. The compiled bytecode of the program is passed in and divided into chunks of at most LEAF_SIZE bytes, each chunk is pushed into the Merkle tree as a leaf, and the root of the tree is returned.
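To make the chunking step concrete, here is a minimal sketch of how I read the leaf splitting and padding, without the tree itself. The constant values are my assumptions, inferred from the code comment (multiple of 8 bytes, zero padding) and the 16KiB leaf size in the spec, not copied from fuel-tx. The `leaves` helper is hypothetical.

```rust
// Assumed values, inferred from the comment in root_from_code and the
// spec wording; they are not copied from fuel-tx.
const LEAF_SIZE: usize = 16 * 1024; // 16 KiB of bytecode per leaf
const MULTIPLE: usize = 8; // a short final leaf is padded to a multiple of 8 bytes
const PADDING_BYTE: u8 = 0;

/// Hypothetical helper: returns the byte strings that would be pushed
/// into the tree as leaves, in order.
pub fn leaves(bytes: &[u8]) -> Vec<Vec<u8>> {
    bytes
        .chunks(LEAF_SIZE)
        .map(|chunk| {
            let mut leaf = chunk.to_vec();
            // Round the length up to the next multiple of MULTIPLE.
            // A full LEAF_SIZE chunk is already a multiple of 8.
            let padded_len = (leaf.len() + MULTIPLE - 1) / MULTIPLE * MULTIPLE;
            leaf.resize(padded_len, PADDING_BYTE);
            leaf
        })
        .collect()
}
```

Under these assumptions, a bytecode of 16 KiB plus 5 bytes would give two leaves: one full 16 KiB leaf and one 8-byte leaf made of the 5 remaining bytes followed by 3 zero bytes. The split is purely positional, so it knows nothing about where functions or branches begin.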

Using a Merkle tree to find the hash of a program is very similar to the concept of a Merkleized abstract syntax tree (MAST). A MAST root, however, is the hash of an entire program in which the leaves of the tree are the "subprograms" of that program. MAST was proposed for Bitcoin in BIP 114: https://github.com/bitcoin/bips/blob/master/bip-0114.mediawiki

My question is: why is a Merkle tree used to compute the hash of a Fuel program? It seems that root_from_code does not need a Merkle tree at all, since all it does is divide the bytecode into chunks, insert them into a Merkle tree, and return the root. In this implementation, the bytecode in a single leaf could come from different logic flows in the program.

Why not just use a standard hash function like keccak256 for computing the hash of a program? Why use a merkle tree in this case?

partylikeits1983 commented 3 months ago

Looking through the documentation again, I noticed this:

"[The predicate root] 'is the Merkle root of the binary Merkle tree each leaf being 16KiB of instructions.'"

https://github.com/FuelLabs/fuel-specs/blob/master/src/identifiers/predicate-id.md

However, it seems that the root_from_code function doesn't build each leaf from 16KiB of instructions, but from 16KiB of compiled bytecode.

How could this potentially cause an issue? Different compiler versions could output different compiled bytecode for the same program, meaning the same predicate compiled with different compiler versions could have two different predicate roots.
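To illustrate the concern, here is a toy sketch of a binary Merkle root over bytecode chunks. Everything in it is an assumption for illustration: `fnv1a` is a non-cryptographic stand-in for the real hash, `toy_root` is a hypothetical name, and promoting an unpaired node unchanged is a simplification that may not match how fuel-vm's BinaryMerkleTree shapes the tree.

```rust
/// Stand-in 64-bit FNV-1a hash. NOT cryptographic; used here only so
/// the sketch needs nothing outside the standard library.
fn fnv1a(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical toy Merkle root over already-chunked leaves.
/// Leaves are hashed with a 0x00 prefix and inner nodes with a 0x01
/// prefix, so a leaf can never be confused with an inner node.
pub fn toy_root(leaves: &[&[u8]]) -> u64 {
    let mut level: Vec<u64> = leaves
        .iter()
        .map(|leaf| {
            let mut buf = vec![0x00u8];
            buf.extend_from_slice(leaf);
            fnv1a(&buf)
        })
        .collect();
    if level.is_empty() {
        return fnv1a(&[]); // root of the empty tree in this sketch
    }
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| match pair {
                [left, right] => {
                    let mut buf = vec![0x01u8];
                    buf.extend_from_slice(&left.to_be_bytes());
                    buf.extend_from_slice(&right.to_be_bytes());
                    fnv1a(&buf)
                }
                // Simplification: an unpaired node moves up unchanged.
                [single] => *single,
                _ => unreachable!(),
            })
            .collect();
    }
    level[0]
}
```

If two compiler versions emit bytecode that differs in even one byte of one chunk, that leaf hash changes and the change propagates to the root. A flat hash over the whole bytecode would have the same property, so this sensitivity comes from hashing compiled output rather than from the choice of a Merkle tree.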