ethereum / solidity

Solidity, the Smart Contract Programming Language
https://soliditylang.org
GNU General Public License v3.0
23.23k stars 5.75k forks source link

Resuming compilation from AST import does not produce identical metadata #14986

Open Subway2023 opened 6 months ago

Subway2023 commented 6 months ago

Description

AST import leads to inconsistent bytecode.

Environment

Steps to Reproduce

contract C {
    function f() public returns (bytes32) {
        return address(this).codehash;
    }
}

1. Get ast json

solc C.sol --combined-json ast >> C.json

2. Get bin-runtime by importing ast

solc  --bin-runtime --import-ast C.json
6080604052348015600e575f80fd5b50600436106026575f3560e01c806326121ff014602a575b5f80fd5b60306044565b604051603b91906078565b60405180910390f35b5f3073ffffffffffffffffffffffffffffffffffffffff163f905090565b5f819050919050565b6072816062565b82525050565b5f60208201905060895f830184606b565b9291505056fea2646970667358221220fbde534a68865754b695d6f3d9100ab6db0b8f6d336671369335137e84faf43c64736f6c63430008180033

3. Get bin-runtime by directly compiling

solc  --bin-runtime C.sol
6080604052348015600e575f80fd5b50600436106026575f3560e01c806326121ff014602a575b5f80fd5b60306044565b604051603b91906078565b60405180910390f35b5f3073ffffffffffffffffffffffffffffffffffffffff163f905090565b5f819050919050565b6072816062565b82525050565b5f60208201905060895f830184606b565b9291505056fea2646970667358221220a836f31a61ac087171816a7ee56fc5a4b688af164c9acd228bb73fd2c0644f1764736f6c63430008180033

4. Compare

The bytecode obtained from two methods differs, resulting in different return values for this function f()

cameel commented 6 months ago

The difference comes from the metadata hash embedded in the runtime bytecode. You can verify that the contract code is the same by telling the compiler to skip the hash:

diff --unified \
    <(solc --bin-runtime --import-ast C.json --metadata-hash none) \
    <(solc --bin-runtime              C.sol  --metadata-hash none)

And metadata is different because it includes the hash and type of the source file. With AST import you get language: "SolidityAST" and the source is considered to be the JSON AST, not the source (which the compiler no longer has access to). You can use that by using the --metadata-literal flag and requesting --metadata - the metadata will then include the actual sources rather than just their hashes.

So this is not a bug - it was designed that way. Still, it's not a great design. Ideally, you'd be able to get the same exact output, including metadata, no matter whether you compile directly from Solidity source or do multi-stage compilation with separate AST, Yul or EVM assembly import. We'd like to change that but this will be a breaking change and will require careful consideration to do it properly. I'll leave the issue open for the time being and describe our current idea below, but keep in mind that it's something we won't be addressing in the near future.

Redesigned multi-stage metadata

The current idea is to introduce proper separation between stages inside the compiler, with each one starting and stopping at a well defined artifact rather than continuing using internal structures.

Currently the default mode of operation is for the compiler to use those internal structures as canonical input for each stage, with import/export of that data being tacked on later, often without too careful validation. This means that resuming compilation from a later stage is not exactly identical to doing it in one go. It's hard to guarantee that no optional bits of data are omitted on export and there's no strong barrier against later stages reaching into the data from earlier ones that would be unavailable in multi-stage compilation (being able to access the original source is an example of that). A better design would be for the state to be always exported at the end of each stage and for the next stage to start by importing it.

This would also require adjusting the structure of metadata. Currently we assume that complete metadata is always available after the analysis stage. Each stage should instead be able to accept partial metadata and add to it, with complete metadata being available only at the very end.