ethereum / solidity

Solidity, the Smart Contract Programming Language
https://soliditylang.org
GNU General Public License v3.0
23.33k stars 5.77k forks

Non-deterministic output according to filesystem path #9790

mikeshultz closed 4 years ago

mikeshultz commented 4 years ago

Description

I'm seeing non-deterministic output according to filesystem path when compiling libraries. I've been able to reproduce it with multiple versions.

Environment

Steps to Reproduce

I put together a test case repository. If you run run.sh /path/to/solc, it'll compile 3 identical libraries in 3 different directories and display the MD5 hashes of each source contract and its compiled output.

Example from my last run:

$ ./run.sh ~/.local/bin/solc.0.7.1
Compiled contract a: 253b4238c2f607c39950d7cd2e6f4641/40f8f82ad16272edc8dd616069b22605
Compiled contract b: 253b4238c2f607c39950d7cd2e6f4641/48da6b634e3ee37565a99955d02454a4
Compiled contract c: 253b4238c2f607c39950d7cd2e6f4641/19888f5974c4bb763ed80282f4bbd239
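The run.sh script itself is not reproduced in this thread. As a rough sketch of what such a check might look like, the Python below hashes each source file and the bytecode printed by solc --bin. The function names (md5_hex, extract_binaries, report), the default file name SafeMath.sol, and the assumption that the bytecode is the line following a "Binary:" label in the compiler output are illustrative guesses, not taken from the repository.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import List


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of data."""
    return hashlib.md5(data).hexdigest()


def extract_binaries(solc_stdout: str) -> List[str]:
    """Collect the line that follows each 'Binary:' label in the compiler output.

    Assumption: `solc --bin` prints a header with the file path, then 'Binary:',
    then the hex bytecode on the next line. The header is skipped on purpose,
    because it contains the path and would change the hash trivially.
    """
    lines = solc_stdout.splitlines()
    return [
        lines[i + 1].strip()
        for i, line in enumerate(lines[:-1])
        if line.strip() == "Binary:"
    ]


def report(solc: str, directories: List[str], filename: str = "SafeMath.sol") -> None:
    """Print '<md5 of source>/<md5 of bytecode>' for the same file in each directory."""
    for d in directories:
        source = Path(d) / filename
        result = subprocess.run(
            [solc, "--bin", str(source)],
            capture_output=True,
            text=True,
            check=True,
        )
        bytecode = "".join(extract_binaries(result.stdout))
        print(
            f"Compiled contract {d}: "
            f"{md5_hex(source.read_bytes())}/{md5_hex(bytecode.encode())}"
        )
```

Calling report("/path/to/solc", ["a", "b", "c"]) from the directory that contains a, b and c would print one line per directory in the same shape as the run above.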

This may be related to #168 which was closed before resolution.

chriseth commented 4 years ago

This is known behaviour. You tell the compiler: "Compile the file called a/SafeMath.sol", so this is the name by which the compiler knows the file. Any imports will also refer to the file by that name, and because of that, the name has to be hashed into the compilation artefact to make the build reproducible (otherwise, imports might resolve to different files).
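To illustrate why the file name ends up in the hash, here is a minimal sketch in Python. It builds a heavily simplified stand-in for the compiler's metadata, a record keyed by the path passed on the command line, and hashes it. SHA-256 and the record layout are assumptions chosen for convenience; the real compiler uses its own metadata format and hash functions.

```python
import hashlib
import json


def content_hash(source_text: str) -> str:
    """Hash the source text only (SHA-256 as a stand-in for the compiler's hash)."""
    return hashlib.sha256(source_text.encode()).hexdigest()


def metadata_hash(path: str, source_text: str) -> str:
    """Hash a simplified metadata record keyed by the path given to the compiler."""
    metadata = {"sources": {path: {"hash": content_hash(source_text)}}}
    encoded = json.dumps(metadata, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


# Hypothetical, identical source text placed in three directories.
source = "library SafeMath {}"
hashes = {d: metadata_hash(f"{d}/SafeMath.sol", source) for d in ("a", "b", "c")}

for directory, digest in hashes.items():
    print(directory, content_hash(source)[:8], digest[:8])
```

The content hash is identical in all three cases, while the hash of the record differs because the path is part of what gets hashed.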

If you want to abstract away the directories a / b / c, you have three options:

mikeshultz commented 4 years ago

I'm curious about the hashing logic. Why does the bytecode care about source imports? Any interfaces or library code should be effectively integrated or replaced with placeholders at the point of bytecode output, no? I get why placeholder hashes might include file paths since it's very relevant, but not why that might affect contract bytecode. If you happen to move source files around but leave execution instructions all the same, I think most of us would expect the build to output the same bytecode at the end.

Not a big deal to me, it's just unintuitive. Thank you for the options, those will be helpful.

chriseth commented 4 years ago

Of course, there are changes to filenames and paths that should not change the semantics of the contract, but if you exchange the names of two files, then this can have a big influence, especially if they define the same symbols. The idea is that we want a hash in the bytecode that basically is a watermark and allows re-compilation in exactly the same way as it was done before deployment, including the names of all files and all comments in the source code.
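If the watermark is the only path-dependent part of the output, two builds can still be compared by their executable code alone. The sketch below assumes the layout described in the Solidity metadata documentation, where the metadata is CBOR-encoded at the end of the bytecode and the last two bytes give its length. The function name strip_metadata and the sample hex strings are hypothetical, and the code expects fully linked hex without library placeholders.

```python
def strip_metadata(bytecode_hex: str) -> str:
    """Drop the trailing metadata, whose length is stored in the last two bytes."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)
    if len(code) < 2:
        return code.hex()
    cbor_length = int.from_bytes(code[-2:], "big")
    if cbor_length + 2 > len(code):
        # No plausible metadata trailer; leave the bytecode untouched.
        return code.hex()
    return code[: len(code) - cbor_length - 2].hex()


# Made-up example: same instructions, different 4-byte trailers, length 0x0004.
body = "6080604052"
build_a = body + "a1aaaaaa" + "0004"
build_b = body + "a1bbbbbb" + "0004"

print(strip_metadata(build_a) == strip_metadata(build_b))
```

Comparing the stripped results treats the two builds as equal whenever only the trailer differs, which matches the expectation that moving files around leaves the execution instructions unchanged.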