Constructor arguments not correctly extracted/determined from bytecode during contract verification

derekpierre commented 1 week ago

Environment information

$ ape --version
0.8.14

$ ape plugins list
Installed Plugins
  etherscan    0.8.3
  infura       0.8.1
  polygon      0.8.0
  solidity     0.8.3

Python Version: 3.11.9
OS: macOS
solc: 0.8.23

What went wrong?

After resolving https://github.com/ApeWorX/ape-solidity/issues/153 with my local fix, I was still trying to verify our Coordinator contract (https://github.com/nucypher/nucypher-contracts/blob/main/contracts/contracts/coordination/Coordinator.sol) on Polygon Amoy (https://amoy.polygonscan.com/address/0x7d5f9a339e0b22f1e7d44f1b21e01f5c2207cdb3) and ran into problems determining the constructor arguments. To be clear, the contract was already deployed successfully; I'm now trying to verify it separately.

Code used from the ape console:

In [4]: coordinator = project.Coordinator.at("0x7d5f9a339e0b22f1e7d44f1b21e01f5c2207cdb3")

In [5]: explorer = networks.provider.network.explorer

In [6]: explorer.publish_contract(coordinator.address)

and got the following exception:

  File "/Users/derek/Documents/Github/repos/forks/derek/ape-etherscan/ape_etherscan/verify.py", line 249, in constructor_arguments
    ctor_args = extract_constructor_arguments(deployment_code, runtime_code)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/derek/Documents/Github/repos/forks/derek/ape-etherscan/ape_etherscan/verify.py", line 521, in extract_constructor_arguments
    raise ContractVerificationError("Runtime bytecode not found within deployment bytecode.")
ape_etherscan.exceptions.ContractVerificationError: Runtime bytecode not found within deployment bytecode.

Looking at the runtime and deployment bytecode I can notice the following difference:

runtime_bytecode

[common bytecode]f05e8e5e31bde978ea6035b5ce7ced927cfeb57ffbc515de4cdd140a47d44c5764736f6c63430008170033

vs

deployment_bytecode

[some other bytecode][common bytecode]2c08c810afa77797e57be348b557c9e8d7b8655781bd3bd9365cffe392bb8f0764736f6c63430008170033000000000000000000000000489287ed5bdf7a35fee411fbdcc47331093d0769

where:

[some other bytecode] is just bytecode at the beginning of the deployment_bytecode
[common bytecode] is the equivalent bytecode in both the runtime_bytecode and the deployment_bytecode

As you can see, the runtime_bytecode is not contained in full within the deployment_bytecode; only a prefix of it is.
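The differing tails look like Solidity's metadata trailer: by default, solc appends a CBOR map `{"ipfs": <34-byte multihash>, "solc": <3-byte version>}` to the runtime code, followed by the map's length as a 2-byte big-endian integer, and `0x0033` is 51, the size of that map. The IPFS hash commits to the metadata JSON (which records source hashes and compiler settings), so two builds can differ here even when the executable code is identical. A minimal sketch, assuming this default layout and a release compiler build, that parses both tails. The issue hides the first 10 bytes of the blob (`a2646970667358221220`) inside `[common bytecode]`, so the example reconstructs them from the default layout; that reconstruction is an assumption, and `parse_solc_trailer` is a hypothetical helper.

```python
def parse_solc_trailer(code_hex: str) -> dict:
    """Parse solc's default metadata trailer: CBOR {"ipfs": ..., "solc": ...} + 2-byte length.

    Assumes the default layout (IPFS hash plus release version); other compiler
    settings produce a different CBOR map and are rejected.
    """
    code_hex = code_hex.removeprefix("0x").lower()
    cbor_len = int(code_hex[-4:], 16)  # last 2 bytes: CBOR length in bytes
    cbor = code_hex[-4 - 2 * cbor_len:-4]
    if cbor_len != 51 or not cbor.startswith("a264697066735822") or cbor[84:96] != "64736f6c6343":
        raise ValueError("unrecognised metadata layout")
    major, minor, patch = (int(cbor[i:i + 2], 16) for i in (96, 98, 100))
    return {
        "ipfs_multihash": cbor[16:84],  # 0x1220 prefix + 32-byte sha2-256 digest
        "solc": f"{major}.{minor}.{patch}",
    }


PREFIX = "a2646970667358221220"  # assumed: hidden inside [common bytecode] in the issue
runtime_tail = PREFIX + "f05e8e5e31bde978ea6035b5ce7ced927cfeb57ffbc515de4cdd140a47d44c5764736f6c63430008170033"
deployment_tail = PREFIX + "2c08c810afa77797e57be348b557c9e8d7b8655781bd3bd9365cffe392bb8f0764736f6c63430008170033"

rt_meta = parse_solc_trailer(runtime_tail)
dep_meta = parse_solc_trailer(deployment_tail)
print(rt_meta["solc"], dep_meta["solc"])  # 0.8.23 0.8.23
print(rt_meta["ipfs_multihash"] == dep_meta["ipfs_multihash"])  # False
```

Under that assumption, both tails come from the same compiler version and differ only in the metadata hash, which is enough to make an exact substring search fail.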

I can see the constructor arguments at the end of the deployment_code, i.e. 000000000000000000000000489287ed5bdf7a35fee411fbdcc47331093d0769, but they are not being extracted.

Looking at the extract_constructor_arguments method, I can see that the error is raised because of the following:

    # Find the start of the runtime bytecode within the deployment bytecode
    start_index = deployment_bytecode.find(runtime_bytecode)

    # If the runtime bytecode is not found within the deployment bytecode,
    # return an error message.
    if start_index == -1:
        raise ContractVerificationError("Runtime bytecode not found within deployment bytecode.")
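Because `find` requires the whole runtime bytecode, metadata trailer included, to appear verbatim, any difference in the metadata hash makes it return -1 even when the executable code matches. A minimal sketch of a more tolerant variant, assuming the default solc trailer (its length in the last 2 bytes) and assuming the trailer embedded in the deployment bytecode has the same length as the runtime one (same compiler version and metadata settings). `split_metadata`, `extract_ctor_args_ignoring_metadata` and `fake_trailer` are hypothetical names, not ape-etherscan's API, and the bytecode in the example is placeholder data.

```python
def split_metadata(code_hex: str) -> tuple[str, str]:
    """Split solc bytecode into (executable code, metadata trailer incl. its 2-byte length)."""
    code_hex = code_hex.removeprefix("0x").lower()
    trailer_len = 2 * (int(code_hex[-4:], 16) + 2)  # in hex chars: CBOR blob + 2 length bytes
    if trailer_len > len(code_hex):
        raise ValueError("metadata length exceeds bytecode length")
    return code_hex[:-trailer_len], code_hex[-trailer_len:]


def extract_ctor_args_ignoring_metadata(deployment_hex: str, runtime_hex: str) -> str:
    """Locate the runtime code without its metadata trailer, then skip the deployed trailer."""
    deployment_hex = deployment_hex.removeprefix("0x").lower()
    code, trailer = split_metadata(runtime_hex)
    start = deployment_hex.find(code)
    if start == -1:
        raise ValueError("Runtime code (minus metadata) not found within deployment bytecode.")
    # Assumption: the deployed trailer has the same length as the runtime one,
    # so the arguments begin right after it. Check that its length bytes agree.
    end = start + len(code) + len(trailer)
    if deployment_hex[end - 4:end] != trailer[-4:]:
        raise ValueError("Unexpected metadata trailer length in deployment bytecode.")
    return deployment_hex[end:]


def fake_trailer(hash_byte: str) -> str:
    """Default-layout trailer with a placeholder 32-byte hash (illustration only)."""
    return "a2646970667358221220" + hash_byte * 32 + "64736f6c6343000817" + "0033"


exec_code = "6080604052348015600f57600080fd5b50"  # placeholder runtime code
runtime = exec_code + fake_trailer("11")
deployment = "aabbccdd" + exec_code + fake_trailer("22") + "00" * 12 + "489287ed5bdf7a35fee411fbdcc47331093d0769"
print(deployment.find(runtime))  # -1: the exact match fails because the hashes differ
print(extract_ctor_args_ignoring_metadata(deployment, runtime))
```

This still trusts the first match of the executable code and a fixed trailer length, so it narrows the failure mode rather than eliminating it.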

The reason the bytecode differs is unclear to me. If I'm doing something wrong, let me know. If not, I wonder whether the algorithm covers all cases and Solidity versions.

I tried researching what etherscan does when they provide constructor arguments during manual verification, but couldn't really find anything.

How can it be fixed?

One option is to allow the constructor arguments to be passed in optionally as part of publish_contract, i.e.
```
explorer.publish_contract(coordinator.address, constructor_args=<args in hex>)
```

This could filter down into the SourceVerifier object.

That way, the caller can manually override the algorithm that determines the constructor args, similar to what Etherscan allows in case it gets the constructor arguments wrong.

I think this option is worthwhile irrespective of any alternative solutions you use.
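A minimal sketch of how such an override might be normalized before use, falling back to automatic extraction when it is absent. `resolve_constructor_args` and its `override`/`extract` parameters are hypothetical, not existing ape-etherscan APIs. The length check relies on ABI-encoded arguments always occupying a whole number of 32-byte words.

```python
import re
from typing import Callable, Optional

_HEX_RE = re.compile(r"[0-9a-f]*")


def resolve_constructor_args(override: Optional[str], extract: Callable[[], str]) -> str:
    """Hypothetical helper: prefer caller-supplied constructor args, else auto-extract."""
    if override is None:
        return extract()
    args = override.lower().removeprefix("0x")
    if not _HEX_RE.fullmatch(args):
        raise ValueError("constructor_args must be a hex string")
    # ABI-encoded arguments always occupy a whole number of 32-byte words.
    if len(args) % 64 != 0:
        raise ValueError("constructor_args must be a multiple of 64 hex characters")
    return args


addr_word = "00" * 12 + "489287ed5bdf7a35fee411fbdcc47331093d0769"
print(resolve_constructor_args("0x" + addr_word.upper(), extract=lambda: ""))
```

Passing `extract` as a callable means the (possibly failing) automatic extraction only runs when no override is given.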

Another option, which admittedly seems very flimsy (😅 ) was from the following article: https://mirror.xyz/n00b21337.eth/4HDSO5tlP3-_CAKUgT5QgQ4iIJZ7RH46zucebjycoN8

The article claims that you can reverse-search for 0033 and use the bytecode after it as the constructor args:

So instead:

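```python
# Reverse-search for "0033", which ends the default solc metadata trailer
start_index = deployment_bytecode.rfind("0033")
if start_index == -1:
    raise ContractVerificationError("Can't find end of runtime bytecode")

# Treat everything after the marker as the ABI-encoded constructor arguments
constructor_arguments = deployment_bytecode[start_index + 4:]
return constructor_arguments
```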

I used this to work around my issue for this one contract. That being said, it seems extremely flimsy.

Something that could make it less flimsy is noted here, https://github.com/gnosis/verify-on-etherscan/blob/master/src/get_constructor_arguments.js#L14C1-L16C62 i.e.

```js
// constructor bytecode is a sequence of 32-bytes
// a byte is represented with 2 characters in hex
// so a valid constructor must be a multiple of 64 characters
```

So you can check that the length of the extracted constructor arguments is a multiple of 64, similar to how they validate constructor arguments here: https://github.com/gnosis/verify-on-etherscan/blob/master/src/get_constructor_arguments.js#L17. If it isn't a multiple of 64, you likely need to find the next instance of 0033 searching backwards, and repeat until the length of the corresponding arguments is a multiple of 64.
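A minimal sketch of that backwards search, with one extra guard the description implies but doesn't state: a match at an odd hex offset straddles two bytes and can't be a real `0x0033`, so it is skipped. `guess_ctor_args_by_marker` is a hypothetical name, and the bytecode, metadata hash and argument word in the example are placeholders.

```python
def guess_ctor_args_by_marker(deployment_hex: str, marker: str = "0033") -> str:
    """Walk backwards over occurrences of `marker` until the hex after it is a
    whole number of 32-byte words (64 hex chars each)."""
    code = deployment_hex.removeprefix("0x").lower()
    end = len(code)
    while True:
        idx = code.rfind(marker, 0, end)
        if idx == -1:
            raise ValueError("Can't find end of runtime bytecode")
        args = code[idx + len(marker):]
        # An odd hex offset means the match straddles a byte boundary; skip it.
        if idx % 2 == 0 and len(args) % 64 == 0:
            return args
        # Next, search strictly to the left of this match (overlaps allowed).
        end = idx + len(marker) - 1


trailer = "a2646970667358221220" + "ab" * 32 + "64736f6c6343000817" + "0033"
word = "00" * 10 + "0033" + "11" * 20  # an argument that happens to contain 0033
deployment = "6080604052" + trailer + word
print(guess_ctor_args_by_marker(deployment) == word)  # True
```

Here the first match (inside the argument) leaves 40 hex characters, so the search moves left to the trailer's own `0033`. It remains a heuristic: a `0033` that leaves a multiple of 64 characters after it is accepted even when it sits inside the arguments.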

linear[bot] commented 1 week ago

APE-1810 Constructor arguments not correctly extracted/determined from bytecode during contract verification

fjarri commented 5 days ago

> This article claimed that you can do a reverse search for 0033 and then use the bytecode after that for the constructor args

This doesn't seem safe: what if 0033 appears somewhere in the constructor args? More generally, given how deployment code is organized in Ethereum, determining where the constructor arguments start seems akin to the halting problem: you have to run the whole thing to find out what it returns and what it uses as inputs.
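A small constructed example of that failure mode: a single `uint256` argument equal to 51 encodes as a word ending in `0033`, so the last match lands inside the argument. The remainder after it is empty, and an empty string also passes the multiple-of-64 check. The bytecode prefix and metadata hash below are placeholders, and `ctor_args_after_last_0033` is a hypothetical name for the one-shot heuristic quoted above.

```python
def ctor_args_after_last_0033(deployment_hex: str) -> str:
    """The one-shot heuristic: everything after the last "0033"."""
    code = deployment_hex.removeprefix("0x").lower()
    start_index = code.rfind("0033")
    if start_index == -1:
        raise ValueError("Can't find end of runtime bytecode")
    return code[start_index + 4:]


# Placeholder code + default-layout metadata trailer, then one argument: uint256(51).
trailer = "a2646970667358221220" + "ab" * 32 + "64736f6c6343000817" + "0033"
true_args = (51).to_bytes(32, "big").hex()  # 62 zeros followed by "33"
deployment = "6080604052" + trailer + true_args

guessed = ctor_args_after_last_0033(deployment)
print(repr(guessed))           # '' (the match was inside the argument)
print(len(guessed) % 64 == 0)  # True, so the length check accepts it
```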

I guess it is possible to get the constructor args if you know the constructor signature and can therefore derive the full encoded size of the arguments; you can then take that many bytes from the end of the deployment bytecode and decode them. If there are dynamically sized arguments, though, you're out of luck.
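A minimal sketch of the known-signature approach for static parameters, each of which (`address`, `bool`, `uintN`/`intN`, `bytesN`) ABI-encodes to exactly one 32-byte word. The example assumes a constructor taking a single `address`; the issue doesn't show Coordinator's actual signature, and the trailing word merely has an address's shape (12 zero bytes, then 20 bytes). `tail_args` is a hypothetical helper and the leading bytecode is a placeholder.

```python
def tail_args(deployment_hex: str, static_word_count: int) -> list[str]:
    """Return the last `static_word_count` 32-byte words of the deployment bytecode.

    Only meaningful when every constructor parameter is a static ABI type that
    encodes to exactly one word (address, bool, uintN/intN, bytesN).
    """
    code = deployment_hex.removeprefix("0x").lower()
    size = 64 * static_word_count  # 32 bytes = 64 hex chars per word
    if size > len(code):
        raise ValueError("deployment bytecode is shorter than the expected arguments")
    args = code[len(code) - size:]
    return [args[i:i + 64] for i in range(0, size, 64)]


# Placeholder bytecode followed by the trailing word from the issue.
deployment = "6080604052" + "00" * 12 + "489287ed5bdf7a35fee411fbdcc47331093d0769"
(word,) = tail_args(deployment, 1)
print("0x" + word[24:])  # 0x489287ed5bdf7a35fee411fbdcc47331093d0769
```

Dynamic types (`bytes`, `string`, `T[]`) break this because their encoded size depends on the values, not just the signature.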

It may be possible to find the end of the constructor code by tokenizing the bytecode and searching for RETURN followed by INVALID, then working back to find the runtime code offset, but I'm not sure whether the compiler always transforms the constructor to have a single return.
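A minimal sketch of the tokenizing step: walk the bytecode opcode by opcode, skipping the 1 to 32 immediate bytes of PUSH1 (`0x60`) through PUSH32 (`0x7f`), and record every RETURN (`0xf3`) immediately followed by INVALID (`0xfe`). It treats everything as code, so bytes after the real boundary (runtime code, metadata, arguments) can yield spurious candidates, and, as noted, a single RETURN INVALID pair isn't guaranteed. `find_return_invalid` is a hypothetical name and the toy init code is hand-written for illustration.

```python
def find_return_invalid(code_hex: str) -> list[int]:
    """Return byte offsets of RETURN (0xf3) immediately followed by INVALID (0xfe),
    skipping PUSH1..PUSH32 immediates so push data is never read as opcodes."""
    code = bytes.fromhex(code_hex.removeprefix("0x"))
    hits = []
    i = 0
    while i < len(code):
        op = code[i]
        if op == 0xF3 and i + 1 < len(code) and code[i + 1] == 0xFE:
            hits.append(i)
        # PUSH1 (0x60) .. PUSH32 (0x7f) carry 1..32 bytes of immediate data.
        i += 1 + (op - 0x5F if 0x60 <= op <= 0x7F else 0)
    return hits


# PUSH2 0xf3fe, POP; then copy 5 bytes from offset 16 and RETURN them, INVALID;
# then the 5-byte "runtime code".
init = "61f3fe50" + "60058060106000396000f3fe" + "6080604052"
print(find_return_invalid(init))                        # [14]
print(bytes.fromhex(init).find(bytes.fromhex("f3fe")))  # 1: a naive search hits push data
```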

ApeWorX / ape-etherscan