ethereum / solidity

Solidity, the Smart Contract Programming Language
https://soliditylang.org
GNU General Public License v3.0

Outputting the CBOR Metadata positions in the bytecode #14827

Open kuzdogan opened 7 months ago

kuzdogan commented 7 months ago

In the compiler's output bytecode, some values need special attention during source code verification. These fields can differ between the compiled bytecode and the on-chain bytecode. Still, they shouldn't break verification, because they are either filled in or assigned after compilation, or they reflect changes that do not affect the executed code (comments, etc.).

  1. Libraries: External library addresses are linked with --link or through libraries: in the JSON input. Linking replaces each placeholder of the form __$53aea86b7d70b31448b230b20ae141a537$__ in the bytecode with the library address.
  2. Immutables: Immutable variables are set to 000000... in the bytecode at compilation and are only assigned a value in the constructor during deployment.
  3. CBOR metadata: Appended to the runtime bytecode. It contains the metadata hash, so any slight change to the source files or to the metadata file contents changes this value. If the on-chain CBOR does not match the compiled one during verification, the result is a partial match.
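To make the third case concrete, here is a minimal sketch of how a verifier might separate the trailing CBOR section from a runtime bytecode and classify the result. It relies on the documented convention that the last two bytes of the bytecode hold the big-endian length of the CBOR data. The function names and the "perfect" and "none" labels are my own for illustration, and the sketch ignores libraries and immutables entirely.

```python
def _to_bytes(bytecode_hex):
    """Decode a hex string, tolerating an optional 0x prefix."""
    s = bytecode_hex[2:] if bytecode_hex.startswith("0x") else bytecode_hex
    return bytes.fromhex(s)


def split_trailing_cbor(bytecode_hex):
    """Return (code, cbor) as bytes, using the trailing 2-byte CBOR length.

    Only meaningful when the CBOR section really is at the very end,
    e.g. the runtime bytecode of a single contract.
    """
    raw = _to_bytes(bytecode_hex)
    if len(raw) < 2:
        raise ValueError("bytecode too short to hold a CBOR length")
    cbor_len = int.from_bytes(raw[-2:], "big")
    total = cbor_len + 2  # CBOR payload plus the 2 length bytes
    if total > len(raw):
        raise ValueError("declared CBOR length exceeds bytecode size")
    split_at = len(raw) - total
    return raw[:split_at], raw[split_at:len(raw) - 2]


def match_kind(compiled_hex, onchain_hex):
    """Classify a comparison; the labels other than 'partial' are assumptions."""
    if _to_bytes(compiled_hex) == _to_bytes(onchain_hex):
        return "perfect"
    compiled_code, _ = split_trailing_cbor(compiled_hex)
    onchain_code, _ = split_trailing_cbor(onchain_hex)
    return "partial" if compiled_code == onchain_code else "none"
```

For example, `match_kind("6001600201aabbcc0003", "6001600201ddeeff0003")` compares two toy bytecodes that differ only in their 3-byte trailing section, so it returns "partial".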

It is trivial to find libraries and immutables in compiled bytecode: the former have special __$ placeholders, and the positions of the latter are output in immutableReferences in the compiler output.
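As a rough illustration of why these two are easy, the sketch below locates library placeholders with a regular expression and flattens an immutableReferences object into byte ranges. It assumes unlinked hex bytecode as printed by the compiler and an immutableReferences object that maps AST ids to lists of start/length entries measured in bytes. The helper names are hypothetical.

```python
import re

# A placeholder is "__$" + 34 hex characters + "$__": 40 characters in total,
# standing in for the 40 hex characters (20 bytes) of a library address.
PLACEHOLDER = re.compile(r"__\$[0-9a-fA-F]{34}\$__")


def library_placeholder_offsets(bytecode_hex):
    """Return the byte offset of every library placeholder in unlinked bytecode."""
    s = bytecode_hex[2:] if bytecode_hex.startswith("0x") else bytecode_hex
    # Two hex characters per byte, so halve the character offset.
    return [m.start() // 2 for m in PLACEHOLDER.finditer(s)]


def immutable_ranges(immutable_references):
    """Flatten {ast_id: [{"start": s, "length": l}, ...]} into sorted (start, length) pairs."""
    ranges = []
    for refs in immutable_references.values():
        for ref in refs:
            ranges.append((ref["start"], ref["length"]))
    return sorted(ranges)
```

Usage with made-up values: a bytecode of "600073" followed by the placeholder from the list above yields a single offset of 3, and an immutableReferences object with three entries yields three sorted (start, length) pairs.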

However, the CBOR metadata part is not trivial to find:

  1. The bytecode can contain the bytecode of other contracts nested inside it (e.g. factories), so only the main contract's CBOR is at the end of the bytecode, and the positions of the other contracts' CBOR are unknown.
  2. Even with a single contract, the CBOR metadata is not necessarily at the end of the creation bytecode. It can be anywhere.

That's why verifiers typically use workarounds to find the exact positions, such as adding whitespace to all sources to change the metadata hashes. This requires recompiling the contract, which is an expensive operation.
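The comparison step of that workaround can be sketched roughly as follows. It assumes you already have two compilations of the same contract, one from the original sources and one from the whitespace-modified sources, and that both bytecodes have the same length. The function name is hypothetical, and this is not meant to describe any particular verifier's implementation.

```python
def differing_byte_ranges(hex_a, hex_b):
    """Return (start, length) byte ranges where two equal-length bytecodes differ."""
    a = bytes.fromhex(hex_a)
    b = bytes.fromhex(hex_b)
    if len(a) != len(b):
        raise ValueError("bytecodes must have the same length to be compared")
    ranges = []
    start = None  # start of the current run of differing bytes, if any
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            if start is None:
                start = i
        elif start is not None:
            ranges.append((start, i - start))
            start = None
    if start is not None:  # a differing run that reaches the end
        ranges.append((start, len(a) - start))
    return ranges
```

Note that this only reveals the bytes that actually changed, which is the hash inside each CBOR section and not the section's full extent. Two hashes can also share a byte by coincidence, which would split one region into two ranges, so a real implementation would still need to widen or merge the ranges.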

I suggest that the compiler output the bytecode positions of the CBOR-encoded metadata parts of the contracts, similar to how it's done in immutableReferences. This would make source code verification easier and would also make it easier to extract the CBOR parts for other purposes.
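To illustrate how such an output could be consumed, here is a sketch that zeroes out the reported regions in both bytecodes and compares the rest. The shape of the positions list (entries with start and length in bytes, mirroring immutableReferences) is purely hypothetical, as are the function names. Nothing here reflects an existing compiler output.

```python
def mask_ranges(bytecode_hex, positions):
    """Zero-fill each {"start", "length"} region (in bytes) and return the hex result."""
    raw = bytearray(bytes.fromhex(bytecode_hex))
    for pos in positions:
        start, length = pos["start"], pos["length"]
        if start < 0 or length < 0 or start + length > len(raw):
            raise ValueError("position outside of bytecode")
        raw[start:start + length] = bytes(length)  # same-length replacement
    return raw.hex()


def equal_outside(compiled_hex, onchain_hex, positions):
    """True if the two bytecodes are identical once the given regions are masked."""
    return mask_ranges(compiled_hex, positions) == mask_ranges(onchain_hex, positions)
```

With a hypothetical positions list of `[{"start": 2, "length": 3}]`, the toy bytecodes "6000aabbcc0003" and "6000ddeeff0003" compare as equal, with no recompilation needed.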

Edit: I wrote an article about this for some context: https://docs.sourcify.dev/blog/finding-auxdatas-in-bytecode/

github-actions[bot] commented 4 months ago

This issue has been marked as stale due to inactivity for the last 90 days. It will be automatically closed in 7 days.