ethereum / sourcify

Decentralized Solidity contract source code verification service
https://sourcify.dev
MIT License

List of metadata improvements #1523

Open marcocastignoli opened 1 month ago

marcocastignoli commented 1 month ago

This is the list of problems/improvements that we encountered while working on source code verification:

1. Auxdata positions are not a compilation artifact

Current status

The only way to extract the auxdata from the bytecode is to read the last two bytes, which encode its length, and split the bytecode there to get the CBOR-encoded auxdata. When multiple auxdatas are present in the bytecode, there is no straightforward way to list all of them. As a workaround, we add a space to every source file (which changes the metadata), recompile the contract, and locate the nested auxdatas by comparing the bytecode of the edited recompilation with that of the original recompilation. More info here: https://docs.sourcify.dev/blog/finding-auxdatas-in-bytecode/
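The trailing-length extraction described above can be sketched as follows. This is a minimal illustration, not Sourcify's actual implementation: the function name `split_auxdata` and the toy bytecode are made up for the example, and it only finds the auxdata at the very end of the bytecode, which is exactly why nested auxdatas need the workaround.

```python
def split_auxdata(bytecode_hex: str) -> tuple[bytes, bytes]:
    """Split bytecode into (code, cbor_auxdata) using the 2-byte length suffix."""
    raw = bytes.fromhex(bytecode_hex.removeprefix("0x"))
    if len(raw) < 2:
        raise ValueError("bytecode too short to contain a length suffix")
    # The last two bytes hold the CBOR payload length as a big-endian integer.
    cbor_len = int.from_bytes(raw[-2:], "big")
    if cbor_len + 2 > len(raw):
        raise ValueError("length suffix points outside the bytecode")
    cbor = raw[-(cbor_len + 2):-2]   # payload sits right before the suffix
    code = raw[:-(cbor_len + 2)]     # everything before the payload
    return code, cbor


# Toy input (not real contract bytecode): 3 bytes of "code",
# 4 bytes of placeholder CBOR, then the length 0x0004.
code, cbor = split_auxdata("0x600160" + "a1616101" + "0004")
print(code.hex(), cbor.hex())  # prints "600160 a1616101"
```

Auxdatas embedded in the middle of the bytecode (for example, from contracts created by the main contract) have no such suffix at a known offset, so this approach cannot locate them.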

Solution

Output auxdata positions in the compilation output artifacts.

2. Metadata IPFS hash is derived from metadata as a string

Current Status

The metadata's IPFS hash is calculated from the metadata as a JSON string, so formatting and key order both matter: the same metadata object serialized differently yields a different IPFS hash.
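To illustrate the sensitivity, here is a small sketch that hashes three serializations of the same object. Note the assumptions: `sha256` stands in for the real IPFS hash (which is computed differently), the `metadata` dict is a toy example, and `canonical` is a hypothetical normalization, not something solc or Sourcify is known to use.

```python
import hashlib
import json

# Toy metadata object; real metadata has many more fields.
metadata = {"language": "Solidity", "version": 1,
            "sources": {"contracts/1_Storage.sol": {}}}


def digest(text: str) -> str:
    # ASSUMPTION: sha256 is a stand-in for the IPFS hash. Only the property
    # "different bytes give a different hash" matters for this illustration.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


compact = json.dumps(metadata, separators=(",", ":"))
pretty = json.dumps(metadata, indent=2)
reordered = json.dumps(dict(reversed(list(metadata.items()))),
                       separators=(",", ":"))

# Same JSON object, three different strings, three different hashes.
print(digest(compact) == digest(pretty))      # False
print(digest(compact) == digest(reordered))   # False


def canonical(obj) -> str:
    # Hypothetical normalization: sorted keys, no insignificant whitespace.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


# After normalizing, the parsed objects hash identically.
print(digest(canonical(json.loads(pretty))) ==
      digest(canonical(json.loads(reordered))))  # True
```

Hashing a structured encoding of the object rather than its string form would remove this dependency on formatting.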

Solutions

3. File paths in sources easily cause partial matches

Current status

If you recompile a contract with different file paths, you get a different metadata hash, which leads to a partial match.

"sources": {
  "contracts/1_Storage.sol": {}
},

vs

"sources": {
  "diff/contracts/1_Storage.sol": {}
},

Solution

Using the keccak hash of the file's content as the key, instead of the file's path, could solve this.

"sources": {
  "KECCAK(file_content)": {}
},

The problem with this solution is that two files in different locations with the same content will be under the same key. But is this a problem?
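A quick sketch of the proposal and of the collision it raises. Everything here is illustrative: `content_key` and `sources_by_content` are hypothetical helpers, and Python's `hashlib.sha3_256` is used only as a stand-in, since it is NOT the keccak256 used in Ethereum (the padding differs, so the digests differ).

```python
import hashlib


def content_key(content: str) -> str:
    # ASSUMPTION: sha3_256 is a stand-in for keccak256. A real implementation
    # would need an actual keccak256 function, which is not in the stdlib.
    return "0x" + hashlib.sha3_256(content.encode("utf-8")).hexdigest()


def sources_by_content(files: dict[str, str]) -> dict[str, dict]:
    """Build a path-independent "sources" object keyed by content hash."""
    return {content_key(content): {} for content in files.values()}


storage = "contract Storage {}"
original = {"contracts/1_Storage.sol": storage}
moved = {"diff/contracts/1_Storage.sol": storage}
duplicated = {"a/Storage.sol": storage, "b/Storage.sol": storage}

# Moving the file no longer changes the "sources" object.
print(sources_by_content(original) == sources_by_content(moved))  # True

# Two files with identical content collapse into a single key.
print(len(sources_by_content(duplicated)))  # 1
```

Whether the collapsed key matters depends on whether anything downstream needs to tell the two paths apart, which is the open question above.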

4. Libraries in metadata's compiler settings are different than libraries in compiler settings #1370

Current status

Solution

In the metadata, use the same library format as in the solc settings.
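Until the formats match, a verifier has to convert between them. The sketch below assumes the metadata lists libraries as a flat map with keys like "path/File.sol:LibName", while the solc standard JSON input nests them as file path, then library name, then address. The flat key shape, the helper name `to_solc_libraries`, and the sample address are assumptions for illustration; check them against real metadata before relying on this.

```python
def to_solc_libraries(metadata_libraries: dict[str, str]) -> dict[str, dict[str, str]]:
    """Convert a flat {"File.sol:Lib": address} map into solc's nested form."""
    nested: dict[str, dict[str, str]] = {}
    for qualified_name, address in metadata_libraries.items():
        # Split on the LAST colon so that paths containing ":" stay intact.
        # A key without any colon ends up under the empty-string file key.
        file_path, _, library_name = qualified_name.rpartition(":")
        nested.setdefault(file_path, {})[library_name] = address
    return nested


# Hypothetical example values.
flat = {"contracts/Lib.sol:MyLib": "0x" + "12" * 20}
print(to_solc_libraries(flat))
```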

5. compilationTarget in metadata's compiler settings is not a valid field in compiler settings #1450

Current status

The metadata's compiler settings contain a compilationTarget field, which the Solidity compiler does not accept as an input setting.

Solution

Remove compilationTarget from the metadata's compiler settings and move the file path and contract name to another section.

kuzdogan commented 3 weeks ago

IPFS doesn't support json objects.

What does this mean? Didn't IPFS have an encoding scheme that takes JSON objects instead of strings, like this? https://ipld.io/docs/codecs/known/dag-json/

Also mentioning the related Solidity issue on this topic, https://github.com/ethereum/solidity/issues/14389, though I find it really difficult to grasp. Maybe we can ask some of our questions there.

Another source of sensitivity is the libraries field and the remappings fields. Can we maybe consider to somehow change or abstract away those fields? I'm not sure if that's entirely possible because at least the library field directly affects the output bytecode.

marcocastignoli commented 3 weeks ago

Didn't IPFS have an encoding scheme that takes JSON objects instead of strings, like this? https://ipld.io/docs/codecs/known/dag-json/

I did some googling about this. It seems that Helia (the new js-ipfs) actually supports DAG-JSON. I tried it and it works, but fetching a DAG-JSON file with the ipfs client doesn't:

marcocastignoli ~ % ipfs get baguqeera5owf3tfvl6gny6z6bcp5gt3izadzunlvbyuoxi3yykkvqp75l4da
Error: unknown node type

I'm actually confused about this point because I cannot find clear and up-to-date documentation. Maybe it's worth contacting the IPFS team before moving forward.

EDIT: OK, there is actually a dedicated command in the IPFS CLI, so it seems it's fully supported!

ipfs dag get baguqeera5owf3tfvl6gny6z6bcp5gt3izadzunlvbyuoxi3yykkvqp75l4da
{"link":{"/":"baguqeerasords4njcts6vs7qvdjfcvgnume4hqohf65zsfguprqphs3icwea"}}

marcocastignoli commented 3 weeks ago

Another source of sensitivity is the libraries field and the remappings fields. Can we maybe consider to somehow change or abstract away those fields? I'm not sure if that's entirely possible because at least the library field directly affects the output bytecode.

Can you elaborate on this?

kuzdogan commented 3 weeks ago

@marcocastignoli For example, whether people link libraries manually or through the compiler determines whether the linked libraries are included in the metadata. This changes the metadata. We can still verify these contracts, but only partially.

For instance, if someone deploys a contract after linking it through the compiler, the libraries field in the metadata will be non-empty. If they later compile without linking, the libraries field is empty, which results in different metadata.