gakonst / ethers-rs

Complete Ethereum & Celo library and wallet implementation in Rust. https://docs.rs/ethers
Apache License 2.0
2.47k stars 796 forks source link

Change artifact naming conventions #791

Open lattejed opened 2 years ago

lattejed commented 2 years ago

This was prompted by the forge issue here: https://github.com/gakonst/foundry/issues/392

Two overlapping issues:

  1. Some tests are being executed but dropped by forge due to naming conventions (that are decided upstream here). This can be fixed w/in forge, however:

  2. Contract names are under-specified here, where duplicate names across different files are presented ambiguously as filenames / paths are not retained.

Example, this is how dapptools presents test names:

<path>/<filename>:<name>

forge:

<filename>:<name>

Example test output:

dapptools:

Running 3 tests for src/test/Greeter.t.sol:Greet
[PASS] testCannotGm() (gas: 4540)
...

Running 3 tests for src/test/Greeter_2.t.sol:Greet
[PASS] testCannotGm() (gas: 4540)
...

forge:

Running 3 tests for Greet.json:Greet
[PASS] testCannotGm() (gas: 6819)

^ both Greeter and Greeter_2 are run by forge, but one is dropped from presentation due to how result are collected internally.

Using dapptools-style output is unambiguous. I'm proposing either changing naming conventions here to match dapptools. This would be a breaking change.

It may also make sense to keep things as-is but expose path, etc. so they can be consumed downstream.

--

As @mattsse has pointed out, currently artifacts from different compiler versions are not handled correctly. This is a caching, not a presentation issue, but can be addressed at the same time.

He also suggested having some / all of this configurable, tying in:

https://github.com/gakonst/foundry/blob/f9007fa8c238ffbb1365b26460a949ead303c046/config/README.md

@mattsse I'm not sure if you're thinking making output naming configurable or caching or both?

Any / all input welcome. I'll work on this myself.

lattejed commented 2 years ago

@gakonst @mattsse to what degree is the format of SolFilesCache sacred? I understand that it's an internal ethers format, but it's also currently (afaik) 1:1 with hardhat's format.

Dealing with artifact naming collisions wrt different version pragmas, we necessarily end up with artifacts arranged sth like:

out/
|-- 0.8.10xxx/src/
|---------------- A.sol/A.json
|---------------- C.sol/C.json
|-- 0.8.11xxx/src/
|---------------- B.sol/B.json
|---------------- C.sol/C.json

But with only one entry for C.sol in SolFilesCache since the entries are keyed to source paths -- none of the entries have a direct reference to the artifact(s).

Seems like the 'correct' way to change it would be to have the solcConfig key take a list of configs [] instead of a single one. Also, solcConfig will need an additional key to record the exact compiler version.

(Could also key against artifact path, but I think that would add more complexity in how the cache is handled internally.)

Do you guys see any issues deviating from format here?

Also, if this is ok, do you think this would warrant / require bumping the ETHERS_FORMAT_VERSION slug?

mattsse commented 2 years ago

thanks for the write-up.

+1 on modifying the cache format, I've thought about that already and we should break with hh format regardless just because there might be footguns when switching between both of them.

I think I'd rather flip it and do out/<filename>/Contract.version.json and perhaps add only .version. if there are duplicates?

rn the cache's file entries is a map with the source path of the solidity files as key like contracts/Greeter.sol and the greeter as a list of artifacts that solc generated for that file.

So your suggestion makes sense to make solcConfig and array, or at least the version field because the rest should be the same, like optimizer, outputSelection?

I also don't really like that artifacts are only stored by name, because this way we need to resolve the artifact manually, perhaps it would make more sense to store the relative path to all file artifacts jsons there.

definitely bumping the version slug

lattejed commented 2 years ago

If we're changing SolFilesCache format, yeah, it makes sense to store artifact paths there. The current format only stores source paths (full, plus a relative version) and artifact paths have to be reconstructed based on contract names only. That worked ok b/c artifacts were not being versioned, but actually keeping that metadata in the cache would simplify a lot of things wrt to the changes I'm making now.

Storing relative paths in the cache, keeping the artifact folder flat & only versioning conflicting items sounds good:

out/A.sol/A.json
|-- B.sol/B.json
|-- C.sol/C.0.8.10xxx.json
|-------- C.0.8.11xxx.json

That's a lot cleaner.

Afaik (I'll look at it more closely) the only part of solcConfig that's changing would be the version string, so just storing those would be fine. But cf the above, probably makes sense to have a map of versions + versioned artifact paths. Probably leave solcConfig as-is and add a new entry of versions + paths.

gakonst commented 2 years ago

I wonder if we should then add the solc version and optimizer runs in the filename? That would also solve for etherscan verification

lattejed commented 2 years ago

Do you mean caching version + optimizer info or specifically adding it to filenames? Optimizer info is currently stored in SolcConfig and exact version strings will be stored alongside it now.

gakonst commented 2 years ago

SolcConfig though is under cache/ should we be moving it to `out/, given that's where one would expect that extra metadata about an artifact should be stored?

lattejed commented 2 years ago

@gakonst I'd have to check if there are any scenarios where the contents of out get nuked but the cache doesn't -- pretty sure the entire artifacts folder gets deleted during cleanup.

Does anything in the cache need to be user accessible? I'm not sure if you're suggesting a specific use case here.

gakonst commented 2 years ago

the use case is etherscan verification which requires both solidity version and optimizer runs. Right now this Metadata is only stored at the cache file. My q is: is that the right place? Should that Metadata instead exist in the artifacts? Maybe it doesn't matter because it's likely that artifacts+cache always go together?

lattejed commented 2 years ago

Ah ok, I understand now, thanks @gakonst

I'd suggested keeping it in the cache since artifacts don't carry any metadata currently.

Writing artifacts and caching are separate flags internally, but afaik they're always used together apart from internal testing, so I don't think forge could ever see one written without the other.

I can add in some convenience methods so forge (or anything downstream) can query the cache easily. I don't think those currently exist anywhere. Would that be helpful?

lattejed commented 2 years ago

Re the cache file, looks like solcConfig is really what should be versioned, since it tracks the (normalized) version and is actually a separate compiler input for each artifact.

Also, I don't think it makes sense to change SolcConfig since it is directly serialized from compiler input.

Keeping a set of SolcConfig keyed by artifact paths seems like the way to go to me. I'll build it that way but am happy to walk it back if there are any objections.

gakonst commented 2 years ago

I'd try to group them if possible to avoid duplication, but that might be hard. Am OK with the suggested approach if @mattsse agrees

lattejed commented 2 years ago

Yeah, trying it out now -- I think storing a separate config per artifact is going to bloat the cache too much. I think I'll keep it as-is and store versions/artifact paths in a separate key.

I was thinking the preprocessor will have to be modified to not only check for config changes, but also for version changes -- but I guess they should always be caught by source hash changes. Afaik there's no other way to specify multiple compilers per run (other than in pragma directives).

mattsse commented 2 years ago

🙏 @lattejed I'm currently revamping the whole compile process and integrated the Version in the process so we can check if the solc version changed.

https://github.com/gakonst/ethers-rs/pull/802

So if you have a draft for the new CacheEntry format I'd be happy to review/integrate this

lattejed commented 2 years ago

@mattsse -- a lot of what I've done so far is version handling. I didn't realize this work was being done elsewhere. Feel free to cherry pick if yours isn't done yet.

Right now I'm working on a bug in preprocess_sources that doesn't pick up some cached artifacts -- must have always been there but wasn't noticed. (It requires a complete map of file hashes but only gets passed a partial from sources_by_version)

The only thing I've done to CacheEntry is add a map that contains <artifact paths>: <versions> but I'm not done consuming these changes in into_artifacts so I may change it again. I wouldn't commit to a format until it's actually integrated.

I'll push recent commits so you have them, though they're not really for public consumption (scattered todos, lack of docs)

lattejed commented 2 years ago

@rkrasiuk on second thought, I don't see why we can't finish part of this so you can ship your feature downstream. Matt could incorporate our changes into his rewrite if they overlap.

High level what needs to get done:

  1. Skip config integration for now
  2. The preprocessor reading cached artifacts needs to be fixed -- the current issue I've described above. That was an existing issue that was overlooked but there may be some regressions after that.
  3. into_artifacts path handling needs to be fixed (again). These paths should be read directly out of the cache now.
  4. Whatever you need from the cache will have to be made accessible.

If you want to help, you could start at #\4, I already know what has to get done for the other two.

Depending on what you need there, we can change what's stored in the cache.

lattejed commented 2 years ago

@rkrasiuk this is mostly done now save for updating tests, if you can explain what info you need downstream I can put together #\4 for you, assuming you're not already working on it.

rkrasiuk commented 2 years ago

@lattejed sure thing. i'd need 2 things to fix/improve etherscan verify: compiler version and number of optimizations runs. preliminary work on this was done in PR #755, but gakonst closed it as stale. the acceptance criteria would be:

  1. given the path to the source file inside of the project I can retrieve an artifact for that file, if it exists
  2. the artifact contains the compiler version and number of optimizations runs
lattejed commented 2 years ago

@rkrasiuk so you need both artifacts (abi / bytecode) and metadata (compiler input / settings)?

Do you know what the requirements are for compiler version? There are three formats used internally:

london 0.8.11 0.8.11+commit.d7f03943.Darwin.appleclang

rkrasiuk commented 2 years ago

@lattejed etherscan accepts the expanded version format like 0.8.11+commit.d7f03943. apparently the list of accepted versions is also limited. i wouldn't really need the ABI, since it's derivable from the contract source code and, I believe, that's what etherscan does itself. i just need the cached metadata basically

lattejed commented 2 years ago

@rkrasiuk I've added a couple things:

SolFilesCache::cache_populated_from_defaults SolFilesCache::metadata_for_source

I'll assume wherever you're consuming this you won't have access to Project which manages the compilation process and keeps paths, so cache_populated_from_defaults uses ProjectPathsConfig to get the default cache file location and parses it. This will need to be updated to handle different cache locations in the future.

metadata_for_source takes a source path and returns Option<(SolcConfig, BTreeMap<PathBuf, String>)>. The config has your optimizer info and the map has artifact paths mapped to expanded compiler versions.

If you want to change any of this let me know.

rkrasiuk commented 2 years ago

@lattejed thanks for this! actually, i am constructing the Project instance currently to call flatten on It (see draft PR #828 ). i can strip it down to ProjectPathsConfig, but need to review the implications of doing that

lattejed commented 2 years ago

@rkrasiuk ah, ok. If you have a Project you can use SolFilesCache::read(&project.paths.cache) which will do the same thing as SolFilesCache::cache_populated_from_defaults -- though the results will be the same unless you're explicitly setting the cache path.

rkrasiuk commented 2 years ago

@lattejed great work, thanks! looking through the PR rn