feat: optimize compilation through preprocessing and smarter caching

klkvr commented 2 months ago

A lot of the compilation overhead is coming from the need to compile all project's tests on every change to one of the src/ contracts. Ideally, we should only compile test contracts when interface of contracts they depend on changes, which doesn't happen often.

This can be made possible by avoiding solc mechanism of bytecode linking. Instead of linking bytecode at compile-time, we can link it at runtime, and we already have cheatcodes for that (deployCode, getCode).

Basic idea is to add a preprocessor translating all type(Contract).creationCode into vm.getCode invocations, and all new Contract(...) into vm.deployCode.

However, preprocessor on its own wouldn't let us skip compilation due to how caching works. We'll mark test file as dirty even if non-interface change is made to a contract. To address this, we would need to customize our caching logic by only marking test contracts as dirty if any of their dependencies had a non-interface change.

Implementation

Proposing an initial simple implementation of this preprocessing. It can be enabled as an opt-in, and improved over time.

Updated compilation pipeline

Before each compiler invocation, if we're compiling any test files, we'll do additional solc invocation to obtain AST for them and replace all creationCode/new keywords with getCode and deployCode invocations. Compiler will only see this preprocessed source code

Updated caching

For each non-test file, we'll store hash of additional "interface representation" which would be the same source file with all function bodies removed. We already have somewhat similar preprocessing step in bind-json: https://github.com/foundry-rs/foundry/blob/96105b4d240681c336e063eac0e250cc51a84414/crates/forge/bin/cmd/bind_json.rs#L63. "interface representation" is easy to construct, and is independent of non-interface changes.
When resolving dirty source files, non-test files are handled as usual, and test files are marked as dirty only if one of the following is true:
- its source hash or compilation settings have changed
- it depends on any test files which are dirty
- it depends on any non-test files which "interface representation" have changed

Notes

All of this could be done better if obtaining solc AST was cheaper. AST we receive from solc is very useful because it provides referencedDeclaration keys which allow us to detect names of items a specific file/contract depends on which could be useful to do caching smarter by filtering out unused imports. solang_parser is much cheaper to invoke, but does not provide this.
This introduces fundamental change to how foundry treats test/script files. Right now we treat all sources equally during compilation, and anything that user can do in src/, can be also done in test/. IMO this separation makes sense, as it makes it explicit which contracts are compiled for deployment vs for foundry EVM

mattsse commented 2 months ago

This can be made possible by avoiding solc mechanism of bytecode linking. Instead of linking bytecode at compile-time, we can link it at runtime, and we already have cheatcodes for that (deployCode, getCode).

Basic idea is to add a preprocessor translating all type(Contract).creationCode into vm.getCode invocations, and all new Contract(...) into vm.deployCode.

is this a strict requirement for this feature or should we recommend using vm.getCode? would this also preprocess non-test code?

klkvr commented 2 months ago

is this a strict requirement for this feature or should we recommend using vm.getCode?

using getCode is not type-safe and won't work like this if project does not divide each contract into interface and implementation. the idea here is to provide a way to keep type-safety and avoid refactoring whille still linking bytecode dynamically

would this also preprocess non-test code?

it should not. I don't think we should mess with metadata, so src/ contracts are not touched at all

gakonst commented 2 months ago

All of this could be done better if obtaining solc AST was cheaper. AST we receive from solc is very useful because it provides referencedDeclaration keys which allow us to detect names of items a specific file/contract depends on which could be useful to do caching smarter by filtering out unused imports. solang_parser is much cheaper to invoke, but does not provide this.

Should this be the first use case of sulk @DaniPopes / something to prio?

foundry-rs / compilers