brockelmore opened 2 years ago
I feel like the overhead will not be worth it in most cases. Individual contracts are usually so small that the cost of invoking solc would exceed the time solc spends compiling them.
This feature would make more sense in solc itself but I won't put it on the priority list even there.
I tend to agree that it would be cool but could be more trouble than it is worth, especially if we have compilation caching etc.
this is actually already in use for workspaces that require different solc versions https://github.com/gakonst/ethers-rs/pull/652
@mattsse bumping this, seems like next up in our priority list along with #769?
Marking this and #769 as high prio if I understand your comment correctly
Close-able? @mds1
@mattsse wdyt here? Slow compilation times are a big pain point these days so this could be valuable, and per https://github.com/foundry-rs/foundry/issues/166#issuecomment-1001475406 it sounds like it might not be too hard to implement
I believe this is not tractable: https://github.com/gakonst/ethers-rs/pull/943#issuecomment-1253022311. What makes compilation slow these days? Repos got bigger? Via-ir? Forge-std?
(Closing)
Fwiw, I do think for some repos it may be worthwhile, depending on the number of roots in the project, where a root is a contract that no other contract inherits from or imports (the diagram below shows how we could split a project into two roots based on the dependency graph).

I generally believe this is tractable, but the number of projects that need this may be few and far between? i.e. if I import my complex contract A and use it as an interface elsewhere, I still have to compile the contract in its entirety (but if the project uses interface contracts, it's probably worth splitting up). This is a different version of parallelization than trying to do any crazy linking, and it is much more straightforward. I think it would only have a benefit where via-ir is on (also, all of this should in theory be done at the compiler level, not in Foundry...)
```mermaid
graph TD;
    subgraph root_2
        A-->B;
        A-->C;
        C-->G;
        B-->H;
    end
    subgraph root_1
        D-->E;
        D-->F;
    end
```
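To make the idea concrete, here is a minimal sketch (in Python, with a hypothetical `find_groups` helper and a toy import map, not Foundry code) that partitions sources into independently compilable groups by taking connected components of the import graph:

```python
from collections import defaultdict

def find_groups(imports: dict[str, list[str]]) -> list[set[str]]:
    """Partition sources into connected components of the (undirected)
    import graph. Each component can be compiled independently."""
    # Undirected adjacency list: an edge between a file and each import.
    adj = defaultdict(set)
    for src, deps in imports.items():
        adj[src]  # make sure files with no imports still appear
        for dep in deps:
            adj[src].add(dep)
            adj[dep].add(src)

    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        # Depth-first walk collects one component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        groups.append(component)
    return groups

# The example from the diagram above: two independent roots.
imports = {"A": ["B", "C"], "C": ["G"], "B": ["H"], "D": ["E", "F"]}
print(find_groups(imports))  # [{'A','B','C','G','H'}, {'D','E','F'}]
```

Each resulting component would correspond to one compiler invocation that can run on its own thread or process.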
Hi all,
I'm revisiting the idea of parallel compilation in Foundry. After a discussion in one of the Solidity team's weekly meetings, it was indicated that they don't currently plan to implement parallel compilation in `solc`. However, they suggested an alternative approach of running `solc` for different files in parallel, rather than for the whole repo. With the shift towards `via-ir` potentially becoming the default mode for Solidity compilation, I believe exploring parallel compilation approaches could become even more crucial for optimizing build times, especially for larger projects.
Since this issue was previously closed, have there been any new developments or considerations about integrating parallel compilation in Foundry? This could involve strategies like the one suggested by the Solidity team, or other methods to effectively parallelize the build process. I recognize the complexity of this feature but am interested in whether there's room for further discussion or contributions.
Thanks for your work on Foundry!
No plans atm because the complexity overhead is quite huge, but I can see that this would still be a great feature to have, so reopening to not lose it.
@gakonst an option to compile files in parallel could be a great and relatively easy solution.
Any marginal improvement (even an experimental opt-in) would be very appreciated, as my team and I are working on a major project with around 250 Solidity files and it takes almost an hour to compile with `via_ir` enabled…
Hey everyone! I've been tasked with exploring this topic in the context of improving the speed of compilation via IR. We (Solidity) wanted to know what the obstacles for this kind of parallelized compilation are and if we can do anything to make it easier for frameworks.
Theoretically, the parallelization is very simple already: take the Standard JSON input containing all the sources and split it into a series of inputs where each one uses `settings.outputSelection` to request output only for a single contract. The compiler will perform compilation and optimization only for the one you selected. It will still analyze all the sources, but the later stages of the pipeline are orders of magnitude slower than analysis, so that should not matter much.
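For concreteness, a minimal sketch of that splitting step in Python, assuming (for simplicity only, it is not a rule) one contract per file, named after the file:

```python
import copy

def split_input(standard_json: dict) -> list[dict]:
    """Split one Standard JSON input into one input per contract.

    Every split input still carries ALL sources (solc needs them for
    analysis); only settings.outputSelection is narrowed so that the
    expensive codegen/optimization runs for a single contract.
    """
    inputs = []
    for path in standard_json["sources"]:
        # Assumption for this sketch: one contract per file, named after it.
        contract = path.rsplit("/", 1)[-1].removesuffix(".sol")
        single = copy.deepcopy(standard_json)
        single.setdefault("settings", {})["outputSelection"] = {
            path: {contract: ["evm.bytecode", "evm.deployedBytecode"]}
        }
        inputs.append(single)
    return inputs
```

Each split input can then be fed to its own `solc --standard-json` process. Note that all sources remain in every input, which is exactly where the repeated analysis work described below comes from.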
To benchmark it, I created a proof-of-concept script (parasolc) that can be passed in place of a `solc` binary to `forge --use`. Here's also the full report with my findings: The parasolc experiment.
Now, as some here have already suspected, the overhead of doing it this way is just staggering. The projects I benchmarked require 3-4 times as much work compared to sequential compilation. The report attempts to explain where all that time is going, but the short of it is that this kind of parallelization makes the compiler repeat the same work multiple times in several ways (which may differ from project to project): bytecode dependencies (contracts deployed with `new`) can no longer be reused by the contracts that depend on them, and the same sources get analyzed multiple times.
Still, while expensive, this method does provide an actual improvement in terms of wall-clock time spent on compilation. Here are the numbers I got on an 8-core machine:

| Benchmark | Real time | CPU time |
|---|---|---|
| OpenZeppelin (sequential) | 39 s | 39 s |
| OpenZeppelin (parallelized) | 27 s | 144 s |
| Uniswap v4 (sequential) | 166 s | 165 s |
| Uniswap v4 (parallelized) | 76 s | 499 s |
It appears that with enough cores you can still come out ahead despite the overhead. While this is far from what I was hoping to present here, and does not seem like a good choice for the default compilation mode, it's still a trade-off that may make sense in some situations. It may work better for some projects than for others, depending on how they are structured and how interdependent their contracts are. The method is simple enough that it might make sense as an optional feature.
I explored workarounds, like grouping bytecode dependencies together to improve reuse or culling of sources irrelevant to the contract being compiled. That's also in the report. Both unfortunately come with significant downsides and/or don't improve the situation as much as one would hope.
There's the idea described by @brockelmore above (and already used by e.g. Hardhat) of identifying independent clusters of contracts and compiling each cluster separately. It's largely orthogonal to what I explored here - it just shifts the problem inside the groups - and it's not effective against projects with tightly interconnected sources. Still, I think it's worthwhile when applicable, and we'd like to see more tools using it. It is not technically complicated and all the information is already out there, but it does require parsing the AST to identify the imports. If that's an obstacle, the one thing we could do to make it more straightforward would be to make the dependency graph between sources available as a separate output. Or even just outright assign each contract to a group based on it. Would Foundry make use of such an output?
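For reference, a minimal sketch of that AST-based approach, assuming a recent solc on PATH and Standard JSON as documented (the empty contract-name key `""` selects file-level outputs like the AST); `import_graph` is a hypothetical helper, not an existing API:

```python
import json
import subprocess
from pathlib import Path

def import_graph(paths: list[str]) -> dict[str, list[str]]:
    """Map each source file to the files it imports, using solc's AST output."""
    request = {
        "language": "Solidity",
        "sources": {p: {"content": Path(p).read_text()} for p in paths},
        # The empty contract-name key "" selects file-level outputs.
        "settings": {"outputSelection": {"*": {"": ["ast"]}}},
    }
    proc = subprocess.run(
        ["solc", "--standard-json"],
        input=json.dumps(request),
        capture_output=True, text=True, check=True,
    )
    output = json.loads(proc.stdout)
    graph = {}
    for path, entry in output.get("sources", {}).items():
        # Top-level ImportDirective nodes carry the resolved import paths.
        graph[path] = [
            node["absolutePath"]
            for node in entry["ast"].get("nodes", [])
            if node.get("nodeType") == "ImportDirective"
        ]
    return graph
```

If the compiler exposed this graph directly as an output, tools could skip the AST parsing step entirely.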
Another idea, though it would only work for IR, would be to do unoptimized IR generation sequentially (i.e. request the `ir` output for all contracts) and then do only the optimization in parallel by compiling the produced Yul. Code generation is slower than analysis, but it's the optimization and the Yul->EVM transform that are the real bottlenecks, so it should provide some savings. The downside here is that it would be a little more complex than the naive method and only viable after we fix https://github.com/ethereum/solidity/issues/15062. It also only addresses the analysis overhead and would do nothing for bytecode dependencies.
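A minimal sketch of what that two-phase pipeline could look like, assuming the Yul emitted under the `ir` output can be recompiled standalone (which, per the issue linked above, is not always the case yet); `parallel_via_ir` and `compile_yul` are hypothetical names:

```python
import json
import subprocess
from concurrent.futures import ProcessPoolExecutor

def compile_yul(job):
    """Phase 2: optimize and assemble one contract's Yul in its own process."""
    name, ir = job
    request = {
        "language": "Yul",
        "sources": {f"{name}.yul": {"content": ir}},
        "settings": {
            "optimizer": {"enabled": True},
            "outputSelection": {"*": {"*": ["evm.bytecode"]}},
        },
    }
    proc = subprocess.run(
        ["solc", "--standard-json"],
        input=json.dumps(request), capture_output=True, text=True, check=True,
    )
    return name, json.loads(proc.stdout)

def parallel_via_ir(solidity_input: dict) -> dict:
    # Phase 1 (sequential): one solc run emits unoptimized IR for all contracts.
    solidity_input["settings"]["outputSelection"] = {"*": {"*": ["ir"]}}
    proc = subprocess.run(
        ["solc", "--standard-json"],
        input=json.dumps(solidity_input), capture_output=True, text=True, check=True,
    )
    contracts = json.loads(proc.stdout)["contracts"]
    jobs = [
        (name, data["ir"])
        for path in contracts
        for name, data in contracts[path].items()
        if data.get("ir")  # skip interfaces/abstract contracts with no code
    ]
    # Phase 2 (parallel): optimize each contract's Yul independently.
    with ProcessPoolExecutor() as pool:
        return dict(pool.map(compile_yul, jobs))
```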
In the longer term, there are things we could improve in the compiler, but I can't really tell if/when they will happen:
@cameel Given the overhead and complexity involved, it doesn't seem like full parallel processing is something that will be completed in the near future. As Solidity is a very simple and repeatable (I believe) language, is there a way to flag at compile time "chunks" of code blocks that behave identically, so that the compilation process can be done in parallel with pointers? Or is there a way to early-compile common contract operations in parallel? I don't know.
Requested Feature
For sufficiently large repos, solc compilation can be the slowest part of testing. In those cases we should multithread compilation as much as possible. Here is a rough sketch of a potential way to achieve this.
Suggested solution
There could be issues here that I'm unaware of, but I'm documenting the thoughts I had anyway.