Splitting the project into multiple crates is a common practice to reduce compile times: it allows Cargo to compile packages in parallel when they don't depend on each other, and, as far as I could tell from experimenting, less needs to be recompiled after small, targeted changes to individual crates.
In theory, incremental compilation (which AFAIK only applies within a single crate) should already take care of the latter, but in practice rustc's algorithm doesn't seem to be able to handle it that well (yet).
Judging from the report generated by `cargo build --timings`, Cargo isn't able to (or chooses not to) parallelize more `rustc` invocations, but that's fine – the improvements to “incremental” compile times far outweigh this! I minimized the dependencies of individual crates / workspace members as far as possible and reasonable (e.g. removing the `session` dependency from the lexer and the parser) and tried to balance DRY and code sharing against the overhead each crate brings (e.g. inlining `utilities::Lexer`, which would otherwise require a separate crate to avoid dependency cycles). Of course, that overhead will matter less the larger each crate grows over time.
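As a sketch of the resulting layout (member paths here are illustrative placeholders, not the actual directory structure of this repository), the workspace root ties the crates together so Cargo can schedule independent members in parallel:

```toml
# Cargo.toml at the workspace root (illustrative sketch; member paths
# are hypothetical and merely named after the crates discussed below).
[workspace]
members = [
    "compiler/span",
    "compiler/diagnostics",
    "compiler/lexer",
    "compiler/parser",
    "compiler/resolver",
    "compiler/typer",
    "compiler/driver",
]
```

With a layout like this, a change to e.g. the typer only forces recompilation of the typer and the crates downstream of it, instead of one monolithic crate.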
Mid-development, I experimented extensively by `touch`ing individual files in pre-polycrate and post-polycrate lushui and comparing build times. Of course, that's far from a thorough or proper benchmark.
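The experiment boils down to something along these lines (a rough sketch; the touched path is a placeholder, pick whichever crate you want to measure):

```shell
# Warm the incremental cache first, then measure a rebuild after
# simulating a small change to a single crate.
cargo build --all-features            # warm build
touch compiler/span/src/lib.rs        # placeholder path; pretend the file changed
time cargo build --all-features       # measure the incremental rebuild
```
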
However, the polycrate approach leads to significantly faster compile times for “end-user” crates (crates closer to the driver crate in the dependency tree), which is what matters most. There are compile-time regressions for “supplier” crates (like `span` and `diagnostics`), but by their very nature as suppliers they aren't changed as often. I call this a great success!
Additionally, we now use the mold linker if it is available. I did some testing and it really is a lot faster!
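For reference, opting into mold through clang can be done with a Cargo config along these lines (a sketch assuming clang and mold are installed; the target triple is just the common Linux one):

```toml
# .cargo/config.toml (illustrative; assumes clang and mold are installed)
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```

Gating this on mold actually being present (rather than committing it unconditionally) keeps the build working on machines without it.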
For posterity, here are some performance figures I recorded while working on this PR (before I had finished untangling the cyclic dependencies, which took several days):
Some performance figures
t. = touching
## Master
* all feat
  * full: 1m 36s
  * incr, t. diagnostics: 13.61s
  * incr, t. lib: 13.64s
  * incr, t. main: 13.17s
* all feat, clang/mold
  * full: 1m 25s
  * incr, t. cranelift: 3.53s
  * incr, t. diagnostics: 4.33s
  * incr, t. documenter: 3.99s
  * incr, t. lib: 3.95s
  * incr, t. main: 3.96s
  * incr, t. llvm: 3.36s
  * incr, t. resolver: 4.11s
  * incr, t. server: 3.50s
  * incr, t. span: 4.11s
  * incr, t. typer: 3.28s
* no feat, clang/mold
  * full: 26.55s
  * incr, t. cranelift: n/a
  * incr, t. diagnostics: 2.06s
  * incr, t. documenter: 2.42s
  * incr, t. lib: 2.18s
  * incr, t. main: 1.85s
  * incr, t. llvm: n/a
  * incr, t. resolver: 2.26s
  * incr, t. server: n/a
  * incr, t. span: 1.85s
  * incr, t. typer: 2.34s
## Polypackage
* all feat
  * full: 1m 34s
  * incr, t. diagnostics: 13.89s
  * incr, t. driver: 9.86s ~ 10.93s
* all feat, clang/mold
  * full: 1m 23s
  * incr, t. cranelift: 2.33s
  * incr, t. diagnostics: 4.89s ~ 6.37s
  * incr, t. documenter: 2.51s
  * incr, t. driver: 2.17s ~ 2.25s
  * incr, t. llvm: 2.44s ~ 2.56s
  * incr, t. resolver: 2.94s
  * incr, t. server: 2.43s
  * incr, t. span: 5.47s ~ 5.84s
  * incr, t. typer: 3.35s
* no feat, clang/mold
  * full: 26.16s
  * incr, t. cranelift: n/a
  * incr, t. diagnostics: 3.98s
  * incr, t. documenter: 1.01s
  * incr, t. driver: 0.80s
  * incr, t. llvm: n/a
  * incr, t. resolver: 1.34s
  * incr, t. server: n/a
  * incr, t. span: 3.46s
  * incr, t. typer: 0.94s