This change implements the multi-threading scheme described in #2293.
We partition the input into `8` modules, regardless of the `--parallelism` setting, which achieves the desired determinism characteristics. These modules are compiled on multiple threads both when generating LLVM and when generating machine code (i.e. invoking `clang`).
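As a rough illustration of why a fixed module count gives deterministic output, here is a minimal Python sketch. It is not the ILC implementation: the hash-based assignment policy, the `partition`, `compile_module` and `build` helpers, and the method names are all hypothetical. It only shows that when the partitioning never looks at the thread count, the result is the same for any level of parallelism.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

MODULE_COUNT = 8  # fixed; deliberately independent of the thread count


def partition(method_names, module_count=MODULE_COUNT):
    """Assign each name to a module via a stable hash (hypothetical policy)."""
    modules = [[] for _ in range(module_count)]
    for name in sorted(method_names):
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        index = zlib.crc32(name.encode("utf-8")) % module_count
        modules[index].append(name)
    return modules


def compile_module(module):
    """Stand-in for real code generation: returns a deterministic string."""
    return ";".join(module)


def build(method_names, parallelism):
    """Compile all modules on `parallelism` threads and return the outputs."""
    modules = partition(method_names)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # map() yields results in input order, whichever thread finishes first.
        return list(pool.map(compile_module, modules))


methods = [f"Method{i}" for i in range(100)]  # hypothetical input
same = build(methods, parallelism=1) == build(methods, parallelism=8)
print(same)
```

The thread count only affects how quickly the eight modules are processed, not which methods end up in which module.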
The performance results of this work can be summarized as "modest", though some gains, especially in Release builds, are quite visible (up to 2x speedups on my machine). For reasons that are not yet clear, parallelism inside ILC itself is not fully utilized. However, that is also not the bottleneck of the build: especially in Debug, a lot of time is spent in `wasm-emscripten-finalize.exe`. Time is also spent on disk interactions. Ideally, the next step would be to move the Clang part of the build in-process (or do something like ThinLTO).
I have put together a small sheet of the results as they apply to `HelloWasm` on my machine:
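| Module count | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | NAOT ILC / 8 |
|---|---|---|---|---|---|---|---|---|---|
| Total compilation time (Release) | 01:03.3 | 42.0 | 37.8 | 33.7 | 33.1 | 32.8 | 30.9 | 30.3 | 26.6 |
| | 01:04.7 | 41.9 | 39.2 | 33.1 | 32.5 | 32.9 | 31.6 | 30.1 | 27.6 |
| Bitcode compilation time (Release) | 6.66 | 5.36 | 5.04 | 4.75 | 4.58 | 4.57 | 4.45 | 4.55 | 3.84 |
| | 7.72 | 5.34 | 4.84 | 4.78 | 5 | 4.6 | 4.46 | 4.53 | 3.71 |
| WASM file size | 4.672 | 4.537 | 4.5 | 4.483 | 4.467 | 4.451 | 4.457 | 4.455 | |

The Debug rows cover fewer configurations; their values are listed in sheet order:

- Total compilation time (Debug): 1:05.9, 53.4, 51.8, 49.6, 49.3, 46.6
- Total compilation time (Debug), second row: 1:04.9, 53.1, 52.9, 44.7
- Bitcode compilation time (Debug): 13.65, 11.05, 10.72, 10.14, 9.92, 5.62
- Bitcode compilation time (Debug), second row: 13.82, 11.12, 10.65, 5.31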
Note that the measurements were done using a CoreCLR-based ILC (except for the `NAOT ILC / 8` column).
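To check the "up to 2x" figure against the sheet, the Release total compilation times for 1 and 8 modules can be compared directly. The snippet below is a small sketch that assumes the times are in seconds, with `mm:ss.s` used for values over a minute; the `to_seconds` helper and the row labels are mine, not part of the sheet.

```python
def to_seconds(text):
    """Parse 'mm:ss.s' or plain 'ss.s' into seconds (assumed format)."""
    minutes, _, seconds = text.rpartition(":")
    return (int(minutes) if minutes else 0) * 60 + float(seconds)


# Total compilation time (Release) for 1..8 modules, copied from the sheet.
release_total = {
    "first row": ["01:03.3", "42.0", "37.8", "33.7", "33.1", "32.8", "30.9", "30.3"],
    "second row": ["01:04.7", "41.9", "39.2", "33.1", "32.5", "32.9", "31.6", "30.1"],
}

speedups = {}
for label, times in release_total.items():
    seconds = [to_seconds(t) for t in times]
    # Speedup of the 8-module build over the single-module build.
    speedups[label] = seconds[0] / seconds[-1]

for label, value in speedups.items():
    print(f"{label}: {value:.2f}x")
```

Both rows come out slightly above 2x, which matches the summary above.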