Open akien-mga opened 2 months ago
Toolchains:
Apple clang version 15.0.0 (clang-1500.3.9.4)
Target: arm64-apple-darwin23.6.0
scons target=template_release arch=arm64 platform=macos production=yes lto=*
LTO | Build Time | Peak memory usage | Executable size |
---|---|---|---|
none | 7:10 | sub 1G | 68.082.840 |
thin | 9:45 | ~ 2.5G | 74.674.240 |
full | 19:26 | ~ 12G [^1] | 66.936.680 |
[^1]: Mostly around 6G with a spike at the end of linking.
scons target=template_debug arch=arm64 platform=macos production=yes lto=*
LTO | Build Time | Peak memory usage | Executable size |
---|---|---|---|
none | 9:51 | sub 1G | 71.861.672 |
thin | 13:28 | ~ 2.5G | 84.408.856 |
full | 42:52 [^2] | ~ 18G | 74.752.334 |
[^2]: A lot of swap usage, so time is not directly comparable.
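To make the size columns easier to compare, here is a minimal sketch that computes the size change of each LTO mode relative to `lto=none`. The byte counts are copied from the two macOS tables above; the function and variable names are only illustrative.

```python
# Executable sizes in bytes, copied from the macOS arm64 tables above.
SIZES = {
    "template_release": {"none": 68_082_840, "thin": 74_674_240, "full": 66_936_680},
    "template_debug": {"none": 71_861_672, "thin": 84_408_856, "full": 74_752_334},
}


def size_change_percent(baseline: int, size: int) -> float:
    """Return the size change relative to the baseline, in percent."""
    return (size - baseline) / baseline * 100.0


def summarize(sizes: dict) -> dict:
    """Map each target and LTO mode to its size change versus lto=none."""
    return {
        target: {
            mode: round(size_change_percent(by_mode["none"], size), 1)
            for mode, size in by_mode.items()
            if mode != "none"
        }
        for target, by_mode in sizes.items()
    }


if __name__ == "__main__":
    for target, changes in summarize(SIZES).items():
        for mode, pct in changes.items():
            print(f"{target} lto={mode}: {pct:+.1f}%")
```

With these numbers, ThinLTO comes out roughly 10% larger for `template_release` and roughly 17% larger for `template_debug`, while full LTO is slightly smaller for release and slightly larger for debug.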
Toolchains:
scons target=template_release production=yes use_llvm=yes lto=*
LTO | Build Time | Executable size |
---|---|---|
none | 04:09.81 | 58,270,720 |
thin | 04:46.66 | 68,056,064 |
full | N/A[^1] | N/A |
[^1]: Attempted to build for ~20 minutes before erroring out.
scons target=template_debug production=yes use_llvm=yes lto=*
LTO | Build Time | Executable size |
---|---|---|
none | 04:12.88 | 73,480,704 |
thin | 05:03.34 | 86,334,976 |
full | N/A | N/A |
scons target=template_release production=yes use_llvm=yes use_mingw=yes lto=*
LTO | Build Time | Executable size |
---|---|---|
none | 04:36.49 | 63,627,776 |
thin | 04:55.96 | 73,898,496 |
full | 14:38.60 | 70,736,896 |
scons target=template_debug production=yes use_llvm=yes use_mingw=yes lto=*
LTO | Build Time | Executable size |
---|---|---|
none | 04:37.85 | 68,219,392 |
thin | 05:14.51 | 79,650,304 |
full | 15:46.82 | 76,373,504 |
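For the build-time columns, a small sketch like the following converts the `MM:SS.ss` values into seconds and expresses each LTO mode as a slowdown factor over `lto=none`. The times are copied from the two `use_mingw=yes` tables above; the helper names are made up for this example.

```python
# Build times copied from the use_llvm=yes use_mingw=yes tables above.
BUILD_TIMES = {
    "template_release": {"none": "04:36.49", "thin": "04:55.96", "full": "14:38.60"},
    "template_debug": {"none": "04:37.85", "thin": "05:14.51", "full": "15:46.82"},
}


def parse_duration(text: str) -> float:
    """Convert an 'MM:SS' or 'MM:SS.ss' string into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


def slowdown(times: dict) -> dict:
    """Return build time of each LTO mode divided by the lto=none time."""
    baseline = parse_duration(times["none"])
    return {
        mode: parse_duration(value) / baseline
        for mode, value in times.items()
        if mode != "none"
    }


if __name__ == "__main__":
    for target, times in BUILD_TIMES.items():
        for mode, factor in slowdown(times).items():
            print(f"{target} lto={mode}: {factor:.2f}x")
```

On these measurements ThinLTO adds well under a quarter to the build time, while full LTO takes a bit more than three times as long.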
Something to bear in mind with LTO:
SCU builds will likely get the lion's share of the benefit without needing LTO. This is because they push a bunch of files into the same translation unit, which means that the compiler can optimize across .cpp files (which afaik is what LTO offers, in a more convoluted way).
We haven't used them in production so far, but it's worth mentioning as an alternative (no idea how their size or performance compares in release).

> We haven't used them in production so far, but it's worth mentioning as an alternative (no idea how their size or performance compares in release).

Using SCU builds for fully optimized release builds can need a lot of RAM (I've measured 22 GB for the build process alone on Linux x86_64), so this is something to keep in mind. That said, the release build server has plenty of RAM to spare.

> SCU builds will likely get the lion's share of the benefit without needing LTO. This is because they push a bunch of files into the same translation unit, which means that the compiler can optimize across .cpp files (which afaik is what LTO offers, in a more convoluted way).

scons target="editor" use_llvm="yes" lto="none"
I've just run two builds, one with SCU and one without, both with LLVM and without LTO.
Performance impact: ??? (Didn't test) Size difference: ~6KB
Godot's SCU doesn't create one big file from all the source files; it just glues files together into bigger files, and still produces many files rather than one. Not to mention that lots of files are built as usual even in an SCU build.
At the same time, LTO is performed on the final executable (on "all" the files at once), so in general SCU can't compete with LTO. While SCU possibly has some performance impact, I think it is negligible, though I didn't test performance.
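The "gluing files into bigger files" idea can be sketched as follows. This is not Godot's actual SCU generator: the per-folder grouping, the chunk size parameter, and the function names are all assumptions made for illustration. Each chunk becomes one translation unit that simply `#include`s several .cpp files, so the compiler can optimize across the files in a chunk but not across chunks.

```python
from pathlib import PurePosixPath


def make_scu_chunks(sources, max_includes_per_file):
    """Group sources per folder, then split each group into fixed-size chunks.

    Hypothetical grouping rule: files are only glued together with other
    files from the same folder, so the result is still many chunk files.
    """
    by_folder = {}
    for src in sorted(sources):
        folder = str(PurePosixPath(src).parent)
        by_folder.setdefault(folder, []).append(src)

    chunks = []
    for folder in sorted(by_folder):
        files = by_folder[folder]
        for start in range(0, len(files), max_includes_per_file):
            chunks.append(files[start:start + max_includes_per_file])
    return chunks


def render_chunk(chunk):
    """Render one chunk as the text of a generated translation unit."""
    return "".join(f'#include "{path}"\n' for path in chunk)


if __name__ == "__main__":
    # Made-up file names, only to show the shape of the output.
    example = ["core/a.cpp", "core/b.cpp", "core/c.cpp", "scene/x.cpp"]
    for index, chunk in enumerate(make_scu_chunks(example, 2)):
        print(f"--- chunk {index} ---")
        print(render_chunk(chunk), end="")
```

LTO, by contrast, sees the whole program at link time, which is the asymmetry described above.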

> We haven't used them in production so far, but it's worth mentioning as an alternative (no idea how their size or performance compares in release).
>
> Using SCU builds for fully optimized release builds can need a lot of RAM (I've measured 22 GB for the build process alone on Linux x86_64), so this is something to keep in mind. That said, the release build server has plenty of RAM to spare.

I have a low-end device with only 4 threads. RAM usage depends greatly on the number of parallel threads. I saw peaks at 6 GB with SCU (part of that is Firefox, ~1.4 GB).
If SCU is only going to be used for release builds made by users (I mean end users who compile a custom template for their game), I think it is bearable enough to use fewer threads in order to use less RAM. So if SCU can really make a difference, it would be reasonable to mention it as a tool for optimization.
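The trade-off between threads and RAM can be sketched as a simple cap on the number of parallel jobs (the value passed to `scons -j`). This assumes peak memory grows roughly linearly with the number of jobs, which is a simplification; the per-job figure is something you would have to measure yourself, and the function name and the numbers in the example are hypothetical.

```python
def max_parallel_jobs(ram_budget_gb: float, peak_per_job_gb: float, cpu_threads: int) -> int:
    """Pick a job count that fits both the RAM budget and the CPU.

    Assumption: each compile job needs about `peak_per_job_gb` at its peak,
    so total usage is roughly jobs * peak_per_job_gb.
    """
    if peak_per_job_gb <= 0:
        raise ValueError("peak_per_job_gb must be positive")
    if cpu_threads < 1:
        raise ValueError("cpu_threads must be at least 1")
    jobs_that_fit = int(ram_budget_gb // peak_per_job_gb)
    # Always allow at least one job, even if the budget is too small.
    return max(1, min(cpu_threads, jobs_that_fit))


if __name__ == "__main__":
    # Hypothetical numbers: 4 GB to spare, ~1.5 GB per SCU job, 8 threads.
    print(max_parallel_jobs(4.0, 1.5, 8))
```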
For years we've operated under the assumption that LTO (Link Time Optimization) is a net positive for production builds as it would:
The drawback is much longer build times, hence why it's only used for production builds/official releases.
Now findings in #96785 suggest that the reduction in build size is only true for GCC's LTO, and not for LLVM LTO (whether "full" LTO `-flto` or ThinLTO `-flto=thin`). With LLVM LTO there's a significant size increase for platforms we tested so far (Web, Android, Linux) of up to +15%. For the Web (currently using LTO for official builds) and Android (not using it for now) this is significant.

So it's time we do a thorough review of build flags for all targets and compilers and make sure we're actually using the best configuration possible for official builds.
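As a reference for the two LLVM modes named above, here is a minimal sketch mapping an `lto=` value to the corresponding Clang flag. It is not the engine's actual build script logic; the table and function name are illustrative, and only the `-flto` and `-flto=thin` flags themselves come from the text. With Clang, the same flag has to be passed at both compile and link time.

```python
# Mapping from the lto= option values used in this thread to Clang flags.
LTO_FLAGS = {
    "none": [],
    "thin": ["-flto=thin"],  # ThinLTO
    "full": ["-flto"],       # "full" (monolithic) LTO
}


def lto_flags(mode: str) -> list:
    """Return the flags to add to both the compile and the link command."""
    try:
        return list(LTO_FLAGS[mode])
    except KeyError:
        raise ValueError(f"unknown LTO mode: {mode!r}") from None


if __name__ == "__main__":
    for mode in ("none", "thin", "full"):
        print(mode, lto_flags(mode))
```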
I'll post successive replies for each Godot target platform so we can use these posts (maintainers are welcome to edit my posts) to keep track of metrics and findings for each platform individually. If that turns out to be too unwieldy we can split this issue into one issue per platform, but I expect we'll find closely related behavior across platforms that share a compiler toolchain (GCC, LLVM, MSVC).
@godotengine/buildsystem @godotengine/android @godotengine/ios @godotengine/linux-bsd @godotengine/macos @godotengine/web @godotengine/windows