Closed Andarwinux closed 9 months ago
I've made a journey with LTO many years ago, and the experience still hurts. What hurt most was the LTO metadata using a format that wasn't compatible between toolchain releases, requiring to rebuild every object after compiler updates. This isn't an issue in CI, but it is burden for local testing. Another issue was sensitivity to 3rd party issues (e.g. bumped into an OpenSSL function that was declared and defined differently, surfacing only when doing LTO). And slow link times.
Things may have improved and it wasn't ThinLTO yet back then, so we might give it a cautioius retry!
But, we do have to strip static libs to remove the non-reproducible bits from them, because we ship them in the curl-for-win package.
It seems CI bumped into this right away:
+ llvm-strip-17 --enable-deterministic-archives --strip-debug _a64-linux-gnu/usr/lib/libz.a
llvm-strip-17: error: '_a64-linux-gnu/usr/lib/libz.a(adler32.c.o)': The file was not recognized as a valid object file
https://github.com/curl/curl-for-win/actions/runs/7128972278/job/19411993913#step:3:2843
Do you see any way around this? Enabling this for a future LLVM version perhaps?
e.g. bumped into an OpenSSL function that was declared and defined differently, surfacing only when doing LTO
I'm currently working on adding ThinLTO support to mpv-winbuild-cmake, and with hundreds of packages including OpenSSL compiling without problems, perhaps the problems of the past no longer exist.
the LTO metadata using a format that wasn't compatible between toolchain releases, requiring to rebuild every object after compiler updates.
This is indeed an issue, but if LTO is an option then it shouldn't be a problem, many people (including me) use your script to build curl locally and don't care about compatibility and reproducibility.
And slow link times.
LTO build time can be optimized.
Do you see any way around this? Enabling this for a future LLVM version perhaps?
It can be compiled with -ffat-lto-object
, but this requires llvm18 and currently only support ELF.
e.g. bumped into an OpenSSL function that was declared and defined differently, surfacing only when doing LTO
I'm currently working on adding ThinLTO support to mpv-winbuild-cmake, and with hundreds of packages including OpenSSL compiling without problems, perhaps the problems of the past no longer exist.
That particular one is fixed by now, but it was a process. Perhaps the overall situation improved since, as more projects are testing LTO?
This is indeed an issue, but if LTO is an option then it shouldn't be a problem, many people (including me) use your script to build curl locally and don't care about compatibility and reproducibility.
I like to keep local and production toolchain config the same, otherwise it's inevitable that something breaks in one build path or the other. Each toggle doubles build combinations needing maintenance.
It's an option nevertheless (no pun intended)!
- For dependencies, only build static libraries, not dynamic libraries and executables, so linker LTO is eventually executed only once.
curl-for-win does that. From time to time I enable executables in dependencies for experiments, but speed isn't a top issue there, assuming they build.
- Skip all optimizations in the configure phase. For low performance machines (like Action), a significant portion of the build time is spent on dependency testing during the configure step. Simply make a wrapper for lld and disable all optimizations during the configure step. For full static library LTO build, this will drastically cut down on dependency testing time. ref: https://github.com/shinchiro/mpv-winbuild-cmake/blob/master/toolchain/llvm/llvm-ld.in#L10 https://github.com/Andarwinux/mpv-winbuild/actions/runs/7096141055/job/19314180070
Good point. It's a long battle in these builds. E.g. curl autotools probably still spends more time configuring than actually building, even after fixing a lot of perf issues in the former. A linker wrapper might be a workaround, but also an extra layer of complexity.
(I wonder if there is a CMake incantation that passes options solely to the build targets and not to feature checks? (and/or the same for autotools?)
Do you see any way around this? Enabling this for a future LLVM version perhaps?
It can be compiled with
-ffat-lto-object
, but this requires llvm18 and currently only support ELF.
Do you have a link that tracks the ThinLTO llvm-strip issue?
I've looked up history and it turns out Andy Polyakov (from OpenSSL) wasn't recommending LTO back then: https://github.com/openssl/openssl/issues/625#issuecomment-180747153 8f605c624349dbe5fb07c2a521db52d139856aac
Here's another note which fixes a fallout from fat-lto-objects which happen to use a "random-seed" that broke reproducibility: bd44218996910470be2c926f03619207f223a20a
Last but not least what we need to see if there is a real benefit for LTO for curl and libcurl, in runtime perf, compared to the size changes in binaries, build times, and maintenance needs.
I like to keep local and production toolchain config the same, otherwise it's inevitable that something breaks in one build path or the other. Each toggle doubles build combinations needing maintenance.
We can still test LTO and non-LTO builds in CI. Non-LTO builds don't even need to be tested, because if the LTO build is fine, then the non-LTO build is usually fine, but just because the non-LTO build is fine doesn't mean the LTO build is fine (I haven't encountered any counterexamples so far).
Good point. It's a long battle in these builds. E.g. curl autotools probably still spends more time configuring than actually building, even after fixing a lot of perf issues in the former. A linker wrapper might be a workaround, but also an extra layer of complexity.
This is actually the most convenient, works across build systems, can be target specific, can add specific flags in specific places on the command line arguments to override those added by the build system, and eliminates the need to add -Wl
, and can add some special COFF flags without having to deal with escaping (like -Xlink=-guard:cf,ehcont
).
(I wonder if there is a CMake incantation that passes options solely to the build targets and not to feature checks? (and/or the same for autotools?)
I've searched for this, but haven't found any useful information. Some well-maintained packages offer the option to enable LTO, which only works on the build step, which is perfect, but not all packages do, and some only support adding --enable-lto
but not --enable-lto=thin
, mixing FullLTO and ThinLTO loses the parallelism advantage of ThinLTO.
Do you have a link that tracks the ThinLTO llvm-strip issue?
Unfortunately, no, most of them are about distro build systems, and the solution is either to disable LTO or to disable strip.
Last but not least what we need to see if there is a real benefit for LTO for curl and libcurl, in runtime perf, compared to the size changes in binaries, build times, and maintenance needs.
For use cases such as curl, the performance advantage may not be significant, and most of the overhead is likely to be in network I/O. However, LTO can enhance the effect of --icf
and --gc-sections
to further reduce the size, and in addition, make -fsanitize=cfi
possible.
Here's a test build with ThinLTO: https://github.com/curl/curl-for-win/actions/runs/7131676251
The results mirror my memories: All binaries grew, by a large margin.
Static libs are not as larger as I remember with fat LTO, but still add
to the already inflated sizes (after --gc-sections
).
I agree with you that for curl, the bottleneck is not here. It's rather in the network code, protocol implementation (which has seen a lot of streamlining this year, and more is coming up), DNS resolver and parallelization.
This and the test binaries are not convincing me that curl-for-win is benefitting from LTO.
There was also a fallout with macOS + llvm where the linker is missing some LibreSSL functions. Here's the patch if you'd like to experiment:
--- a/_build.sh
+++ b/_build.sh
@@ -973,6 +973,11 @@ build_single_target() {
fi
fi
+ if [[ "${_CC}" = 'llvm' && ( "${_OS}" != 'mac' || "${_TOOLCHAIN}" = 'llvm-apple' ) ]]; then
+ _CFLAGS_GLOBAL+=' -flto=thin'
+ _CXXFLAGS_GLOBAL+=' -flto=thin'
+ fi
+
# curl recommended options to reduce binary size:
# https://github.com/curl/curl/blob/master/docs/INSTALL.md#reducing-size
@@ -1309,6 +1314,7 @@ build_single_target() {
_STRIP_LIB="${_STRIP_BIN}"
_STRIPFLAGS_LIB='--enable-deterministic-archives --strip-debug'
fi
+ _STRIP_LIB='echo'
export _OBJDUMP="${_BINUTILS_PREFIX}objdump${_BINUTILS_SUFFIX}"
export _READELF="${_BINUTILS_PREFIX}readelf${_BINUTILS_SUFFIX}" # symlink to llvm-readobj in llvm
if [ "${_OS}" = 'win' ]; then
This is due to more inline, and things should be different if you add -Os
as well.
Test runs:
-Os
+ -flto=thin
seems like an interesting combination. It's hard to
guess what comes out of optimizing for speed and performance at the
same time.
I'm considering adding LTO as an option. Reproducibiliy might suffer, though llvm-17 seems to create reproducible static libs for the default build config for Windows. I haven't verified other platforms. The list of other caveats may apply, so I consider this experimental, till all these questions clear up.
I will also push a small
config option to build with -Os
instead of -O3
.
It can be combined with thinlto
.
My preference goes to -Os
, question if it causes any performance
regressions (and how this may compare to the ThinLTO version). FWIW
I'm not aware of a curl performance suite, but maybe it exists.
It'd also be useful to know what is the deal with building the TLS/crypto backend with LTO, i.e. how does it play with its security propertis. It'd be similarly useful to know some official info if llvm indeed creates reproducible static libs by default, and for what platforms/formats.
Also worth nothing that we already do some "quasi-LTO" with libcurl, curl and libssh2, since these projects are compiled from single source file (using CMake's "unity" feature).
Clang ThinLTO now work fine on Linux and MinGW, just add
-flto=thin
, but there is currently an issue where executing strip with static libraries built with LTO may fail, which makes it impossible to build curl with LTO/CFI (-fsanitize=cfi
). Therefore, we should remove strip from the build script and strip only for curl executables or dynamic libraries.