curl / curl-for-win

Reproducible curl binaries for Linux, macOS and Windows
https://curl.se/windows/
MIT License
686 stars 207 forks source link

LTO support #59

Closed Andarwinux closed 9 months ago

Andarwinux commented 9 months ago

Clang ThinLTO now work fine on Linux and MinGW, just add -flto=thin, but there is currently an issue where executing strip with static libraries built with LTO may fail, which makes it impossible to build curl with LTO/CFI (-fsanitize=cfi). Therefore, we should remove strip from the build script and strip only for curl executables or dynamic libraries.

vszakats commented 9 months ago

I've made a journey with LTO many years ago, and the experience still hurts. What hurt most was the LTO metadata using a format that wasn't compatible between toolchain releases, requiring to rebuild every object after compiler updates. This isn't an issue in CI, but it is burden for local testing. Another issue was sensitivity to 3rd party issues (e.g. bumped into an OpenSSL function that was declared and defined differently, surfacing only when doing LTO). And slow link times.

Things may have improved and it wasn't ThinLTO yet back then, so we might give it a cautioius retry!

But, we do have to strip static libs to remove the non-reproducible bits from them, because we ship them in the curl-for-win package.

It seems CI bumped into this right away:

+ llvm-strip-17 --enable-deterministic-archives --strip-debug _a64-linux-gnu/usr/lib/libz.a
llvm-strip-17: error: '_a64-linux-gnu/usr/lib/libz.a(adler32.c.o)': The file was not recognized as a valid object file

https://github.com/curl/curl-for-win/actions/runs/7128972278/job/19411993913#step:3:2843

Do you see any way around this? Enabling this for a future LLVM version perhaps?

Andarwinux commented 9 months ago

e.g. bumped into an OpenSSL function that was declared and defined differently, surfacing only when doing LTO

I'm currently working on adding ThinLTO support to mpv-winbuild-cmake, and with hundreds of packages including OpenSSL compiling without problems, perhaps the problems of the past no longer exist.

the LTO metadata using a format that wasn't compatible between toolchain releases, requiring to rebuild every object after compiler updates.

This is indeed an issue, but if LTO is an option then it shouldn't be a problem, many people (including me) use your script to build curl locally and don't care about compatibility and reproducibility.

And slow link times.

LTO build time can be optimized.

  1. For dependencies, only build static libraries, not dynamic libraries and executables, so linker LTO is eventually executed only once.
  2. Skip all optimizations in the configure phase. For low performance machines (like Action), a significant portion of the build time is spent on dependency testing during the configure step. Simply make a wrapper for lld and disable all optimizations during the configure step. For full static library LTO build, this will drastically cut down on dependency testing time. ref: https://github.com/shinchiro/mpv-winbuild-cmake/blob/master/toolchain/llvm/llvm-ld.in#L10 https://github.com/Andarwinux/mpv-winbuild/actions/runs/7096141055/job/19314180070

Do you see any way around this? Enabling this for a future LLVM version perhaps?

It can be compiled with -ffat-lto-object, but this requires llvm18 and currently only support ELF.

vszakats commented 9 months ago

e.g. bumped into an OpenSSL function that was declared and defined differently, surfacing only when doing LTO

I'm currently working on adding ThinLTO support to mpv-winbuild-cmake, and with hundreds of packages including OpenSSL compiling without problems, perhaps the problems of the past no longer exist.

That particular one is fixed by now, but it was a process. Perhaps the overall situation improved since, as more projects are testing LTO?

This is indeed an issue, but if LTO is an option then it shouldn't be a problem, many people (including me) use your script to build curl locally and don't care about compatibility and reproducibility.

I like to keep local and production toolchain config the same, otherwise it's inevitable that something breaks in one build path or the other. Each toggle doubles build combinations needing maintenance.

It's an option nevertheless (no pun intended)!

  1. For dependencies, only build static libraries, not dynamic libraries and executables, so linker LTO is eventually executed only once.

curl-for-win does that. From time to time I enable executables in dependencies for experiments, but speed isn't a top issue there, assuming they build.

  1. Skip all optimizations in the configure phase. For low performance machines (like Action), a significant portion of the build time is spent on dependency testing during the configure step. Simply make a wrapper for lld and disable all optimizations during the configure step. For full static library LTO build, this will drastically cut down on dependency testing time. ref: https://github.com/shinchiro/mpv-winbuild-cmake/blob/master/toolchain/llvm/llvm-ld.in#L10 https://github.com/Andarwinux/mpv-winbuild/actions/runs/7096141055/job/19314180070

Good point. It's a long battle in these builds. E.g. curl autotools probably still spends more time configuring than actually building, even after fixing a lot of perf issues in the former. A linker wrapper might be a workaround, but also an extra layer of complexity.

(I wonder if there is a CMake incantation that passes options solely to the build targets and not to feature checks? (and/or the same for autotools?)

Do you see any way around this? Enabling this for a future LLVM version perhaps?

It can be compiled with -ffat-lto-object, but this requires llvm18 and currently only support ELF.

Do you have a link that tracks the ThinLTO llvm-strip issue?

I've looked up history and it turns out Andy Polyakov (from OpenSSL) wasn't recommending LTO back then: https://github.com/openssl/openssl/issues/625#issuecomment-180747153 8f605c624349dbe5fb07c2a521db52d139856aac

Here's another note which fixes a fallout from fat-lto-objects which happen to use a "random-seed" that broke reproducibility: bd44218996910470be2c926f03619207f223a20a

Last but not least what we need to see if there is a real benefit for LTO for curl and libcurl, in runtime perf, compared to the size changes in binaries, build times, and maintenance needs.

Andarwinux commented 9 months ago

I like to keep local and production toolchain config the same, otherwise it's inevitable that something breaks in one build path or the other. Each toggle doubles build combinations needing maintenance.

We can still test LTO and non-LTO builds in CI. Non-LTO builds don't even need to be tested, because if the LTO build is fine, then the non-LTO build is usually fine, but just because the non-LTO build is fine doesn't mean the LTO build is fine (I haven't encountered any counterexamples so far).

Good point. It's a long battle in these builds. E.g. curl autotools probably still spends more time configuring than actually building, even after fixing a lot of perf issues in the former. A linker wrapper might be a workaround, but also an extra layer of complexity.

This is actually the most convenient, works across build systems, can be target specific, can add specific flags in specific places on the command line arguments to override those added by the build system, and eliminates the need to add -Wl, and can add some special COFF flags without having to deal with escaping (like -Xlink=-guard:cf,ehcont).

(I wonder if there is a CMake incantation that passes options solely to the build targets and not to feature checks? (and/or the same for autotools?)

I've searched for this, but haven't found any useful information. Some well-maintained packages offer the option to enable LTO, which only works on the build step, which is perfect, but not all packages do, and some only support adding --enable-lto but not --enable-lto=thin, mixing FullLTO and ThinLTO loses the parallelism advantage of ThinLTO.

Do you have a link that tracks the ThinLTO llvm-strip issue?

Unfortunately, no, most of them are about distro build systems, and the solution is either to disable LTO or to disable strip.

Last but not least what we need to see if there is a real benefit for LTO for curl and libcurl, in runtime perf, compared to the size changes in binaries, build times, and maintenance needs.

For use cases such as curl, the performance advantage may not be significant, and most of the overhead is likely to be in network I/O. However, LTO can enhance the effect of --icf and --gc-sections to further reduce the size, and in addition, make -fsanitize=cfi possible.

vszakats commented 9 months ago

Here's a test build with ThinLTO: https://github.com/curl/curl-for-win/actions/runs/7131676251

The results mirror my memories: All binaries grew, by a large margin. Static libs are not as larger as I remember with fat LTO, but still add to the already inflated sizes (after --gc-sections).

I agree with you that for curl, the bottleneck is not here. It's rather in the network code, protocol implementation (which has seen a lot of streamlining this year, and more is coming up), DNS resolver and parallelization.

This and the test binaries are not convincing me that curl-for-win is benefitting from LTO.

There was also a fallout with macOS + llvm where the linker is missing some LibreSSL functions. Here's the patch if you'd like to experiment:

--- a/_build.sh
+++ b/_build.sh
@@ -973,6 +973,11 @@ build_single_target() {
     fi
   fi

+  if [[ "${_CC}" = 'llvm' && ( "${_OS}" != 'mac' || "${_TOOLCHAIN}" = 'llvm-apple' ) ]]; then
+    _CFLAGS_GLOBAL+=' -flto=thin'
+    _CXXFLAGS_GLOBAL+=' -flto=thin'
+  fi
+
   # curl recommended options to reduce binary size:
   # https://github.com/curl/curl/blob/master/docs/INSTALL.md#reducing-size

@@ -1309,6 +1314,7 @@ build_single_target() {
     _STRIP_LIB="${_STRIP_BIN}"
     _STRIPFLAGS_LIB='--enable-deterministic-archives --strip-debug'
   fi
+  _STRIP_LIB='echo'
   export _OBJDUMP="${_BINUTILS_PREFIX}objdump${_BINUTILS_SUFFIX}"
   export _READELF="${_BINUTILS_PREFIX}readelf${_BINUTILS_SUFFIX}"  # symlink to llvm-readobj in llvm
   if [ "${_OS}" = 'win' ]; then
Andarwinux commented 9 months ago

This is due to more inline, and things should be different if you add -Os as well.

vszakats commented 9 months ago

Test runs:

-Os + -flto=thin seems like an interesting combination. It's hard to guess what comes out of optimizing for speed and performance at the same time.

I'm considering adding LTO as an option. Reproducibiliy might suffer, though llvm-17 seems to create reproducible static libs for the default build config for Windows. I haven't verified other platforms. The list of other caveats may apply, so I consider this experimental, till all these questions clear up.

I will also push a small config option to build with -Os instead of -O3. It can be combined with thinlto.

My preference goes to -Os, question if it causes any performance regressions (and how this may compare to the ThinLTO version). FWIW I'm not aware of a curl performance suite, but maybe it exists.

It'd also be useful to know what is the deal with building the TLS/crypto backend with LTO, i.e. how does it play with its security propertis. It'd be similarly useful to know some official info if llvm indeed creates reproducible static libs by default, and for what platforms/formats.

Also worth nothing that we already do some "quasi-LTO" with libcurl, curl and libssh2, since these projects are compiled from single source file (using CMake's "unity" feature).