bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0
23.21k stars 4.06k forks source link

Android NDK, no Link-Time Optimization? #13287

Open cpsauer opened 3 years ago

cpsauer commented 3 years ago

Hi awesome Bazel folks,

I was debugging an NDK build and looking at the link commands being run. I happened to notice that dbg and opt builds are getting the exact same set of linker flags--no -flto (or -O#) flags in either case. That made me wonder if Bazel is accidentally omitting link-time optimization (LTO) in optimized builds.

[I'm assume Bazel (a) intends to have optimized (opt) builds run with LTO, and (b) that those flags would be show up in the link command dumped in --verbose_failures. Apologies if I've missed something somewhere and this is a false alarm, but then maybe we should clarify the docs and have consistency across platforms.]

Easiest way to repro should be the Bazel NDK example. (I'm using NDK21, macOS 11, Bazel 4, if that might matter.) I was looking at it by building with an invalid linker flag and --verbose_failures.

Thanks for taking a peek, Chris

cpsauer commented 2 years ago

Hey, so to clarify: This really is not a support question--I know how to enable LTO/stripping and had already done so.

Instead it's either a defaults bug or a doc request depending on Bazel's intent for --compilation_mode=opt.

opt meant to be shorthand for (fully) optimized binaries? -> Missing LTO flags when building with the Android NDK (among several other important missing flags, see #10837)

opt indeed meant to be per-translation-unit-optimization only? -> Docs should probably say that, since other platforms have interpreted it the other way.

jgavris commented 2 years ago

ThinLTO seems like a reasonable default, not much build time overhead on top of -O2 unless you are also including debug symbols. And in many benchmarks (but not on average), it performs better at runtime than regular LTO.

cpsauer commented 2 years ago

Good call on ThinLTO :)

jgavris commented 2 years ago

@cpsauer Thanks. I think it was intended to be opt-in using --features=thin_lto. As far as I can tell on Bazel 4.2.2 using NDK 19 (and macOS desktop builds), that doesn't pass -flto=thin when building (ex: passing --features=thin_lto doesn't affect previously cached builds without it, also looking at the compile commands with -s). So support for it in general may have broader scope / require another issue. Right now we're just tossing in this into a .bazelrc.release config file.

build --copt -flto=thin
build --linkopt -flto=thin
keith commented 2 years ago

The android crosstool doesn't actually have the thin_lto feature, and therefore doesn't support it. I think this is something we should add, similar to how it works in the unix toolchain (assuming the NDK's tools support the same flags)

jgavris commented 2 years ago

@keith @cpsauer should we add it / can we as part of the defaults for the opt config for the NDK? Same for iOS and macOS I believe, last time I peeked into the compile commands. ThinLTO on Clang has been around for years.

keith commented 2 years ago

We were looking at this yesterday and found that if you're trying to use -Os you couldn't actually use LTO as well on Android until NDK 22 https://github.com/android/ndk/issues/721 so I guess we're blocked on https://github.com/bazelbuild/bazel/issues/12889 for that part.

As far as opt-in vs opt-out, I imagine folks would want us to mirror the current behavior where it is supported, but I'm not sure, either way I think adding it to make it as easy as flipping a feature for all platforms would be great.

cpsauer commented 2 years ago

Keith with a blazingly fast, good crossref, as usual :) There's an interaction I didn't know about.

@keith, could I ask what breakage you saw? Same fatal linker error as in the issue you linked https://github.com/android/ndk/issues/721?

Asking bc we'd been running -flto and now -flto=thin for a while now on opt builds, but hadn't seen any problems. (Switched to -flto=thin as soon as @jgavris tipped me off that NDK r21's (non-LLVM) linker supported it.)

cpsauer commented 2 years ago

IMO, we should really turn on thin LTO by default for opt, if we can.

Lots of users will never get deeper than requesting an optimized build. And I think they'd be surprised that opt didn't turn on link time optimization by default, since it's a key optimization lever. And with ThinLTO, it seems like there are even fewer downsides.

jgavris commented 2 years ago

It's probably more likely to get through review and testing to just stick with implementing --features=thin_lto (opt-in)? opt hasn't had fat LTO by default even though it's been around for a long time because I'm guessing because using fat LTO at Google scale was not feasible (ThinLTO was primarily motivated by developing LLVM / Clang itself!).

jgavris commented 2 years ago

@cpsauer you are also passing -fuse-ld=lld to linkopts on NDK 19? It is not quite yet the default linker.

https://developer.android.com/ndk/downloads/revision_history?authuser=1

Developers should begin testing their apps with LLD. AOSP has switched to using LLD by default and the NDK will use it by default in the next release. BFD and Gold will be removed once LLD has been through a release cycle with no major unresolved issues (estimated r21). Test LLD in your app by passing -fuse-ld=lld when linking. Note: lld does not currently support compressed symbols on Windows. Issue 888. Clang also cannot generate compressed symbols on Windows, but this can be a problem when using artifacts built from Darwin or Linux.

keith commented 2 years ago

@keith, could I ask what breakage you saw? Same fatal linker error as in the issue you linked https://github.com/android/ndk/issues/721?

Yes this one.

Asking bc we'd been running -flto and now -flto=thin for a while now on opt builds, but hadn't seen any problems.

What optimization level are you using? we're explicitly passing -Osize in this case, which results in a smaller dylib in our case than if you use a compatible -O and LTO in my testing.

keith commented 2 years ago

Here's some thin_lto setup https://github.com/bazelbuild/bazel/pull/15039 for Android, still opt-in

cpsauer commented 2 years ago

Woo! Thanks for doing, Keith.

Keith, re -Os & linker errors:

Aha! I realized how we'd been talking past each other. Heretofore, I'd just been specifying --compilation_mode=opt alongside the LTO options, which (from aquery) was creating a mix of -Os and -O2 when compiling. After some of monkeying around trying to induce that error by specifying --copt=-Os, I realized that you must have instead been referring to -linkopt=-Os. (And -Osizeis a typo, right? Since that didn't seem to be recognized by either.) -linkopt=-Os did indeed induce that same error, and additionally passing -fuse-ld=lld just changed things over to an equivalent error from lld.

FWIW, I'm getting significantly smaller .so's with LTO (thin or full) than I do linking with -Os. Like 25% smaller or so. (Differs, of course by the characteristics of the call graph!)

@keith, do you understand how (or if) the linker picks up optimization flags that were passed during compilation? (Asking because it seems like the output binary is the same with or without the link line containing -O2, the optimization level used to compile. But it differs if other optimizations levels are passed when linking.)

Jason,

I've been using NDK 21 (as the latest supported by Bazel), which seems to use LLVM-plugin-enabled gold. No -fuse-ld=lld originally, but I just tested it and it does work. I think I'll leave it on, since that's the default in newer versions.

Is there something backpinning you to NDK 19? (I'm noticing references to NDK 19 in lots of your comments.)

[Sorry guys for being a little slow one the reply this time around.]

keith commented 2 years ago

do you understand how (or if) the linker picks up optimization flags that were passed during compilation?

Interesting find! I would assume that bazel wouldn't treat them special at all, so if you didn't explicitly pass --linkopt=-Os you'd avoid it which I assume would be fine 🤔

jgavris commented 2 years ago

@cpsauer at the time I wrote this, we were still on Bazel 4.2.2 (I don't think 5.0 had officially released yet). 4.2.2 only supported up to r19c.

cpsauer commented 2 years ago

@keith: Pretty wild, right! I feel like I read something about this long ago, but can't find doc on it for the life of me. If you do ever run into it, I'd love it if you'd say something!

@jgavris: Makes sense. We've been using Bazelisk to stay at or near HEAD with rolling releases. It's been pretty sweet. Huge thanks to Keith for (among many things) helping the rolling story get working with mobile.

cpsauer commented 2 years ago

@gregestren, sorry to add one more thing to your plate, but could we remove the support tag? This really isn't.

cpsauer commented 2 years ago

Update: The thin_lto feature is supported by the Google NDK toolchains, which Alex released as a replacement for the old, built-in ones that had ben falling behind! https://github.com/bazelbuild/rules_android_ndk

I'm leaving this issue open because I really do want to propose that optimized builds ought to have (thin) link time optimization on by default.

@keith, I think the new toolchain should also solve the -O# linking issue you mentioned above, by virtue of being NDK >=r22! Do you know offhand if thin_lto is already supported by the Apple toolchains? I didn't see it on a quick pass, but I could have missed it.

keith commented 2 years ago

Do you know offhand if thin_lto is already supported by the Apple toolchains? I didn't see it on a quick pass, but I could have missed it.

don't think so, although just passing --copt=-flto=thin does mostly work from my testing.