Open cpsauer opened 3 years ago
Hey, so to clarify: This really is not a support question--I know how to enable LTO/stripping and had already done so.
Instead it's either a defaults bug or a doc request depending on Bazel's intent for --compilation_mode=opt
.
opt meant to be shorthand for (fully) optimized binaries? -> Missing LTO flags when building with the Android NDK (among several other important missing flags, see #10837)
opt indeed meant to be per-translation-unit-optimization only? -> Docs should probably say that, since other platforms have interpreted it the other way.
ThinLTO seems like a reasonable default, not much build time overhead on top of -O2
unless you are also including debug symbols. And in many benchmarks (but not on average), it performs better at runtime than regular LTO.
Good call on ThinLTO :)
@cpsauer Thanks. I think it was intended to be opt-in using --features=thin_lto
. As far as I can tell on Bazel 4.2.2 using NDK 19 (and macOS desktop builds), that doesn't pass -flto=thin
when building (ex: passing --features=thin_lto
doesn't affect previously cached builds without it, also looking at the compile commands with -s
). So support for it in general may have broader scope / require another issue. Right now we're just tossing in this into a .bazelrc.release
config file.
build --copt -flto=thin
build --linkopt -flto=thin
The android crosstool doesn't actually have the thin_lto feature, and therefore doesn't support it. I think this is something we should add, similar to how it works in the unix toolchain (assuming the NDK's tools support the same flags)
@keith @cpsauer should we add it / can we as part of the defaults for the opt
config for the NDK? Same for iOS and macOS I believe, last time I peeked into the compile commands. ThinLTO on Clang has been around for years.
We were looking at this yesterday and found that if you're trying to use -Os
you couldn't actually use LTO as well on Android until NDK 22 https://github.com/android/ndk/issues/721 so I guess we're blocked on https://github.com/bazelbuild/bazel/issues/12889 for that part.
As far as opt-in vs opt-out, I imagine folks would want us to mirror the current behavior where it is supported, but I'm not sure, either way I think adding it to make it as easy as flipping a feature for all platforms would be great.
Keith with a blazingly fast, good crossref, as usual :) There's an interaction I didn't know about.
@keith, could I ask what breakage you saw? Same fatal linker error as in the issue you linked https://github.com/android/ndk/issues/721?
Asking bc we'd been running -flto
and now -flto=thin
for a while now on opt
builds, but hadn't seen any problems.
(Switched to -flto=thin
as soon as @jgavris tipped me off that NDK r21's (non-LLVM) linker supported it.)
IMO, we should really turn on thin LTO by default for opt
, if we can.
Lots of users will never get deeper than requesting an optimized build. And I think they'd be surprised that opt
didn't turn on link time optimization by default, since it's a key optimization lever. And with ThinLTO, it seems like there are even fewer downsides.
It's probably more likely to get through review and testing to just stick with implementing --features=thin_lto
(opt-in)? opt
hasn't had fat LTO by default even though it's been around for a long time because I'm guessing because using fat LTO at Google scale was not feasible (ThinLTO was primarily motivated by developing LLVM / Clang itself!).
@cpsauer you are also passing -fuse-ld=lld
to linkopts
on NDK 19? It is not quite yet the default linker.
https://developer.android.com/ndk/downloads/revision_history?authuser=1
Developers should begin testing their apps with LLD. AOSP has switched to using LLD by default and the NDK will use it by default in the next release. BFD and Gold will be removed once LLD has been through a release cycle with no major unresolved issues (estimated r21). Test LLD in your app by passing -fuse-ld=lld when linking. Note: lld does not currently support compressed symbols on Windows. Issue 888. Clang also cannot generate compressed symbols on Windows, but this can be a problem when using artifacts built from Darwin or Linux.
@keith, could I ask what breakage you saw? Same fatal linker error as in the issue you linked https://github.com/android/ndk/issues/721?
Yes this one.
Asking bc we'd been running -flto and now -flto=thin for a while now on opt builds, but hadn't seen any problems.
What optimization level are you using? we're explicitly passing -Osize
in this case, which results in a smaller dylib in our case than if you use a compatible -O
and LTO in my testing.
Here's some thin_lto setup https://github.com/bazelbuild/bazel/pull/15039 for Android, still opt-in
Woo! Thanks for doing, Keith.
-Os
& linker errors:Aha! I realized how we'd been talking past each other. Heretofore, I'd just been specifying --compilation_mode=opt
alongside the LTO options, which (from aquery) was creating a mix of -Os
and -O2
when compiling. After some of monkeying around trying to induce that error by specifying --copt=-Os
, I realized that you must have instead been referring to -linkopt=-Os
. (And -Osize
is a typo, right? Since that didn't seem to be recognized by either.) -linkopt=-Os
did indeed induce that same error, and additionally passing -fuse-ld=lld
just changed things over to an equivalent error from lld
.
FWIW, I'm getting significantly smaller .so's with LTO (thin or full) than I do linking with -Os
. Like 25% smaller or so. (Differs, of course by the characteristics of the call graph!)
@keith, do you understand how (or if) the linker picks up optimization flags that were passed during compilation?
(Asking because it seems like the output binary is the same with or without the link line containing -O2
, the optimization level used to compile. But it differs if other optimizations levels are passed when linking.)
I've been using NDK 21 (as the latest supported by Bazel), which seems to use LLVM-plugin-enabled gold. No -fuse-ld=lld
originally, but I just tested it and it does work. I think I'll leave it on, since that's the default in newer versions.
Is there something backpinning you to NDK 19? (I'm noticing references to NDK 19 in lots of your comments.)
[Sorry guys for being a little slow one the reply this time around.]
do you understand how (or if) the linker picks up optimization flags that were passed during compilation?
Interesting find! I would assume that bazel wouldn't treat them special at all, so if you didn't explicitly pass --linkopt=-Os
you'd avoid it which I assume would be fine 🤔
@cpsauer at the time I wrote this, we were still on Bazel 4.2.2 (I don't think 5.0 had officially released yet). 4.2.2 only supported up to r19c.
@keith: Pretty wild, right! I feel like I read something about this long ago, but can't find doc on it for the life of me. If you do ever run into it, I'd love it if you'd say something!
@jgavris: Makes sense. We've been using Bazelisk to stay at or near HEAD with rolling releases. It's been pretty sweet. Huge thanks to Keith for (among many things) helping the rolling story get working with mobile.
@gregestren, sorry to add one more thing to your plate, but could we remove the support tag? This really isn't.
Update: The thin_lto feature is supported by the Google NDK toolchains, which Alex released as a replacement for the old, built-in ones that had ben falling behind! https://github.com/bazelbuild/rules_android_ndk
I'm leaving this issue open because I really do want to propose that optimized builds ought to have (thin) link time optimization on by default.
@keith, I think the new toolchain should also solve the -O# linking issue you mentioned above, by virtue of being NDK >=r22! Do you know offhand if thin_lto is already supported by the Apple toolchains? I didn't see it on a quick pass, but I could have missed it.
Do you know offhand if thin_lto is already supported by the Apple toolchains? I didn't see it on a quick pass, but I could have missed it.
don't think so, although just passing --copt=-flto=thin
does mostly work from my testing.
Hi awesome Bazel folks,
I was debugging an NDK build and looking at the link commands being run. I happened to notice that dbg and opt builds are getting the exact same set of linker flags--no
-flto
(or-O#
) flags in either case. That made me wonder if Bazel is accidentally omitting link-time optimization (LTO) in optimized builds.[I'm assume Bazel (a) intends to have optimized (opt) builds run with LTO, and (b) that those flags would be show up in the link command dumped in
--verbose_failures
. Apologies if I've missed something somewhere and this is a false alarm, but then maybe we should clarify the docs and have consistency across platforms.]Easiest way to repro should be the Bazel NDK example. (I'm using NDK21, macOS 11, Bazel 4, if that might matter.) I was looking at it by building with an invalid linker flag and
--verbose_failures
.Thanks for taking a peek, Chris