android / ndk

The Android Native Development Kit
1.97k stars 255 forks source link

Windows, NDK r14: link time optimization does not work with Clang #313

Closed psiha closed 6 years ago

psiha commented 7 years ago

I.e. #137 has not been fixed in the Windows x64 build of the NDK - I still get ndk-bundle\toolchains\aarch64-linux-android-4.9\prebuilt\windows-x86_64\aarch64-linux-android\bin\ld.gold.exe Error:error: .cpp.o:1:3: invalid character

kneth commented 7 years ago

I know it doesn't help you but https://github.com/android-ndk/ndk/wiki/Changelog-r14 says "LTO with Clang now builds without errors on Linux and Darwin.". I don't think LTO has been fixed for Windows but @DanAlbert knows for sure.

DanAlbert commented 7 years ago

That's not the the error I'd expect to see. We don't even have LLVMgold.dll in the NDK: https://github.com/android-ndk/ndk/issues/251

The forced optimization level is weird, but I'm guessing there's a reason for it. @stephenhines @pirama-arumuga-nainar, I don't suppose either of you know why Clang doesn't allow -Os with LTO?

psiha commented 7 years ago
pirama-arumuga-nainar commented 7 years ago

LLVMgold.so seems to define its own optimization level and it currently supports O0 through O3. The problem is that Clang's driver doesn't fully sanitize its optimization level before passing it to the Gold plugin. It sanitizes O4 and Ofast down to O3 but doesn't handle Os or Oz. https://github.com/llvm-mirror/clang/blob/master/lib/Driver/Tools.cpp#L1210.

I'll create an issue upstream. The driver should pass a suitable optimization level to the plugin in case of Os or Oz. The plugin could make better space-oriented decisions while adding the optimization levels. It's just a question of plumbing through the optimization level back to PassManagerBuilder (https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/IPO/PassManagerBuilder.cpp#L860)

I think it might be possible to pull in the patch sanitizing Os and Oz for r15. I'd also suggest looking at passing '-plugin-opt=disable-inlining' to the linker.

@psiha Did you invoke the clang driver when you experimented with OSX, Linux and MSVC? i.e. did you invoke clang -flto for linking? If so, can you add '-v' and post the linker command? Based on LLVM ToT code, I don't see how this would accept Os or Oz.

pirama-arumuga-nainar commented 7 years ago

Created https://bugs.llvm.org/show_bug.cgi?id=32155 in the LLVM issue tracker.

DanAlbert commented 7 years ago
  • that LTO is somehow platform dependent (even though you distribute your own linkers) really raises red flags
  • that you don't test officially supported platforms (Windows and Darwin) raises some more red flags

We do test on Windows and Darwin. Not sure why you think we don't.

In fact, the reason we don't have LTO on Windows right now is because it is a different environment and adding support for it is going to require thorough testing before we ship it.

  • we use -fuse-ld=gold and ld.gold.exe is present in the NDK (although of course on Windows we have to specify "-fuse-ld=gold.exe" while on other platforms omit .exe suffix...comical...)

This is still a thing? @pirama-arumuga-nainar, could you fix the driver for this? I thought we'd fixed this ages ago.

  • the 'invalid character' error I get is exactly the same error that the reporter of #137 got on Darwin

Ah, that issue was because -Wl,-flto has nothing to do with LTO. You need -flto. https://github.com/android-ndk/ndk/issues/137#issuecomment-228820201

psiha commented 7 years ago

That compiler codegen option logic is duplicated (or even worse, different) between clang and the LTO plugin (which, from a logic perspective, should just invoke the compiler, sans the front end, on the single huge TU available at link time) smells like a horrible hack...

'-plugin-opt=disable-inlining' is not an option as I do not want to disable inlining just disable the, often insane, inlining level done @O3 (i.e. use a much lower threshold since conservative inlining can also reduce the codegen size)... and, of course, Clang being totally crippled compared to GCC WRT codegen options and pragmas does not provide an option to control the inlining threshold - yes LLVM does but I can only imagine what will happen if I try to pass --mllvm options to the linker if even the documented things that are supposed to work fail every so often like this...

(In this regard/for LTO) We use the same compiler options for Clang on OSX, iOS and Android and only Android exhibits this behaviour/bug...since you claim you test on OSX you can investigate this further yourselves (if I find the time I may go through the -v output)... WRT MSVC, I meant MSVC itself (not the C2 Clang wrapper) where LTO or LTCG as it is called on Windows works just fine/production ready for the past >>12 years<< and here we still struggle with such basics in 2017 ... :/

psiha commented 7 years ago

We do test on Windows and Darwin. Not sure why you think we don't.

Your answer offers some queues as to why I think so ;p :

"This is still a thing? ... I thought we'd fixed this ages ago."

"reason we don't have LTO on Windows right now is because it is a different environment and adding support for it is going to require thorough testing before we ship it." (why on earth does a Windows host require anything special for LTO for Android?)

...and need I mention 'that one time' when the official NDK release for Windows failed to even unzip?...

Things that are tested either work or are listed under 'known issues'...

Ah, that issue was because -Wl,-flto has nothing to do with LTO. You need -flto.

And it is so - as I said, the same CMake logic is used for the Android flags regardless of the host OS (and most Clang flags, LTO ones specifically are also the same across host and target OSs) and it works on OSX...True, trying yet-another-rebuild on Windows today I now get the could not load plugin library error... (regardless: getting 'invalid character in object file' error for a missing linker option is not a very helpful error message)

ps. and yet another discovered LTO failure on this platform #318 ... :/

DanAlbert commented 7 years ago

Sorry, I think I've been unclear. Windows Clang+LTO is currently unsupported. That's why it's neither tested nor marked as a known issue. I don't list all open feature requests in our release notes, and neither does any other product I know of.

This is something we're going to be looking in to, fwiw, it's just not ready yet. IMO the highest priority issue is getting libc++ into better shape, which is why that gets most of my focus these days.

I'm worried that we won't have this done quickly enough to satisfy you, but the good news is that we're an open source project. I'd be happy to review (or find a reviewer for) any changes you'd like to contribute. The NDK's contribution guide is here, and our README.md is the landing page for our docs.

Otherwise, please understand that there is a limit to the number of things I can get done each release. This absolutely is something I want to support in the future, but these things take time.

kernhanda commented 7 years ago

I'm seeing this too. After following @psiha 's advice and changing -fuse-ld=gold to -fuse-ld=gold.exe, I get the following error:

C:/adt/sdk/ndk-bundle/toolchains/arm-linux-androideabi-4.9/prebuilt/windows-x86_64/lib/gcc/arm-linux-androideabi/4.9.x/../../../../arm-linux-androideabi/bin\ld.gold.exe: error: C:\adt\sdk\ndk-bundle\toolchains\llvm\prebuilt\windows-x86_64\bin/../lib64/LLVMgold.so: could not load plugin library: unable to load dll clang++.exe: error: linker command failed with exit code 1 (use -v to see invocation)

This is with NDK 14b, building for armeabi-v7a.

Not supporting LTO with clang on Windows is rather confusing since you're also calling GCC deprecated/unsupported.

Please let me know if there's a workaround available.

DanAlbert commented 7 years ago

C:/adt/sdk/ndk-bundle/toolchains/arm-linux-androideabi-4.9/prebuilt/windows-x86_64/lib/gcc/arm-linux-androideabi/4.9.x/../../../../arm-linux-androideabi/bin\ld.gold.exe: error: C:\adt\sdk\ndk-bundle\toolchains\llvm\prebuilt\windows-x86_64\bin/../lib64/LLVMgold.so: could not load plugin library: unable to load dll

Yeah, we don't build or ship this for Windows. I've uploaded a change to start building it (https://android-review.googlesource.com/c/435495/, will need a couple more changes to make sure it gets into the NDK), but there are more problems beyond that. Looks like even with the gold plugin in place, gold still can't load it properly on Windows. It looks like that's just a matter of other dependencies not being anywhere in the library load path on Windows (on Linux/Darwin we use an rpath to tell the linker where to find them, but that's not a thing in Windows). I'm building gold atm to see if my fix for this will work.

I'm hoping we'll have this for r16 beta 1. Assuming it's just this gold thing and the logistics of actually shipping it, that's doable. If there's anything actually wrong with LTO on Windows (that would be surprising to me), it will likely take a bit longer.

DanAlbert commented 7 years ago

Even with building and shipping this library, it doesn't to work on Windows. It has heap corruption issues that I haven't had time to drill into yet. Will be taking a closer look after we get a new clang, but this is off the table for beta 1.

DanAlbert commented 6 years ago

The good news is that getting Android's Clang moved over to the upstream build process fixes the issues on Windows. The bad news is that there are too many other regressions caused by that switch that are still being worked through, so there won't be a compiler update for r16. Should be available in r17 though.

psiha commented 6 years ago

So we'll maybe get LTO halfway through 2018!? Luckily the current compiler isn't some random buggy commit so we can safely wait until then for a compiler update...oh wait no, the current Android Clang 'optimizes':

if ( var ) { __builtin_assume( var == 6 ); } return var;

to

return 6;

...what a joke of a platform...

DanAlbert commented 6 years ago

We're an open source project. If you're not satisfied with the rate of progress, feel free to help out. You can find our contribution guide here: https://android.googlesource.com/platform/ndk/+/master/CONTRIBUTING.md.

Veigres commented 6 years ago

I have used LLVM's own linker (lld) for LTO both on Windows and Linux for about a year (since r13 came out). lld has support for LTO by default, so no LLVMgold.so plugin is needed. I highly recommend giving lld a try for those having issues with LTO on Windows. It is also significantly faster than the gold linker (claimed to be 2x faster, my tests on some fairly large projects seems to support that claim).

Klayflash commented 6 years ago

Veigres, thank you for usefull information.

I have build lld linker on windows by using following batch file

set PATH=%PATH%;C:\temp\llvm\cmake-3.9.4-win64-x64\bin
rem for python
set PATH=%PATH%;C:\lib\android\2\android-ndk-r15-windows-x86_64\android-ndk-r15\prebuilt\windows-x86_64\bin

set CMAKE_DONEE=%~dp0\build\cmakedonee.txt
cd %~dp0\build
if not exist %CMAKE_DONEE% (
  cmake -G "Visual Studio 14 Win64" -DCMAKE_BUILD_TYPE=Release -Thost=x64 -DLLVM_ENABLE_PROJECTS=lld %~dp0\llvm-project\llvm 
  if ERRORLEVEL 1 goto error
)
echo "cmake done" > %CMAKE_DONEE%
"C:\Program Files (x86)\MSBuild\14.0\Bin\amd64\msbuild.exe" "%~dp0\build\LLVM.sln"  /target:"lld executables\lld" /property:Configuration=Release /verbosity:minimal
if ERRORLEVEL 1 goto error

:success
echo SUCCESS!!!
goto exit
:error
echo ERROR!!!
goto exit
:exit
pause

see also https://lld.llvm.org/#build document

Then I copied lld linker to standaloneTolchain_x86\i686-linux-android\bin\ld.exe.

The linker works fine for x86 and Armv7 cpu. Executable become much smaller!

I found difference. Command line option "--fix-cortex-a8" is not recognized by lld linker.

stephenhines commented 6 years ago

If you are building for 64-bit ARM, you should be aware that lld is missing at least one patch for fixing some cortex-a53 errata (https://reviews.llvm.org/D36742). This is one reason why we haven't productionized lld yet for Android's NDK. It is definitely something that we are working on though.

DanAlbert commented 6 years ago

Confirmed fixed in build 4487575 (from master, will be in r17).

DoDoENT commented 6 years ago

using LTO forces us to use -O3 because with -Os it spits out the "Optimization level must be between 0 and 3" error

This error still happens in NDK r18 and also in NDK r17 and r17b.

enh commented 6 years ago

using LTO forces us to use -O3 because with -Os it spits out the "Optimization level must be between 0 and 3" error

This error still happens in NDK r18 and also in NDK r17 and r17b.

seems like that's a gold-specific error. in r18, can you try lld?

from the r18 release notes: "LLD is now available for testing. AOSP is in the process of switching to using LLD by default and the NDK will follow (timeline unknown). Test LLD in your app by passing -fuse-ld=lld when linking."

DanAlbert commented 6 years ago

from the r18 release notes: "LLD is now available for testing. AOSP is in the process of switching to using LLD by default and the NDK will follow (timeline unknown). Test LLD in your app by passing -fuse-ld=lld when linking."

This is actually a little annoying atm since the build systems will still force --fix-cortex-a8. We need to either get LLD to honor (or silently ignore, still not clear if we still need to work around) or teach ndk-build to not add that for LLD (not actually possible for us to do with CMake since we don't have all the flags that will be used when we're in the toolchain file; only those that are passed to CMake on the command line).

DoDoENT commented 6 years ago

seems like that's a gold-specific error. in r18, can you try lld?

With LLD I get this error:

/Users/dodo/android-sdks/ndk-bundle/toolchains/llvm/prebuilt/darwin-x86_64/bin/ld.lld: error: --plugin-opt=Oz: number expected, but got 'z'

I am now using -Oz, as this is now default option, according to changelog.

Of course, as @DanAlbert mentioned, I had to comment out the --fix-cortex-a8 within android.toolchain.cmake to make LLD work.

enh commented 6 years ago

/me checks llvm source. oh, yeah. different linker, different error message, same bug :-)

@chih-hung will know whether this is a known lld bug...

@pirama-arumuga-nainar in case this is WAI for LTO? (it occurs to me i'm just assuming that LTO should work even for -Os/-Oz...)

DanAlbert commented 6 years ago

We have another bug open for this already. We should take this conversation over to https://github.com/android-ndk/ndk/issues/721