android / ndk

The Android Native Development Kit

HELP WANTED: Compiling with -fopenmp under the Clang toolchain increases .so size by nearly 400K #742

Open cocodark opened 6 years ago

cocodark commented 6 years ago

As described in the title: NDK r17 makes Clang the default toolchain, and building with -fopenmp increases the shared library size by nearly 400 KB. With GCC in previous versions, -fopenmp added only about 30 KB. What is the difference between GCC and Clang here?

cocodark commented 6 years ago

@pirama-arumuga-nainar Any suggestions?

pirama-arumuga-nainar commented 6 years ago

I am able to reproduce this with a simple case.

int g[1024];

int main() {
  int i;
#pragma omp parallel
  for (i = 0; i < 1024; i ++)
    g[i] = i*i;
  return 0;
}

With OpenMP, executable from GCC is 40K, while the one from Clang is 380K. This seemed to have been worse with r16, where the Clang generated binary was ~800K.

There are far more functions in the text section and symbol table with Clang than with GCC. I'll see why this is and if we can trim this down. If it helps in the meantime, adding '-Wl,--exclude-libs,libomp.a' reduces the binary to 350K. The savings are mostly from a smaller symbol table. The text section is still big.

cocodark commented 6 years ago

@pirama-arumuga-nainar Since GCC will be removed in r18, this bug causes a lot of trouble for us.

pirama-arumuga-nainar commented 6 years ago

Using -ffunction-sections when building the libomp runtime and passing -Wl,--gc-sections while linking (which, according to @DanAlbert, is passed by ndk-build and CMake by default) helps shave a further 100k from my test. With this change, the clang-build with openmp is ~250K, which is still higher than the 40K from gcc.

This is all I can think of from a black box perspective. I'll kick off an email to openmp-dev asking for their opinion.

I've attached libomp.a.zip for arm64 built with -ffunction-sections for experimentation. Just drop it into your NDK installation.

pirama-arumuga-nainar commented 6 years ago

https://android-review.googlesource.com/c/toolchain/llvm_android/+/719087 passes -ffunction-sections when building the runtimes. This should be a part of r19 or whenever the NDK clang gets updated past the one in r18.

cocodark commented 6 years ago

@pirama-arumuga-nainar Is there any progress?

pirama-arumuga-nainar commented 6 years ago

@cocodark did this help?

> I've attached libomp.a.zip for arm64 built with -ffunction-sections for experimentation. Just drop it into your NDK installation.

I have to ask upstream OpenMP developers about this - and for that I need to recreate my experiment with a newer gcc. I'll do that this week.

cocodark commented 6 years ago

@pirama-arumuga-nainar After replacing libomp.a, Clang still adds ~250K. This is still unexplained and has forced me to revert to r14 with GCC, so I hope it will be resolved before NDK r18 is released.

DanAlbert commented 6 years ago

> I hope it will be resolved before NDK R18 released.

If we find out that we're just building something wrong then that's possible, but if it's something that will require changes upstream then unfortunately that won't happen. It takes quite a bit of time to get changes made, merged to LLVM, pulled back to Android, tested, and then finally released.

Has anyone looked to see if the same size issues are present with prior NDKs? We've had openmp support for Clang for over a year but this bug was only opened about two weeks ago.

DanAlbert commented 5 years ago

As far as I know this has not been resolved upstream, so moving to r20.

cocodark commented 5 years ago

@DanAlbert Thank you

DanAlbert commented 5 years ago

Still no upstream progress.

DanAlbert commented 4 years ago

Still no changes upstream afaik.

I don't suppose the original problem was that the GCC build used for comparison was using a shared libomp whereas Clang was using a static one?

nickdesaulniers commented 4 years ago

@DanAlbert what's the upstream bug link?

DanAlbert commented 4 years ago

@pirama-arumuga-nainar might know. I've just been asking him iirc.

pirama-arumuga-nainar commented 4 years ago

There's no upstream bug. We'd need steps to reproduce before reporting upstream, so if we can reproduce for any Linux target, that'd be best. It'd also help us understand if this is an issue with how we're building for Android.

DanAlbert commented 4 years ago

Isn't https://github.com/android/ndk/issues/742#issuecomment-405695546 a repro case? I'll check rq to see if this is still a problem on r21. I sort of wonder if the problem was actually just that gcc was using a shared openmp and we didn't have that for Clang until r21...

DanAlbert commented 4 years ago

(no, we never had a shared omp runtime for gcc)

pirama-arumuga-nainar commented 4 years ago

Huh? IIRC, when adding openmp runtimes for Clang, we added static openmp for compatibility with gcc.

DanAlbert commented 4 years ago

Sorry, I meant we never had a shared omp runtime. Caffeine hasn't made it to my bloodstream yet.

DanAlbert commented 4 years ago

I do still see a fairly significant increase in size when using openmp:

For the test case above:

without -fopenmp: 8.0K
with -fopenmp: 280K

Our openmp runtime doesn't seem to link properly on my debian machine. Pirama is looking...

pirama-arumuga-nainar commented 4 years ago

The 8K number was slightly misleading because the foo.cpp from earlier turned out to be a no-op for gcc/libgomp.

I looked at it briefly and filed an upstream bug. Here's the content from there for cross reference:

This was originally reported a while ago in the Android NDK bug tracker as https://github.com/android/ndk/issues/742.

Statically-linking libomp.a for a simple OpenMP hello world program, https://computing.llnl.gov/tutorials/openMP/samples/C/omp_hello.c, produces a 468K binary (after strip). gcc + libgomp.a, in comparison, produces a binary of size 128K.

I want a sanity check if this is expected behavior or if the runtime can be better organized for static linking.

Steps to reproduce:

Build OpenMP statically with: -DLIBOMP_ENABLE_SHARED=OFF -DCMAKE_BUILD_TYPE=Release


$ du -sh a.out
548K

$ strip a.out && du -sh a.out
468K