android / ndk

The Android Native Development Kit
1.97k stars 255 forks source link

[Bug] NDK's openmp test crashing on x86 with latest LLVM update #2041

Closed DanAlbert closed 1 week ago

DanAlbert commented 2 months ago

Description

https://android.googlesource.com/platform/ndk/+/refs/heads/main/tests/device/openmp/ and https://android.googlesource.com/platform/ndk/+/refs/heads/main/tests/device/openmp_shared/jni are both failing when I update to clang-r530567. There are four sub tests (two of which are identical aside from whether they use the static or shared openmp, the other two are both static openmp) that do pretty trivial things, but all are crashing on 32-bit x86.

Repro steps (you'll need to download the NDK from the artifacts on https://android-review.googlesource.com/c/platform/ndk/+/3168418 unless you want to build it yourself, prebuilts/clang doesn't have a sysroot so can't build it without the rest of the NDK):

$ ndk/toolchains/llvm/prebuilt/$HOST/bin/clang -target i686-linux-android21 ndk/tests/device/openmp/jni/openmp.c -fopenmp -static-openmp -o omp_test
$ adb push omp_test /data/local/tmp/
$ adb shell "/data/local/tmp/omp_test; echo $?"

I think you should also be able to download the ndk-tests.tar.bz2 artifact to get the test executables if you don't need to rebuild them.

Logcat says:

--------- beginning of crash
07-11 13:52:20.771 13475 13475 F libc    : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x48 in tid 13475 (fib), pid 13475 (fib)
--------- beginning of main
07-11 13:52:20.778 13478 13478 I crash_dump32: obtaining output fd from tombstoned, type: kDebuggerdTombstone
07-11 13:52:20.779  1851  1851 I /system/bin/tombstoned: received crash request for pid 13475
07-11 13:52:20.779 13478 13478 I crash_dump32: performing dump of process 13475 (target tid = 13475)
07-11 13:52:20.780 13478 13478 F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
07-11 13:52:20.780 13478 13478 F DEBUG   : Build fingerprint: 'google/sdk_goog3_x86/generic_x86:10/QSR1.211112.003/8112140:userdebug/test-keys'
07-11 13:52:20.780 13478 13478 F DEBUG   : Revision: '0'
07-11 13:52:20.780 13478 13478 F DEBUG   : ABI: 'x86'
07-11 13:52:20.780 13478 13478 F DEBUG   : Timestamp: 2024-07-11 13:52:20-0700
07-11 13:52:20.781 13478 13478 F DEBUG   : pid: 13475, tid: 13475, name: fib  >>> ./fib <<<
07-11 13:52:20.781 13478 13478 F DEBUG   : uid: 0
07-11 13:52:20.781 13478 13478 F DEBUG   : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x48
07-11 13:52:20.781 13478 13478 F DEBUG   : Cause: null pointer dereference
07-11 13:52:20.781 13478 13478 F DEBUG   :     eax 5ba343f0  ebx 00000000  ecx 00000000  edx 00000000
07-11 13:52:20.781 13478 13478 F DEBUG   :     edi 00000000  esi 5ba343cc
07-11 13:52:20.781 13478 13478 F DEBUG   :     ebp ffad14a8  esp ffad145c  eip 5ba2fac0
07-11 13:52:20.781 13478 13478 F DEBUG   : 
07-11 13:52:20.781 13478 13478 F DEBUG   : backtrace:
07-11 13:52:20.781 13478 13478 F DEBUG   :       #00 pc 000acac0  /data/local/tmp/tests/x86-21-legacy-weakapi/cmake/openmp/x86/fib (BuildId: 8b41cd5b37adb209bc0f739b1bbb54243ca655b1)
07-11 13:52:20.781 13478 13478 F DEBUG   :       #01 pc 00071281  /data/local/tmp/tests/x86-21-legacy-weakapi/cmake/openmp/x86/fib (__kmp_query_cpuid+305) (BuildId: 8b41cd5b37adb209bc0f739b1bbb54243ca655b1)
07-11 13:52:20.781 13478 13478 F DEBUG   :       #02 pc 000a32e6  /data/local/tmp/tests/x86-21-legacy-weakapi/cmake/openmp/x86/fib (__kmp_runtime_initialize+54) (BuildId: 8b41cd5b37adb209bc0f739b1bbb54243ca655b1)
07-11 13:52:20.781 13478 13478 F DEBUG   :       #03 pc 0002b427  /data/local/tmp/tests/x86-21-legacy-weakapi/cmake/openmp/x86/fib (__kmp_do_serial_initialize()+359) (BuildId: 8b41cd5b37adb209bc0f739b1bbb54243ca655b1)
07-11 13:52:20.781 13478 13478 F DEBUG   :       #04 pc 00034fa4  /data/local/tmp/tests/x86-21-legacy-weakapi/cmake/openmp/x86/fib (__kmp_do_middle_initialize()+36) (BuildId: 8b41cd5b37adb209bc0f739b1bbb54243ca655b1)
07-11 13:52:20.781 13478 13478 F DEBUG   :       #05 pc 00034f62  /data/local/tmp/tests/x86-21-legacy-weakapi/cmake/openmp/x86/fib (__kmp_middle_initialize+66) (BuildId: 8b41cd5b37adb209bc0f739b1bbb54243ca655b1)
07-11 13:52:20.782 13478 13478 F DEBUG   :       #06 pc 000a6152  /data/local/tmp/tests/x86-21-legacy-weakapi/cmake/openmp/x86/fib (omp_get_num_procs+50) (BuildId: 8b41cd5b37adb209bc0f739b1bbb54243ca655b1)
07-11 13:52:20.782 13478 13478 F DEBUG   :       #07 pc 0001f66c  /data/local/tmp/tests/x86-21-legacy-weakapi/cmake/openmp/x86/fib (main+140) (BuildId: 8b41cd5b37adb209bc0f739b1bbb54243ca655b1)
07-11 13:52:20.782 13478 13478 F DEBUG   :       #08 pc 000898e8  /apex/com.android.runtime/lib/bionic/libc.so (__libc_init+120) (BuildId: 471745f0fbbcedb3db1553d5bd6fcd8b)

That's from an API 28 x86 emulator I got from go/xcid.

I've only tested on Darwin, but I expect this isn't host-specific.

Upstream bug

No response

Commit to cherry-pick

No response

Affected versions

Canary

Canary version

not actually submitted yet, clang-r530567

Host OS

Mac

Host OS version

all, probably

Affected ABIs

x86

pirama-arumuga-nainar commented 2 months ago

In libomp.a, __kmp_query_cpuid+305 is the call to strrchr:

    12d: e8 fc ff ff ff                calll   0x12e <__kmp_query_cpuid+0x12e>
                        0000012e:  R_386_PLT32  strrchr
     132: 83 c4 10                      addl    $0x10, %esp

Is this an issue with constructors being called before the dynamic linker has initialized the PLT? (tagging @rprichard ).

DanAlbert commented 2 months ago

(I've updated the CL to xfail the test for now, so if you try running run_tests.py now it won't yell loudly. If you want to see the failure, delete the test_config.py in those directories)

rprichard commented 1 month ago

It looks like this PR was the cause, https://github.com/llvm/llvm-project/pull/84626. It has since been reverted, and my impression is that the unmodified original code is/was correct for x86 PIC (but I haven't confirmed that).


Debugging notes:

Is this an issue with constructors being called before the dynamic linker has initialized the PLT?

The dynamic linker calls constructors in a separate pass after applying relocations for all shared objects.

Here's the segfault I see:

08-06 20:07:27.439  1797  1797 F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
08-06 20:07:27.439  1797  1797 F DEBUG   : Build fingerprint: 'Android/sdk_slim_x86/generic_x86:11/RSR1.210722.009/7752727:userdebug/test-keys'
08-06 20:07:27.439  1797  1797 F DEBUG   : Revision: '0'
08-06 20:07:27.439  1797  1797 F DEBUG   : ABI: 'x86'
08-06 20:07:27.440  1797  1797 F DEBUG   : Timestamp: 2024-08-06 20:07:27-0700
08-06 20:07:27.440  1797  1797 F DEBUG   : pid: 1794, tid: 1794, name: omp_test  >>> ./omp_test <<<
08-06 20:07:27.440  1797  1797 F DEBUG   : uid: 2000
08-06 20:07:27.440  1797  1797 F DEBUG   : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x3c
08-06 20:07:27.440  1797  1797 F DEBUG   : Cause: null pointer dereference
08-06 20:07:27.440  1797  1797 F DEBUG   :     eax 64b758f0  ebx 00000000  ecx 00000000  edx 00000000
08-06 20:07:27.440  1797  1797 F DEBUG   :     edi 00000000  esi 64b758cc
08-06 20:07:27.440  1797  1797 F DEBUG   :     ebp fff63678  esp fff6362c  eip 64b71030
08-06 20:07:27.441  1797  1797 F DEBUG   : backtrace:
08-06 20:07:27.441  1797  1797 F DEBUG   :       #00 pc 000ad030  /data/local/tmp/omp_test
08-06 20:07:27.441  1797  1797 F DEBUG   :       #01 pc 00071821  /data/local/tmp/omp_test (__kmp_query_cpuid+305)
08-06 20:07:27.441  1797  1797 F DEBUG   :       #02 pc 000a3886  /data/local/tmp/omp_test (__kmp_runtime_initialize+54)
08-06 20:07:27.441  1797  1797 F DEBUG   :       #03 pc 0002b9c7  /data/local/tmp/omp_test (__kmp_do_serial_initialize()+359)
08-06 20:07:27.441  1797  1797 F DEBUG   :       #04 pc 0002b83c  /data/local/tmp/omp_test (__kmp_get_global_thread_id_reg+140)
08-06 20:07:27.441  1797  1797 F DEBUG   :       #05 pc 000a5e88  /data/local/tmp/omp_test (omp_set_num_threads+24)
08-06 20:07:27.436  1797  1797 I crash_dump32: type=1400 audit(0.0:99): avc: denied { read } for name="omp_test" dev="dm-5" ino=65540 scontext=u:r:crash_dump:s0 tcontext=u:object_r:shell_data_file:s0 tclass=file permissive=1
08-06 20:07:27.441  1797  1797 F DEBUG   :       #06 pc 0001fe46  /data/local/tmp/omp_test (main+86)
08-06 20:07:27.441  1797  1797 F DEBUG   :       #07 pc 000522e3  /apex/com.android.runtime/lib/bionic/libc.so (__libc_init+115) (BuildId: 6e3a0180fa6637b68c0d181c343e6806)

ad030 is in the .plt. It's read-only code that reads the relocated address from the .got:

000ad030 <strrchr@plt>:
   ad030:       ff a3 3c 00 00 00       jmp    *0x3c(%ebx)
   ad036:       68 60 00 00 00          push   $0x60
   ad03b:       e9 20 ff ff ff          jmp    acf60 <__umoddi3+0xb8>

Apparently %ebx is zero, but it should be pointing to the .got. https://ewontfix.com/18/:

Since 32-bit x86 provides no direct way to do memory loads/stores relative to the current instruction pointer, the SysV ABI provides that, when code generated as PIC calls into a PLT thunk, it must pass a pointer to the GOT as a hidden argument in %ebx.

The %ebx register is supposed to point to the GOT, but it seems to be zero instead on entry to the .plt.

__kmp_query_cpuid initializes %edi to the GOT instead of %ebx:

   716f9:       e8 00 00 00 00          call   716fe <__kmp_query_cpuid+0xe>
   716fe:       5f                      pop    %edi
...
   7170c:       81 c7 2e d7 03 00       add    $0x3d72e,%edi

This pattern keeps showing up in __kmp_query_cpuid:

   71716:       89 df                   mov    %ebx,%edi
   71718:       0f a2                   cpuid
   7171a:       87 df                   xchg   %ebx,%edi

The first instance of this pattern overwrites the GOT saved in %edi.

It appears that cpuid accepts an input in %eax, and then writes its output to %eax, %ebx, %edx, %ecx. (https://datasheets.chipdb.org/Intel/x86/CPUID/24161821.pdf) Before invoking cpuid, the function saves off the value of %ebx (into %edi), then restores it afterwards.

pirama-arumuga-nainar commented 1 month ago

@rprichard Thanks for identifying the fix! http://r.android.com/3217551 and http://r.android.com/3217550 pick these changes to AOSP toolchain.