Closed DanAlbert closed 1 week ago
In libomp.a
, __kmp_query_cpuid+305
is the call to strrchr:
12d: e8 fc ff ff ff calll 0x12e <__kmp_query_cpuid+0x12e>
0000012e: R_386_PLT32 strrchr
132: 83 c4 10 addl $0x10, %esp
Is this an issue with constructors being called before the dynamic linker has initialized the PLT? (tagging @rprichard ).
(I've updated the CL to xfail the test for now, so if you try running run_tests.py
now it won't yell loudly. If you want to see the failure, delete the test_config.py
in those directories)
It looks like this PR was the cause, https://github.com/llvm/llvm-project/pull/84626. It has since been reverted, and my impression is that the unmodified original code is/was correct for x86 PIC (but I haven't confirmed that).
Debugging notes:
Is this an issue with constructors being called before the dynamic linker has initialized the PLT?
The dynamic linker calls constructors in a separate pass after applying relocations for all shared objects.
Here's the segfault I see:
08-06 20:07:27.439 1797 1797 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
08-06 20:07:27.439 1797 1797 F DEBUG : Build fingerprint: 'Android/sdk_slim_x86/generic_x86:11/RSR1.210722.009/7752727:userdebug/test-keys'
08-06 20:07:27.439 1797 1797 F DEBUG : Revision: '0'
08-06 20:07:27.439 1797 1797 F DEBUG : ABI: 'x86'
08-06 20:07:27.440 1797 1797 F DEBUG : Timestamp: 2024-08-06 20:07:27-0700
08-06 20:07:27.440 1797 1797 F DEBUG : pid: 1794, tid: 1794, name: omp_test >>> ./omp_test <<<
08-06 20:07:27.440 1797 1797 F DEBUG : uid: 2000
08-06 20:07:27.440 1797 1797 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x3c
08-06 20:07:27.440 1797 1797 F DEBUG : Cause: null pointer dereference
08-06 20:07:27.440 1797 1797 F DEBUG : eax 64b758f0 ebx 00000000 ecx 00000000 edx 00000000
08-06 20:07:27.440 1797 1797 F DEBUG : edi 00000000 esi 64b758cc
08-06 20:07:27.440 1797 1797 F DEBUG : ebp fff63678 esp fff6362c eip 64b71030
08-06 20:07:27.441 1797 1797 F DEBUG : backtrace:
08-06 20:07:27.441 1797 1797 F DEBUG : #00 pc 000ad030 /data/local/tmp/omp_test
08-06 20:07:27.441 1797 1797 F DEBUG : #01 pc 00071821 /data/local/tmp/omp_test (__kmp_query_cpuid+305)
08-06 20:07:27.441 1797 1797 F DEBUG : #02 pc 000a3886 /data/local/tmp/omp_test (__kmp_runtime_initialize+54)
08-06 20:07:27.441 1797 1797 F DEBUG : #03 pc 0002b9c7 /data/local/tmp/omp_test (__kmp_do_serial_initialize()+359)
08-06 20:07:27.441 1797 1797 F DEBUG : #04 pc 0002b83c /data/local/tmp/omp_test (__kmp_get_global_thread_id_reg+140)
08-06 20:07:27.441 1797 1797 F DEBUG : #05 pc 000a5e88 /data/local/tmp/omp_test (omp_set_num_threads+24)
08-06 20:07:27.436 1797 1797 I crash_dump32: type=1400 audit(0.0:99): avc: denied { read } for name="omp_test" dev="dm-5" ino=65540 scontext=u:r:crash_dump:s0 tcontext=u:object_r:shell_data_file:s0 tclass=file permissive=1
08-06 20:07:27.441 1797 1797 F DEBUG : #06 pc 0001fe46 /data/local/tmp/omp_test (main+86)
08-06 20:07:27.441 1797 1797 F DEBUG : #07 pc 000522e3 /apex/com.android.runtime/lib/bionic/libc.so (__libc_init+115) (BuildId: 6e3a0180fa6637b68c0d181c343e6806)
ad030 is in the .plt. It's read-only code that reads the relocated address from the .got:
000ad030 <strrchr@plt>:
ad030: ff a3 3c 00 00 00 jmp *0x3c(%ebx)
ad036: 68 60 00 00 00 push $0x60
ad03b: e9 20 ff ff ff jmp acf60 <__umoddi3+0xb8>
Apparently %ebx is zero, but it should be pointing to the .got. https://ewontfix.com/18/:
Since 32-bit x86 provides no direct way to do memory loads/stores relative to the current instruction pointer, the SysV ABI provides that, when code generated as PIC calls into a PLT thunk, it must pass a pointer to the GOT as a hidden argument in %ebx.
The %ebx register is supposed to point to the GOT, but it seems to be zero instead on entry to the .plt.
__kmp_query_cpuid
initializes %edi to the GOT instead of %ebx:
716f9: e8 00 00 00 00 call 716fe <__kmp_query_cpuid+0xe>
716fe: 5f pop %edi
...
7170c: 81 c7 2e d7 03 00 add $0x3d72e,%edi
This pattern keeps showing up in __kmp_query_cpuid
:
71716: 89 df mov %ebx,%edi
71718: 0f a2 cpuid
7171a: 87 df xchg %ebx,%edi
The first instance of this pattern overwrites the GOT saved in %edi.
It appears that cpuid accepts an input in %eax, and then writes its output to %eax, %ebx, %edx, %ecx. (https://datasheets.chipdb.org/Intel/x86/CPUID/24161821.pdf) Before invoking cpuid, the function saves off the value of %ebx (into %edi), then restores it afterwards.
@rprichard Thanks for identifying the fix! http://r.android.com/3217551 and http://r.android.com/3217550 pick these changes to AOSP toolchain.
Description
https://android.googlesource.com/platform/ndk/+/refs/heads/main/tests/device/openmp/ and https://android.googlesource.com/platform/ndk/+/refs/heads/main/tests/device/openmp_shared/jni are both failing when I update to clang-r530567. There are four sub tests (two of which are identical aside from whether they use the static or shared openmp, the other two are both static openmp) that do pretty trivial things, but all are crashing on 32-bit x86.
Repro steps (you'll need to download the NDK from the artifacts on https://android-review.googlesource.com/c/platform/ndk/+/3168418 unless you want to build it yourself, prebuilts/clang doesn't have a sysroot so can't build it without the rest of the NDK):
I think you should also be able to download the ndk-tests.tar.bz2 artifact to get the test executables if you don't need to rebuild them.
Logcat says:
That's from an API 28 x86 emulator I got from go/xcid.
I've only tested on Darwin, but I expect this isn't host-specific.
Upstream bug
No response
Commit to cherry-pick
No response
Affected versions
Canary
Canary version
not actually submitted yet, clang-r530567
Host OS
Mac
Host OS version
all, probably
Affected ABIs
x86