Closed Quuxplusone closed 3 years ago
(This works fine on x86.)
Stack:
% lldb -- ./base_unittests '--
gtest_filter=MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog' --
single-process-tests
(lldb) target create "./base_unittests"
Current executable set to
'/Users/thakis/src/chrome/src/out/gn/repro/base_unittests' (arm64).
(lldb) settings set -- target.run-args "--
gtest_filter=MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog" "--
single-process-tests"
(lldb) r
Process 14421 launched:
'/Users/thakis/src/chrome/src/out/gn/repro/base_unittests' (arm64)
Debugger detected, switching to single process mode.
Pass --test-launcher-debug-launcher to debug the launcher itself.
[14421:259:ERROR:icu_util.cc(179)] icudtl.dat not found in bundle
[14421:259:ERROR:icu_util.cc(243)] Invalid file descriptor to ICU data received.
Detected presence of a debugger, running without test timeouts.
Note: Google Test filter =
MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from MessagePumpMacTest
[ RUN ] MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog
Process 14421 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT
(code=1, subcode=0x19a985d30)
frame #0: 0x000000019a985d30 libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_arm64>::step() + 500
libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace,
libunwind::Registers_arm64>::step:
-> 0x19a985d30 <+500>: brk #0xc471
0x19a985d34 <+504>: pacib x16, x8
0x19a985d38 <+508>: b 0x19a985dd8 ; <+668>
0x19a985d3c <+512>: ldur x11, [x10, #-0x8]
Target 0: (base_unittests) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT
(code=1, subcode=0x19a985d30)
* frame #0: 0x000000019a985d30 libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_arm64>::step() + 500
frame #1: 0x0000000190dc9f0c libobjc.A.dylib`objc_addExceptionHandler + 128
frame #2: 0x0000000193ade794 AppKit`-[NSApplication _commonBeginModalSessionForWindow:relativeToWindow:modalDelegate:didEndSelector:contextInfo:] + 960
frame #3: 0x0000000193ade330 AppKit`__35-[NSApplication runModalForWindow:]_block_invoke_2 + 44
frame #4: 0x0000000193ade2e8 AppKit`__35-[NSApplication runModalForWindow:]_block_invoke + 108
frame #5: 0x0000000193adda7c AppKit`_NSTryRunModal + 128
frame #6: 0x0000000193add928 AppKit`-[NSApplication runModalForWindow:] + 148
frame #7: 0x0000000193c54ab8 AppKit`__19-[NSAlert runModal]_block_invoke_2 + 160
frame #8: 0x0000000193c549fc AppKit`__19-[NSAlert runModal]_block_invoke + 108
frame #9: 0x0000000193adda7c AppKit`_NSTryRunModal + 128
frame #10: 0x0000000193b6ba80 AppKit`-[NSAlert runModal] + 140
frame #11: 0x0000000100b4afcc base_unittests`base::MessagePumpMacTest_ScopedPumpMessagesAttemptWithModalDialog_Test::TestBody() + 520
frame #12: 0x0000000100fc02f0 base_unittests`testing::Test::Run() + 260
Other test has fundamentally the same stack:
% lldb -- ./base_unittests '--
gtest_filter=MessagePumpMacTest.QuitWithModalWindow' --single-process-tests
(lldb) target create "./base_unittests"
Current executable set to
'/Users/thakis/src/chrome/src/out/gn/repro/base_unittests' (arm64).
(lldb) settings set -- target.run-args "--
gtest_filter=MessagePumpMacTest.QuitWithModalWindow" "--single-process-tests"
(lldb) r
Process 14442 launched:
'/Users/thakis/src/chrome/src/out/gn/repro/base_unittests' (arm64)
Debugger detected, switching to single process mode.
Pass --test-launcher-debug-launcher to debug the launcher itself.
[14442:259:ERROR:icu_util.cc(179)] icudtl.dat not found in bundle
[14442:259:ERROR:icu_util.cc(243)] Invalid file descriptor to ICU data received.
Detected presence of a debugger, running without test timeouts.
Note: Google Test filter = MessagePumpMacTest.QuitWithModalWindow
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from MessagePumpMacTest
[ RUN ] MessagePumpMacTest.QuitWithModalWindow
Process 14442 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT
(code=1, subcode=0x19a985d30)
frame #0: 0x000000019a985d30 libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_arm64>::step() + 500
libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace,
libunwind::Registers_arm64>::step:
-> 0x19a985d30 <+500>: brk #0xc471
0x19a985d34 <+504>: pacib x16, x8
0x19a985d38 <+508>: b 0x19a985dd8 ; <+668>
0x19a985d3c <+512>: ldur x11, [x10, #-0x8]
Target 0: (base_unittests) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT
(code=1, subcode=0x19a985d30)
* frame #0: 0x000000019a985d30 libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_arm64>::step() + 500
frame #1: 0x0000000190dc9f0c libobjc.A.dylib`objc_addExceptionHandler + 128
frame #2: 0x0000000193ade794 AppKit`-[NSApplication _commonBeginModalSessionForWindow:relativeToWindow:modalDelegate:didEndSelector:contextInfo:] + 960
frame #3: 0x0000000193ade330 AppKit`__35-[NSApplication runModalForWindow:]_block_invoke_2 + 44
frame #4: 0x0000000193ade2e8 AppKit`__35-[NSApplication runModalForWindow:]_block_invoke + 108
frame #5: 0x0000000193adda7c AppKit`_NSTryRunModal + 128
frame #6: 0x0000000193add928 AppKit`-[NSApplication runModalForWindow:] + 148
frame #7: 0x0000000100b4ba64 base_unittests`testing::internal::TestFactoryImpl<base::MessagePumpMacTest_QuitWithModalWindow_Test>::CreateTest() + 160
frame #8: 0x00000001010ef458 base_unittests`base::TaskAnnotator::RunTask(char const*, base::PendingTask*) + 316
From https://opensource.apple.com/tarballs/objc4/ , objc4-824, runtime/objc-
exception.mm:
uintptr_t objc_addExceptionHandler(objc_exception_handler fn, void *context)
{
// Find the closest enclosing frame with objc catch handlers
struct frame_range target_frame = findHandler();
....
and
static struct frame_range findHandler(void)
{
// walk stack looking for frame with objc catch handler
unw_context_t uc;
unw_cursor_t cursor;
unw_proc_info_t info;
unw_getcontext(&uc);
unw_init_local(&cursor, &uc);
while ( (unw_step(&cursor) > 0) && (unw_get_proc_info(&cursor, &info) == UNW_ESUCCESS) ) {
// must use objc personality handler
if ( info.handler != (uintptr_t)__objc_personality_v0 )
continue;
// must have landing pad
if ( info.lsda == 0 )
continue;
// must have landing pad that catches objc exceptions
struct dwarf_eh_bases bases;
bases.tbase = 0; // from unwind-dw2-fde-darwin.c:examine_objects()
bases.dbase = 0; // from unwind-dw2-fde-darwin.c:examine_objects()
bases.func = info.start_ip;
unw_word_t ip;
unw_get_reg(&cursor, UNW_REG_IP, &ip);
ip -= 1;
struct frame_range try_range = {0, 0, 0, 0};
if ( isObjCExceptionCatcher(info.lsda, ip, &bases, &try_range) ) {
unw_word_t cfa;
unw_get_reg(&cursor, UNW_REG_SP, &cfa);
try_range.cfa = cfa;
return try_range;
}
}
return (struct frame_range){0, 0, 0, 0};
}
The unw_ stuff is implemented in libunwind/ in an llvm checkout. Probably
somewhere in libunwind/src/CompactUnwinder.hpp,
CompactUnwinder_arm64<A>::stepWithCompactEncoding()?
This looks pretty normal:
% xcrun unwinddump ./Users/thakis/src/chrome/src/out/gn/obj/base/base_unittests/message_pump_mac_unittest.o | grep -A 8 QuitWithModalWindow_Test | grep CreateTest -A4
start: 0x00001BCC __ZN7testing8internal15TestFactoryImplIN4base43MessagePumpMacTest_QuitWithModalWindow_TestEE10CreateTestEv
end: 0x00001C18 (len=0x0000004C)
unwind info: 0x44000001 std frame: x19/20
personality: ___gxx_personality_v0
lsda: 0x00002AD0 GCC_except_table30
In particular, no ___objc_personality_v0 so this shouldn't matter for the objc
unwinder code at all. (I'm surprised there's a ___gxx_personality_v0 at all, I
thought we built with -fno-exceptions.)
Here's the crashing disassembly:
0x19a985cc8 <+396>: tbnz w9, #0x0, 0x19a985e98 ; <+860>
0x19a985ccc <+400>: tbnz w9, #0x1, 0x19a985eb0 ; <+884>
0x19a985cd0 <+404>: tbnz w9, #0x2, 0x19a985ec8 ; <+908>
0x19a985cd4 <+408>: tbnz w9, #0x3, 0x19a985ee0 ; <+932>
0x19a985cd8 <+412>: tbnz w9, #0x4, 0x19a985ef8 ; <+956>
0x19a985cdc <+416>: tbnz w9, #0x8, 0x19a985f10 ; <+980>
0x19a985ce0 <+420>: tbnz w9, #0x9, 0x19a985f28 ; <+1004>
0x19a985ce4 <+424>: tbnz w9, #0xa, 0x19a985f40 ; <+1028>
0x19a985ce8 <+428>: tbz w9, #0xb, 0x19a985d00 ; <+452>
0x19a985cec <+432>: ldr x9, [x10]
0x19a985cf0 <+436>: str x9, [x19, #0x190]
0x19a985cf4 <+440>: ldur x9, [x10, #-0x8]
0x19a985cf8 <+444>: str x9, [x19, #0x198]
0x19a985cfc <+448>: sub x10, x10, #0x10 ; =0x10
0x19a985d00 <+452>: ldr x9, [x19, #0x100]
0x19a985d04 <+456>: str x10, [x19, #0x108]
0x19a985d08 <+460>: cmp x8, #0x1 ; =0x1
0x19a985d0c <+464>: mov x8, x9
0x19a985d10 <+468>: pacib x8, x10
0x19a985d14 <+472>: csel x16, x9, x8, ne
0x19a985d18 <+476>: add x8, x19, #0x110 ; =0x110
0x19a985d1c <+480>: autib x16, x10
0x19a985d20 <+484>: mov x17, x16
0x19a985d24 <+488>: xpaci x17
0x19a985d28 <+492>: cmp x16, x17
0x19a985d2c <+496>: b.eq 0x19a985d34 ; <+504>
-> 0x19a985d30 <+500>: brk #0xc471
I think the tbnz cascade corresponds to the
if (encoding & UNWIND_ARM64_FRAME_X19_X20_PAIR) {
...
}
if (encoding & UNWIND_ARM64_FRAME_X21_X22_PAIR) {
..
...
in CompactUnwinder_arm64<A>::stepWithCompactEncodingFrame(). And then that
looks maybe like a pointer authentication epilogue and we crash because we fail
the pointer auth check (?)
I don't know if that's the culprit here, but it looks like there are arm64e-specific relocations. This https://llvm.org/devmtg/2019-10/slides/McCall-Bougacha-arm64e.pdf is a good intro to arm64e . ld64 has a bunch of stuff under SUPPORT_ARCH_arm64e , ARM64_RELOC_AUTHENTICATED_POINTER , etc. (Chromium doesn't use arm64e, but some system libraries apparently do, and maybe that needs linker support already? There should probably be a dedicated bug for full arm64e support in lld though -- eventually that'll be a thing for userspace apps.)
(The assembly kind of looks like resign mitigation #2 in the slides.)
Here's a smaller repro: https://drive.google.com/file/d/1M1_2OaL6l4uTw7wSOb1h2XFRjGN6lVm7/view?usp=sharing (5.8M unzipped, 1.1M zipped)
Attached repro.tar.bz2
(953419 bytes, application/x-bzip2): repro
Comparing the output of xcrun unwinddump base_unittests
with ld and lld, after removing hex offsets and [ 2]
from the out shows:
(no frame, no saved registers) ZN4base8internal9BindStateIPFvRKZ4mainE3$_0EJS2_EE7DestroyEPKNS0_13BindStateBaseE (no frame, no saved registers) __ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE (no frame, no saved registers) ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE (std frame: x19/20 x21/22 ) ZN4base24MessagePumpCFRunLoopBase13SetTimerSlackENS_10TimerSlackE (std frame: x19/20 x21/22 ) ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev (std frame: x19/20 x21/22 ) ZN4base24MessagePumpCFRunLoopBaseD1Ev (no frame, no saved registers) ZN4base24MessagePumpCFRunLoopBase16EnterExitRunLoopEm (std frame: x19/20 x21/22 x23/24 x25/26) ZNSt3110unique_ptrIN4base8internal34ScopedLazyTaskRunnerListForTestingENS_14default_deleteIS3_EEED1Ev (std frame: x19/20 ) ZN4base4test15TaskEnvironmentD1Ev (std frame: x19/20 x21/22 ) ZN4base4test15TaskEnvironment14MockTimeDomainD1Ev (no frame, no saved registers) ZNK4base4test15TaskEnvironment14MockTimeDomain3NowEv (std frame: ) ZNK4base4test15TaskEnvironment14MockTimeDomain7GetNameEv (no frame, no saved registers) __ZThn56_N4base4test15TaskEnvironment14MockTimeDomainD1Ev (no frame, no saved registers) ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv (std frame: x19/20 x21/22 ) __ZN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerD2Ev (std frame: x19/20 x21/22 ) ___clang_call_terminate
The mismatch on common encodings might be benign, since there are different ways to encode the same thing... the other two seem suspicious though
If I read this right, ld creates compact unwind info entries for symbols that
don't have them in the input .o files:
# No compact unwind for this
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo xcrun
unwinddump $f | grep
'__ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE'; done
# No eh_frame either
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo dwarfdump $f |
grep '__ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE';
done
# Still, ld writes something to the output
thakis@M1BP gn % xcrun unwinddump base_unittests | grep
'__ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE'
# relink with ld...
thakis@M1BP gn % ../../third_party/llvm-build/Release+Asserts/bin/clang++ -
stdlib=libc++ -arch arm64 -nostdlib++ -isysroot sdk/xcode_links/MacOSX11.3.sdk -
mmacosx-version-min=10.11.0 -Wl,-ObjC -o "./base_unittests" -Wl,-
filelist,"./base_unittests.rsp" -framework AppKit -Wl,-dead_strip
# ...and, an entry:
thakis@M1BP gn % xcrun unwinddump base_unittests | grep
'__ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE'
[ 7] funcOffset=0x00001DC0, encoding[ 12]=0x00000000 (no frame, no saved
registers )
__ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE
(Also, changing the order of the input .o files makes the test pass with lld
too, for reasons I also don't understand yet.)
Most (but not quite all) of the "missing" functions aren't in any of the input
files:
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep
'__ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE';
done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
start: 0x000002E8 __ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep
'__ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE'; done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep
'__ZN4base24MessagePumpCFRunLoopBase13SetTimerSlackENS_10TimerSlackE'; done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep
'__ZNSt3__110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev';
done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep '__ZN4base24MessagePumpCFRunLoopBaseD1Ev'; done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep '__ZN4base24MessagePumpCFRunLoopBase16EnterExitRunLoopEm';
done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep
'__ZNSt3__110unique_ptrIN4base8internal34ScopedLazyTaskRunnerListForTestingENS_14default_deleteIS3_EEED1Ev';
done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep '__ZN4base4test15TaskEnvironmentD1Ev'; done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep '__ZN4base4test15TaskEnvironment14MockTimeDomainD1Ev'; done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep '__ZNK4base4test15TaskEnvironment14MockTimeDomain3NowEv';
done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep
'__ZNK4base4test15TaskEnvironment14MockTimeDomain7GetNameEv'; done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep
'__ZThn56_N4base4test15TaskEnvironment14MockTimeDomainD1Ev'; done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep
'__ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv'; done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
start: 0x00001F48 __ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep
'__ZN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerD2Ev'; done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep '___clang_call_terminate'; done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
getAllUnwindInfos() in ld64:
if ( atom->beginUnwind() == atom->endUnwind() ) {
// be sure to mark that we have no unwind info for stuff in the TEXT segment without unwind info
if ( (atom->section().type() == ld::Section::typeCode) && (atom->size() !=0) ) {
entries.push_back(UnwindEntry(atom, address, 0, NULL, NULL, NULL, 0));
}
}
I think ld64's algorithm is:
- at .o parse time, assign each symbol unwind info from __eh_frame (cfiArray)
or __unwind_info (cuInfoArray) in Parser<A>::parse
- when processing compat info, go through all symbols, and either assign dummy
unwind info or unwind info from .o files
lld currently just merges __compact_unwind from all inputs, which means things
without explicit unwind entries don't get unwind info.
Not sure if it's intentional on clang's end that it omits some unwind
information but apparently it's important to use ld64's algorithm.
Interesting finds...
I'm still puzzled over the missing LSDA for __ZSt9terminatev
though. At first I'd wondering if perhaps there were multiple definitions for that function in different object files, with only some having the associated LSDA, but it seems like the only place that it's defined is in cxa_handlers.o. And we do have an entry for that function in the second-level pages, so we are finding the one CU entry that points to it.
Another idea I had was to diff xcrun unwinddump
output for when I swap the order of the first few files. If I swap the 2nd and 3rd file (original order of these 2:
obj/base/base_unittests/message_pump_mac_unittest_ii.o obj/base/base_unittests/jumbo_ii.o
) then unwinddump output is basically identical except for a few offsets but the test passes. So I guess the missing entries aren't really a problem. Hmm.
I also tried building libunwind locally to add some debugging stuff there and then running the test with DYLD_LIBRARY_PATH=$HOME/src/llvm-project/out/gn/lib DYLD_PRINT_LIBRARIES=1
set, but my locally built libunwind complains libunwind: malformed __unwind_info at 0x1944E328C bad second level page
(for both ld64 and lld used to built base_unittests). Not sure why. Maybe I'll debug that part a bit.
Here's something resembling a slightly more concrete theory. Part of the unwinddump output when the binary crashes:
encodingsCount=0x00000000 funcOffset=0x00000B90 (std frame:) ZN4base3mac15CallWithEHFrameEU13block_pointerFvvE funcOffset=0x00000BB0 (std frame: x19/20) _main funcOffset=0x00000DA0 (std frame: x19/20) ZL3foov funcOffset=0x00000E44 (std frame: x19/20) ZN4base8internal16BindLambdaHelperIZ4mainE3$0FvvEE3RunERKS2 funcOffset=0x00000E7C (no frame, no saved registers) ZN4base8internal7InvokerINS0_9BindStateIPFvRKZ4mainE3$_0EJS3_EEEFvvEE3RunEPNS0_13BindStateBaseE funcOffset=0x00017960 (std frame: x19/20 x21/22 x23/24) __ZN4base24MessagePumpCFRunLoopBase3RunEPNS_11MessagePump8DelegateE
Here's the same with lines 2/3 switched as mentioned in the previous comment, where there's no crash:
encodingsCount=0x00000000 funcOffset=0x00000B90 (std frame:) ZN4base3mac15CallWithEHFrameEU13block_pointerFvvE funcOffset=0x00017670 (std frame: x19/20) _main funcOffset=0x00017860 (std frame: x19/20) ZL3foov funcOffset=0x00017904 (std frame: x19/20) ZN4base8internal16BindLambdaHelperIZ4mainE3$0FvvEE3RunERKS2 funcOffset=0x0001793C (no frame, no saved registers) ZN4base8internal7InvokerINS0_9BindStateIPFvRKZ4mainE3$_0EJS3_EEEFvvEE3RunEPNS0_13BindStateBaseE funcOffset=0x00017960 (std frame: x19/20 x21/22 x23/24) __ZN4base24MessagePumpCFRunLoopBase3RunEPNS_11MessagePump8DelegateE
This is identical except for the funcOffsets.
In the first (crashing) output, there's a big hole between 0x00000E7C and 0x00017960 between second-to-last and last line.
In the second (working) output, the hole is between 0x00000B90 and 0x00017670 between first and second line.
Looking at nm -n base_unittests
output, here's where the "missing" symbols from comment 11 are in both cases.
crashing:
thakis@M1BP gn % nm -n base_unittests | grep -E 'ZN4base8internal9BindStateIPFvRKZ4mainE3\$_0EJS2_EE7DestroyEPKNS0_13BindStateBaseE|__ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE|ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE|ZN4base24MessagePumpCFRunLoopBase13SetTimerSlackENS_10TimerSlackE|ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev|ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev|ZN4base24MessagePumpCFRunLoopBase16EnterExitRunLoopEm|ZNSt3110unique_ptrIN4base8internal34ScopedLazyTaskRunnerListForTestingENS_14default_deleteIS3_EEED1Ev|ZN4base4test15TaskEnvironmentD1Ev|ZN4base4test15TaskEnvironment14MockTimeDomainD1Ev|ZNK4base4test15TaskEnvironment14MockTimeDomain3NowEv|ZNK4base4test15TaskEnvironment14MockTimeDomain7GetNameEv|ZThn56_N4base4test15TaskEnvironment14MockTimeDomainD1Ev|ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv|__ZN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerD2Ev|___clang_call_terminate'
0000000100000e8c t ZN4base8internal9BindStateIPFvRKZ4mainE3$_0EJS2_EE7DestroyEPKNS0_13BindStateBaseE 0000000100000e98 t __ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE 0000000100000ea0 t ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE 0000000100017b24 t ZN4base24MessagePumpCFRunLoopBase13SetTimerSlackENS_10TimerSlackE 0000000100018170 t ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev 00000001000186b0 t ZN4base24MessagePumpCFRunLoopBase16EnterExitRunLoopEm 0000000100018dd8 t ZN4base4test15TaskEnvironmentD1Ev 0000000100018df0 t ZN4base4test15TaskEnvironment14MockTimeDomainD1Ev 0000000100018e50 t ZNK4base4test15TaskEnvironment14MockTimeDomain3NowEv 0000000100018edc t ZNK4base4test15TaskEnvironment14MockTimeDomain7GetNameEv 0000000100018ef4 t ZThn56_N4base4test15TaskEnvironment14MockTimeDomainD1Ev 0000000100018f40 t ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv 0000000100018f4c t __ZN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerD2Ev 0000000100019668 t ___clang_call_terminate
working:
% nm -n base_unittests | grep -E 'ZN4base8internal9BindStateIPFvRKZ4mainE3\$_0EJS2_EE7DestroyEPKNS0_13BindStateBaseE|__ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE|ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE|ZN4base24MessagePumpCFRunLoopBase13SetTimerSlackENS_10TimerSlackE|ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev|ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev|ZN4base24MessagePumpCFRunLoopBase16EnterExitRunLoopEm|ZNSt3110unique_ptrIN4base8internal34ScopedLazyTaskRunnerListForTestingENS_14default_deleteIS3_EEED1Ev|ZN4base4test15TaskEnvironmentD1Ev|ZN4base4test15TaskEnvironment14MockTimeDomainD1Ev|ZNK4base4test15TaskEnvironment14MockTimeDomain3NowEv|ZNK4base4test15TaskEnvironment14MockTimeDomain7GetNameEv|ZThn56_N4base4test15TaskEnvironment14MockTimeDomainD1Ev|ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv|__ZN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerD2Ev|___clang_call_terminate'
0000000100011d64 t ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE 000000010001794c t __ZN4base8internal9BindStateIPFvRKZ4mainE3$_0EJS2_EE7DestroyEPKNS0_13BindStateBaseE 0000000100017958 t ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE 0000000100017b24 t ZN4base24MessagePumpCFRunLoopBase13SetTimerSlackENS_10TimerSlackE 0000000100018170 t ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev 00000001000186b0 t ZN4base24MessagePumpCFRunLoopBase16EnterExitRunLoopEm 0000000100018dd8 t ZN4base4test15TaskEnvironmentD1Ev 0000000100018df0 t ZN4base4test15TaskEnvironment14MockTimeDomainD1Ev 0000000100018e50 t ZNK4base4test15TaskEnvironment14MockTimeDomain3NowEv 0000000100018edc t ZNK4base4test15TaskEnvironment14MockTimeDomain7GetNameEv 0000000100018ef4 t ZThn56_N4base4test15TaskEnvironment14MockTimeDomainD1Ev 0000000100018f40 t ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv 0000000100018f4c t __ZN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerD2Ev 0000000100019668 t ___clang_call_terminate
In the first case, the first 3 entries are in the hole (assuming 100000000 is the load address). In the second case, just the first entry is.
When libunwind looks for unwind information for a function, as far as I know it picks up the unwind info from the nearest entry at a lower address. So if something is in that hole, it gets the first entry above the hole.
In the crashing case, the last entry before the hole is
funcOffset=0x00000E7C (no frame, no saved registers) __ZN4base8internal7InvokerINS0_9BindStateIPFvRKZ4mainE3$_0EJS3_EEEFvvEE3RunEPNS0_13BindStateBaseE
In the non-crashing cash, the last entry before the hole is
funcOffset=0x00000B90 (std frame:) __ZN4base3mac15CallWithEHFrameEU13block_pointerFvvE
And I guess "std frame" is what the missing entry needs to work.
This theory explains why this is dependent on .o order, and it suggests it is due to missing unwind entries.
(Why is this arm-only? Don't yet know for sure, but some ideas:
Maybe clang/llvm do that optimization to omit compact unwind info for one function if it'd be identical to the previous function too? And then if the previous function is elided (either due to it being a weak coealesced symbol or a dead symbol -- in this case the former I suppose since it repros without -dead_strip too) then it'd be important that the linker explicitly inserts the unwind kind from the previous symbol.
Hm, if I set a breakpoint in foo
and run bt
at that breakpoint in lldb, here's what I get with the crashing binary:
(lldb) bt
main + 496 frame #1: 0x000000010000e3d8 base_unittests
non-virtual thunk to base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork() + 48So lldb also has trouble unwinding.
With the working binary (by tweaking the rsp file):
(lldb) bt
main + 496 frame #1: 0x000000010000df98 base_unittests
base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl(base::sequence_manager::LazyNow) + 136
frame #2: 0x000000010000e0dc base_unittestsnon-virtual thunk to base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork() + 48 frame #3: 0x0000000100018328 base_unittests
base::MessagePumpCFRunLoopBase::RunWork() + 76
frame #4: 0x0000000100000ba0 base_unittestsbase::mac::CallWithEHFrame(void () block_pointer) + 16 frame #5: 0x0000000100017db0 base_unittests
base::MessagePumpCFRunLoopBase::RunWorkSource(void) + 68
frame #6: 0x0000000191026ad4 CoreFoundation__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 28 frame #7: 0x0000000191026a20 CoreFoundation
CFRunLoopDoSource0 + 208
frame #8: 0x000000019102670c CoreFoundation`CFRunLoopDoSources0 + 268
frame #9: 0x0000000191025094 CoreFoundation__CFRunLoopRun + 820 frame #10: 0x00000001910245e8 CoreFoundation
CFRunLoopRunSpecific + 600
frame #11: 0x0000000198f3f2a0 HIToolboxRunCurrentEventLoopInMode + 292 frame #12: 0x0000000198f3ef2c HIToolbox
ReceiveNextEventCommon + 320
frame #13: 0x0000000198f3edd4 HIToolbox_BlockUntilNextEventMatchingListInModeWithFilter + 72 frame #14: 0x0000000193813480 AppKit
_DPSNextEvent + 836
frame #15: 0x0000000193811e20 AppKit-[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 1292 frame #16: 0x0000000193803cac AppKit
-[NSApplication run] + 596
frame #17: 0x000000010001882c base_unittestsbase::MessagePumpNSApplication::DoRun(base::MessagePump::Delegate*) + 312 frame #18: 0x00000001000179ec base_unittests
base::MessagePumpCFRunLoopBase::Run(base::MessagePump::Delegate*) + 140
frame #19: 0x000000010000e370 base_unittestsbase::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run(bool, base::TimeDelta) + 296 frame #20: 0x00000001000022e4 base_unittests
base::RunLoop::Run(base::Location const&) + 100
frame #21: 0x00000001000177cc base_unittestsmain + 348 frame #22: 0x0000000190f45450 libdyld.dylib
start + 4"non-virtual thunk to base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork()" is __ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv in mangled.
"base::MessagePumpCFRunLoopBase::RunWork()" is __ZN4base24MessagePumpCFRunLoopBase7RunWorkEv in mangled.
Crashing binary:
% nm -n base_unittests | grep -E 'ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv|ZN4base24MessagePumpCFRunLoopBase7RunWorkEv'
000000010000e3a8 t ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv 00000001000182dc t ZN4base24MessagePumpCFRunLoopBase7RunWorkEv
Working binary:
thakis@M1BP gn % nm -n base_unittests | grep -E 'ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv|ZN4base24MessagePumpCFRunLoopBase7RunWorkEv'
000000010000e0ac t ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv 00000001000182dc t ZN4base24MessagePumpCFRunLoopBase7RunWorkEv
Reminder form an earlier comment:
In the crashing case, the last entry before the hole is
funcOffset=0x00000E7C (no frame, no saved registers) __ZN4base8internal7InvokerINS0_9BindStateIPFvRKZ4mainE3$_0EJS3_EEEFvvEE3RunEPNS0_13BindStateBaseE
And the first after it is
funcOffset=0x00017960 (std frame: x19/20 x21/22 x23/24) __ZN4base24MessagePumpCFRunLoopBase3RunEPNS_11MessagePump8DelegateE
In the non-crashing cash, the last entry before the hole is
funcOffset=0x00000B90 (std frame:) __ZN4base3mac15CallWithEHFrameEU13block_pointerFvvE
And the first after it is
funcOffset=0x00017670 (std frame: x19/20) _main
So __ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv is also in that hole. So I guess that's the function that has the wrong unwind info, but it usually picks up the right unwind info from a different symbol, even with ld64, since it's never appears in unwinddump output, even when things work.
__ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv is in obj/base/base_unittests/jumbo_ii.o which doesn't have any unwind info, likely due to it being a cc file built with -fno-exceptions.
Here's a tiny stand-alone repro based on this theory. It successfully reproduces the issue :)
The idea is to put a "no frame, no saved registers" entry in file0.o, to not have any entries in file1.o (due to using -fno-exceptions, which omits the entries, which is supposed to give everything the default entry), and then put something from file1.o on the stack when calling the unwinder (which I called "call").
The unwinder will look for the entry for "call", but since lld doesn't write one, will find the entry at the highest lower address, which is the entry from file0.o which is "no frame, no registers" -- but "call" has a default frame. ...at least that's what I thought, but now that I compare unwinddump
output for the binary linked by lld and ld64, the difference is that lld writes
encodingsCount=0x00000000 [ 0] encoding[2]=0x02000000 (no frame, no saved registers) Z5dummyv [ 1] encoding[0]=0x04000000 (std frame:) Z3foov [ 2] encoding[1]=0x02001000 (stack size=16:) __ZL7handlerP11objc_objectPv [ 3] encoding[0]=0x04000000 (std frame:) _main
while ld64 writes
encodingsCount=0x00000003 [ 0] encoding[3]=0x02000000 (no frame, no saved registers) Z5dummyv [ 1] encoding[2]=0x00000000 (no frame, no saved registers) Z4callPFvvE [ 2] encoding[0]=0x04000000 (std frame:) Z3foov [ 3] encoding[1]=0x02001000 (stack size=16:) ZL7handlerP11objc_objectPv [ 4] encoding[0]=0x04000000 (std frame:) _main
So ld64 does have an explicit entry for __Z4callPFvvE but it's also "no frame, no saved registers". The encoding is 0x00000000 instead of 0x02000000 which I suppose is the difference that matters (?)
Repro:
thakis@M1BP gn % cat file0.mm
void dummy() {}
thakis@M1BP gn % cat file1.cc void call(void(*f)()) { f(); }
thakis@M1BP gn % cat file2.mm
void call(void(*)());
static void handler(id a, void* b) {}
void foo() { // Anything that calls unw_getcontext/unw_init_local/unw_step should do, // objc_addExceptionHandler() does that, so calling that for convenience. objc_addExceptionHandler(&handler, 0); }
int main() { call(foo); }
thakis@M1BP gn % clang -c file0.mm file2.mm thakis@M1BP gn % clang -c -fno-exceptions file1.cc
thakis@M1BP gn % ../../third_party/llvm-build/Release+Asserts/bin/clang -fuse-ld=lld file0.o file1.o file2.o -isysroot $(xcrun -show-sdk-path) -lobjc thakis@M1BP gn % ./a.out zsh: trace trap ./a.out
thakis@M1BP gn % ../../third_party/llvm-build/Release+Asserts/bin/clang file0.o file1.o file2.o -isysroot $(xcrun -show-sdk-path) -lobjc
thakis@M1BP gn % ./a.out # Works with ld64
Nice find! I guess we could deal with this by having cuPtrVector
include dummy entries for functions without unwind info, right before the sort+fold step here: https://github.com/llvm/llvm-project/blob/main/lld/MachO/UnwindInfoSection.cpp#L297
FWIW, compiling your small repro example for x86_64 generates nonzero (and identical!) unwind info for all 4 functions, including __Z4callPFvvE
.
If you build the cc file with -fomit-frame-pointer, you can get that file to not include any unwind info. Locally I've also hacked up things to not emit eh_frame on x86_64 either [1], not sure that makes a difference. With that, bt
in lldb doesn't show call
, but that's with both lld and ld64. I guess things are better on x86_64 because everything always unwinds off rbp if it unwinds, at least for clang-generated object files (?)
It's a bit surprising that ld64 writes an encoding of 0x00000000; as far a I can tell that's not a valid encoding. I would expect that libunwind/src/CompactUnwinder.hpp,
CompactUnwinder_arm64::stepWithCompactEncoding
would hit the
_LIBUNWIND_ABORT("invalid compact unwind encoding");
but that's not what's happening. So I think I don't fully understand things yet.
[1]:
% git diff diff --git a/llvm/lib/MC/MCObjectFileInfo.cpp b/llvm/lib/MC/MCObjectFileInfo.cpp index 99526b384c6e..b4a53f9225f1 100644 --- a/llvm/lib/MC/MCObjectFileInfo.cpp +++ b/llvm/lib/MC/MCObjectFileInfo.cpp @@ -57,10 +57,11 @@ void MCObjectFileInfo::initMachOMCObjectFileInfo(const Triple &T) { MachO::S_ATTR_STRIP_STATIC_SYMS | MachO::S_ATTR_LIVE_SUPPORT, SectionKind::getReadOnly());
}
FDECFIEncoding = dwarf::DW_EH_PE_pcrel;
we could deal with this by having
cuPtrVector
include dummy entries for functions without unwind info, right before the sort+fold step here:
We also need to be able to identify all functions without unwind info somehow, right?
Looks like UnwindCursor<A, R>::getInfoFromCompactEncodingSection() copies the encoding to _info.format and that being 0 causes _unwindInfoMissing to become true. So an encoding of 0 is a special case.
(However, it looks like _unwindInfoMissing true causes step() to return UNW_STEP_END which is a bit strange too?)
lldb also special-cases encoding of 0 here: https://github.com/llvm/llvm-project/blob/main/lldb/source/Symbol/CompactUnwindInfo.cpp#L180
But lldb manages to unwind fine past call() somehow…
We also need to be able to identify all functions without unwind info somehow, right?
Yeah I was thinking that we would scan the compactUnwindSection once to gather all function pointers, then scan each InputFile for symbols in __text
that don't have a corresponding entry. Not exactly cheap but I guess it's unavoidable
(In reply to Jez Ng from comment #26)
Yeah I was thinking that we would scan the compactUnwindSection once to gather all function pointers, then scan each InputFile for symbols in
__text
that don't have a corresponding entry. Not exactly cheap but I guess it's unavoidable
It might be cheaper to give each Defined an associated unwind info range (when parsing the .o file this is cheap-ish: just walk the sorted unwind info in parallel) and then write the whole __unwind_info section based on that like ld64 does. That would also implement only writing unwind stuff for non-coalesced symbols as a side effect.
Not sure if that's better or worse.
The __eh_frame / __unwind_info stuff doesn't fit terribly neatly into the normal linker design :/
gkm said he's interested in working on the actual fix, so assigning to him. (Feel free to give back to me if that should change :) )
We have another crash bug in libunwind, this one in x86_64: https://bugs.chromium.org/p/chromium/issues/detail?id=1220175
Chances are it's due to this bug too, but I haven't verified that yet.
Attached tmp.diff
(2740 bytes, text/plain): patch sketch
Uploaded that as a stop gap to here: https://reviews.llvm.org/D104681
repro.tar.bz2
(953419 bytes, application/x-bzip2)tmp.diff
(2740 bytes, text/plain)