Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

[lld/mac/arm64] Two of Chromium's base_unittests fail when linked with lld #49498

Closed Quuxplusone closed 3 years ago

Quuxplusone commented 3 years ago
Bugzilla Link PR50529
Status RESOLVED FIXED
Importance P enhancement
Reported by Nico Weber (nicolasweber@gmx.de)
Reported on 2021-05-29 12:53:33 -0700
Last modified on 2021-06-22 03:13:48 -0700
Version unspecified
Hardware PC All
CC gkm@fb.com, jezreel@gmail.com, llvm-bugs@lists.llvm.org, smeenai@fb.com
Fixed by commit(s) rGd6565a2dbcbe0932cd5cbb95bf2fc06855dfe938
Attachments repro.tar.bz2 (953419 bytes, application/x-bzip2)
tmp.diff (2740 bytes, text/plain)
Blocks
Blocked by
See also
repro: https://drive.google.com/file/d/1YvyZxh_ItvL1hdVeaab-kU7zm-
d1AokX/view?usp=sharing (17M bzipped, 79M extracted -- not huge)

With ld64:

% ld @response.txt
% ./base_unittests '--gtest_filter=MessagePumpMacTest.*'
IMPORTANT DEBUGGING NOTE: batches of tests are run inside their
own process. For debugging a test inside a debugger, use the
--gtest_filter=<your_test_name> flag along with
--single-process-tests.
Using sharding settings from environment. This is shard 0/1
Using 1 parallel jobs.
[12105:259:ERROR:icu_util.cc(179)] icudtl.dat not found in bundle
[12105:259:ERROR:icu_util.cc(243)] Invalid file descriptor to ICU data received.
Note: Google Test filter =
MessagePumpMacTest.ScopedPumpMessagesInPrivateModes:MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog:MessagePumpMacTest.QuitWithModalWindow
[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from MessagePumpMacTest
[ RUN      ] MessagePumpMacTest.ScopedPumpMessagesInPrivateModes
[       OK ] MessagePumpMacTest.ScopedPumpMessagesInPrivateModes (42 ms)
[ RUN      ] MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog
[       OK ] MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog (41 ms)
[ RUN      ] MessagePumpMacTest.QuitWithModalWindow
[       OK ] MessagePumpMacTest.QuitWithModalWindow (30 ms)
[----------] 3 tests from MessagePumpMacTest (114 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test suite ran. (114 ms total)
[  PASSED  ] 3 tests.
[1/3] MessagePumpMacTest.ScopedPumpMessagesInPrivateModes (42 ms)
[2/3] MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog (41 ms)
[3/3] MessagePumpMacTest.QuitWithModalWindow (30 ms)
SUCCESS: all tests passed.

But with lld:

% ./base_unittests '--gtest_filter=MessagePumpMacTest.*'
IMPORTANT DEBUGGING NOTE: batches of tests are run inside their
own process. For debugging a test inside a debugger, use the
--gtest_filter=<your_test_name> flag along with
--single-process-tests.
Using sharding settings from environment. This is shard 0/1
Using 1 parallel jobs.
[12111:259:ERROR:icu_util.cc(179)] icudtl.dat not found in bundle
[12111:259:ERROR:icu_util.cc(243)] Invalid file descriptor to ICU data received.
Note: Google Test filter =
MessagePumpMacTest.ScopedPumpMessagesInPrivateModes:MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog:MessagePumpMacTest.QuitWithModalWindow
[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from MessagePumpMacTest
[ RUN      ] MessagePumpMacTest.ScopedPumpMessagesInPrivateModes
[       OK ] MessagePumpMacTest.ScopedPumpMessagesInPrivateModes (36 ms)
[ RUN      ] MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog
[12110:259:ERROR:test_launcher.cc(1093)] no test result for
MessagePumpMacTest.QuitWithModalWindow
[1/3] MessagePumpMacTest.ScopedPumpMessagesInPrivateModes (36 ms)
[2/3] MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog (CRASHED)
[3/3] MessagePumpMacTest.QuitWithModalWindow (SKIPPED)
1 test crashed:
    MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog (../../base/message_loop/message_pump_mac_unittest.mm:99)
1 test not run:
    MessagePumpMacTest.QuitWithModalWindow (../../base/message_loop/message_pump_mac_unittest.mm:121)
Tests took 0 seconds.
Quuxplusone commented 3 years ago

(This works fine on x86.)

Quuxplusone commented 3 years ago
Stack:

% lldb -- ./base_unittests '--
gtest_filter=MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog' --
single-process-tests
(lldb) target create "./base_unittests"
Current executable set to
'/Users/thakis/src/chrome/src/out/gn/repro/base_unittests' (arm64).
(lldb) settings set -- target.run-args  "--
gtest_filter=MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog" "--
single-process-tests"
(lldb) r
Process 14421 launched:
'/Users/thakis/src/chrome/src/out/gn/repro/base_unittests' (arm64)
Debugger detected, switching to single process mode.
Pass --test-launcher-debug-launcher to debug the launcher itself.
[14421:259:ERROR:icu_util.cc(179)] icudtl.dat not found in bundle
[14421:259:ERROR:icu_util.cc(243)] Invalid file descriptor to ICU data received.
Detected presence of a debugger, running without test timeouts.
Note: Google Test filter =
MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from MessagePumpMacTest
[ RUN      ] MessagePumpMacTest.ScopedPumpMessagesAttemptWithModalDialog
Process 14421 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT
(code=1, subcode=0x19a985d30)
    frame #0: 0x000000019a985d30 libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_arm64>::step() + 500
libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace,
libunwind::Registers_arm64>::step:
->  0x19a985d30 <+500>: brk    #0xc471
    0x19a985d34 <+504>: pacib  x16, x8
    0x19a985d38 <+508>: b      0x19a985dd8               ; <+668>
    0x19a985d3c <+512>: ldur   x11, [x10, #-0x8]
Target 0: (base_unittests) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT
(code=1, subcode=0x19a985d30)
  * frame #0: 0x000000019a985d30 libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_arm64>::step() + 500
    frame #1: 0x0000000190dc9f0c libobjc.A.dylib`objc_addExceptionHandler + 128
    frame #2: 0x0000000193ade794 AppKit`-[NSApplication _commonBeginModalSessionForWindow:relativeToWindow:modalDelegate:didEndSelector:contextInfo:] + 960
    frame #3: 0x0000000193ade330 AppKit`__35-[NSApplication runModalForWindow:]_block_invoke_2 + 44
    frame #4: 0x0000000193ade2e8 AppKit`__35-[NSApplication runModalForWindow:]_block_invoke + 108
    frame #5: 0x0000000193adda7c AppKit`_NSTryRunModal + 128
    frame #6: 0x0000000193add928 AppKit`-[NSApplication runModalForWindow:] + 148
    frame #7: 0x0000000193c54ab8 AppKit`__19-[NSAlert runModal]_block_invoke_2 + 160
    frame #8: 0x0000000193c549fc AppKit`__19-[NSAlert runModal]_block_invoke + 108
    frame #9: 0x0000000193adda7c AppKit`_NSTryRunModal + 128
    frame #10: 0x0000000193b6ba80 AppKit`-[NSAlert runModal] + 140
    frame #11: 0x0000000100b4afcc base_unittests`base::MessagePumpMacTest_ScopedPumpMessagesAttemptWithModalDialog_Test::TestBody() + 520
    frame #12: 0x0000000100fc02f0 base_unittests`testing::Test::Run() + 260
Quuxplusone commented 3 years ago
Other test has fundamentally the same stack:

% lldb -- ./base_unittests '--
gtest_filter=MessagePumpMacTest.QuitWithModalWindow' --single-process-tests
(lldb) target create "./base_unittests"
Current executable set to
'/Users/thakis/src/chrome/src/out/gn/repro/base_unittests' (arm64).
(lldb) settings set -- target.run-args  "--
gtest_filter=MessagePumpMacTest.QuitWithModalWindow" "--single-process-tests"
(lldb) r
Process 14442 launched:
'/Users/thakis/src/chrome/src/out/gn/repro/base_unittests' (arm64)
Debugger detected, switching to single process mode.
Pass --test-launcher-debug-launcher to debug the launcher itself.
[14442:259:ERROR:icu_util.cc(179)] icudtl.dat not found in bundle
[14442:259:ERROR:icu_util.cc(243)] Invalid file descriptor to ICU data received.
Detected presence of a debugger, running without test timeouts.
Note: Google Test filter = MessagePumpMacTest.QuitWithModalWindow
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from MessagePumpMacTest
[ RUN      ] MessagePumpMacTest.QuitWithModalWindow
Process 14442 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT
(code=1, subcode=0x19a985d30)
    frame #0: 0x000000019a985d30 libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_arm64>::step() + 500
libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace,
libunwind::Registers_arm64>::step:
->  0x19a985d30 <+500>: brk    #0xc471
    0x19a985d34 <+504>: pacib  x16, x8
    0x19a985d38 <+508>: b      0x19a985dd8               ; <+668>
    0x19a985d3c <+512>: ldur   x11, [x10, #-0x8]
Target 0: (base_unittests) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT
(code=1, subcode=0x19a985d30)
  * frame #0: 0x000000019a985d30 libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_arm64>::step() + 500
    frame #1: 0x0000000190dc9f0c libobjc.A.dylib`objc_addExceptionHandler + 128
    frame #2: 0x0000000193ade794 AppKit`-[NSApplication _commonBeginModalSessionForWindow:relativeToWindow:modalDelegate:didEndSelector:contextInfo:] + 960
    frame #3: 0x0000000193ade330 AppKit`__35-[NSApplication runModalForWindow:]_block_invoke_2 + 44
    frame #4: 0x0000000193ade2e8 AppKit`__35-[NSApplication runModalForWindow:]_block_invoke + 108
    frame #5: 0x0000000193adda7c AppKit`_NSTryRunModal + 128
    frame #6: 0x0000000193add928 AppKit`-[NSApplication runModalForWindow:] + 148
    frame #7: 0x0000000100b4ba64 base_unittests`testing::internal::TestFactoryImpl<base::MessagePumpMacTest_QuitWithModalWindow_Test>::CreateTest() + 160
    frame #8: 0x00000001010ef458 base_unittests`base::TaskAnnotator::RunTask(char const*, base::PendingTask*) + 316
Quuxplusone commented 3 years ago
From https://opensource.apple.com/tarballs/objc4/ , objc4-824, runtime/objc-
exception.mm:

uintptr_t objc_addExceptionHandler(objc_exception_handler fn, void *context)
{
    // Find the closest enclosing frame with objc catch handlers
    struct frame_range target_frame = findHandler();
    ....

and

static struct frame_range findHandler(void)
{
    // walk stack looking for frame with objc catch handler
    unw_context_t    uc;
    unw_cursor_t    cursor;
    unw_proc_info_t    info;
    unw_getcontext(&uc);
    unw_init_local(&cursor, &uc);
    while ( (unw_step(&cursor) > 0) && (unw_get_proc_info(&cursor, &info) == UNW_ESUCCESS) ) {
        // must use objc personality handler
        if ( info.handler != (uintptr_t)__objc_personality_v0 )
            continue;
        // must have landing pad
        if ( info.lsda == 0 )
            continue;
        // must have landing pad that catches objc exceptions
        struct dwarf_eh_bases bases;
        bases.tbase = 0;  // from unwind-dw2-fde-darwin.c:examine_objects()
        bases.dbase = 0;  // from unwind-dw2-fde-darwin.c:examine_objects()
        bases.func = info.start_ip;
        unw_word_t ip;
        unw_get_reg(&cursor, UNW_REG_IP, &ip);
        ip -= 1;
        struct frame_range try_range = {0, 0, 0, 0};
        if ( isObjCExceptionCatcher(info.lsda, ip, &bases, &try_range) ) {
            unw_word_t cfa;
            unw_get_reg(&cursor, UNW_REG_SP, &cfa);
            try_range.cfa = cfa;
            return try_range;
        }
    }

    return (struct frame_range){0, 0, 0, 0};
}

The unw_ stuff is implemented in libunwind/ in an llvm checkout. Probably
somewhere in libunwind/src/CompactUnwinder.hpp,
CompactUnwinder_arm64<A>::stepWithCompactEncoding()?
Quuxplusone commented 3 years ago
This looks pretty normal:

 % xcrun unwinddump ./Users/thakis/src/chrome/src/out/gn/obj/base/base_unittests/message_pump_mac_unittest.o | grep -A 8 QuitWithModalWindow_Test | grep CreateTest -A4
  start:        0x00001BCC   __ZN7testing8internal15TestFactoryImplIN4base43MessagePumpMacTest_QuitWithModalWindow_TestEE10CreateTestEv
  end:          0x00001C18   (len=0x0000004C)
  unwind info:  0x44000001   std frame: x19/20
  personality:              ___gxx_personality_v0
  lsda:         0x00002AD0  GCC_except_table30

In particular, no ___objc_personality_v0 so this shouldn't matter for the objc
unwinder code at all. (I'm surprised there's a ___gxx_personality_v0 at all, I
thought we built with -fno-exceptions.)
Quuxplusone commented 3 years ago
Here's the crashing disassembly:

    0x19a985cc8 <+396>:  tbnz   w9, #0x0, 0x19a985e98     ; <+860>
    0x19a985ccc <+400>:  tbnz   w9, #0x1, 0x19a985eb0     ; <+884>
    0x19a985cd0 <+404>:  tbnz   w9, #0x2, 0x19a985ec8     ; <+908>
    0x19a985cd4 <+408>:  tbnz   w9, #0x3, 0x19a985ee0     ; <+932>
    0x19a985cd8 <+412>:  tbnz   w9, #0x4, 0x19a985ef8     ; <+956>
    0x19a985cdc <+416>:  tbnz   w9, #0x8, 0x19a985f10     ; <+980>
    0x19a985ce0 <+420>:  tbnz   w9, #0x9, 0x19a985f28     ; <+1004>
    0x19a985ce4 <+424>:  tbnz   w9, #0xa, 0x19a985f40     ; <+1028>
    0x19a985ce8 <+428>:  tbz    w9, #0xb, 0x19a985d00     ; <+452>
    0x19a985cec <+432>:  ldr    x9, [x10]
    0x19a985cf0 <+436>:  str    x9, [x19, #0x190]
    0x19a985cf4 <+440>:  ldur   x9, [x10, #-0x8]
    0x19a985cf8 <+444>:  str    x9, [x19, #0x198]
    0x19a985cfc <+448>:  sub    x10, x10, #0x10           ; =0x10
    0x19a985d00 <+452>:  ldr    x9, [x19, #0x100]
    0x19a985d04 <+456>:  str    x10, [x19, #0x108]
    0x19a985d08 <+460>:  cmp    x8, #0x1                  ; =0x1
    0x19a985d0c <+464>:  mov    x8, x9
    0x19a985d10 <+468>:  pacib  x8, x10
    0x19a985d14 <+472>:  csel   x16, x9, x8, ne
    0x19a985d18 <+476>:  add    x8, x19, #0x110           ; =0x110
    0x19a985d1c <+480>:  autib  x16, x10
    0x19a985d20 <+484>:  mov    x17, x16
    0x19a985d24 <+488>:  xpaci  x17
    0x19a985d28 <+492>:  cmp    x16, x17
    0x19a985d2c <+496>:  b.eq   0x19a985d34               ; <+504>
->  0x19a985d30 <+500>:  brk    #0xc471

I think the tbnz cascade corresponds to the

  if (encoding & UNWIND_ARM64_FRAME_X19_X20_PAIR) {
    ...
  }
  if (encoding & UNWIND_ARM64_FRAME_X21_X22_PAIR) {
    ..
  ...

in CompactUnwinder_arm64<A>::stepWithCompactEncodingFrame(). And then that
looks maybe like a pointer authentication epilogue and we crash because we fail
the pointer auth check (?)
Quuxplusone commented 3 years ago

I don't know if that's the culprit here, but it looks like there are arm64e-specific relocations. This https://llvm.org/devmtg/2019-10/slides/McCall-Bougacha-arm64e.pdf is a good intro to arm64e . ld64 has a bunch of stuff under SUPPORT_ARCH_arm64e , ARM64_RELOC_AUTHENTICATED_POINTER , etc. (Chromium doesn't use arm64e, but some system libraries apparently do, and maybe that needs linker support already? There should probably be a dedicated bug for full arm64e support in lld though -- eventually that'll be a thing for userspace apps.)

Quuxplusone commented 3 years ago

(The assembly kind of looks like resign mitigation #2 in the slides.)

Quuxplusone commented 3 years ago

Here's a smaller repro: https://drive.google.com/file/d/1M1_2OaL6l4uTw7wSOb1h2XFRjGN6lVm7/view?usp=sharing (5.8M unzipped, 1.1M zipped)

Quuxplusone commented 3 years ago

Attached repro.tar.bz2 (953419 bytes, application/x-bzip2): repro

Quuxplusone commented 3 years ago

Comparing the output of xcrun unwinddump base_unittests with ld and lld, after removing hex offsets and [ 2] from the out shows:

(no frame, no saved registers) ZN4base8internal9BindStateIPFvRKZ4mainE3$_0EJS2_EE7DestroyEPKNS0_13BindStateBaseE (no frame, no saved registers) __ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE (no frame, no saved registers) ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE (std frame: x19/20 x21/22 ) ZN4base24MessagePumpCFRunLoopBase13SetTimerSlackENS_10TimerSlackE (std frame: x19/20 x21/22 ) ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev (std frame: x19/20 x21/22 ) ZN4base24MessagePumpCFRunLoopBaseD1Ev (no frame, no saved registers) ZN4base24MessagePumpCFRunLoopBase16EnterExitRunLoopEm (std frame: x19/20 x21/22 x23/24 x25/26) ZNSt3110unique_ptrIN4base8internal34ScopedLazyTaskRunnerListForTestingENS_14default_deleteIS3_EEED1Ev (std frame: x19/20 ) ZN4base4test15TaskEnvironmentD1Ev (std frame: x19/20 x21/22 ) ZN4base4test15TaskEnvironment14MockTimeDomainD1Ev (no frame, no saved registers) ZNK4base4test15TaskEnvironment14MockTimeDomain3NowEv (std frame: ) ZNK4base4test15TaskEnvironment14MockTimeDomain7GetNameEv (no frame, no saved registers) __ZThn56_N4base4test15TaskEnvironment14MockTimeDomainD1Ev (no frame, no saved registers) ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv (std frame: x19/20 x21/22 ) __ZN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerD2Ev (std frame: x19/20 x21/22 ) ___clang_call_terminate

Quuxplusone commented 3 years ago

The mismatch on common encodings might be benign, since there are different ways to encode the same thing... the other two seem suspicious though

Quuxplusone commented 3 years ago
If I read this right, ld creates compact unwind info entries for symbols that
don't have them in the input .o files:

# No compact unwind for this
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo xcrun
unwinddump $f | grep
'__ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE'; done

# No eh_frame either
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo dwarfdump $f |
grep '__ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE';
done

# Still, ld writes something to the output
thakis@M1BP gn % xcrun unwinddump base_unittests | grep
'__ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE'

# relink with ld...
thakis@M1BP gn % ../../third_party/llvm-build/Release+Asserts/bin/clang++ -
stdlib=libc++ -arch arm64 -nostdlib++ -isysroot sdk/xcode_links/MacOSX11.3.sdk -
mmacosx-version-min=10.11.0 -Wl,-ObjC -o "./base_unittests" -Wl,-
filelist,"./base_unittests.rsp" -framework AppKit -Wl,-dead_strip

# ...and, an entry:
thakis@M1BP gn % xcrun unwinddump base_unittests | grep
'__ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE'

            [  7] funcOffset=0x00001DC0, encoding[ 12]=0x00000000 (no frame, no saved
registers                            )
__ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE

(Also, changing the order of the input .o files makes the test pass with lld
too, for reasons I also don't understand yet.)
Quuxplusone commented 3 years ago
Most (but not quite all) of the "missing" functions aren't in any of the input
files:

thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep
'__ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE';
done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
  start:        0x000002E8   __ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep
'__ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE'; done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep
'__ZN4base24MessagePumpCFRunLoopBase13SetTimerSlackENS_10TimerSlackE'; done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep
'__ZNSt3__110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev';
done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep '__ZN4base24MessagePumpCFRunLoopBaseD1Ev'; done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep '__ZN4base24MessagePumpCFRunLoopBase16EnterExitRunLoopEm';
done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep
'__ZNSt3__110unique_ptrIN4base8internal34ScopedLazyTaskRunnerListForTestingENS_14default_deleteIS3_EEED1Ev';
done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep '__ZN4base4test15TaskEnvironmentD1Ev'; done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep '__ZN4base4test15TaskEnvironment14MockTimeDomainD1Ev'; done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep '__ZNK4base4test15TaskEnvironment14MockTimeDomain3NowEv';
done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep
'__ZNK4base4test15TaskEnvironment14MockTimeDomain7GetNameEv'; done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep
'__ZThn56_N4base4test15TaskEnvironment14MockTimeDomainD1Ev'; done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep
'__ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv'; done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
  start:        0x00001F48   __ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep
'__ZN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerD2Ev'; done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
thakis@M1BP gn % for f in $(head -4 base_unittests.rsp); do echo $f; xcrun
unwinddump $f | grep '___clang_call_terminate'; done
obj/base/base_unittests/call_with_eh_frame_asm.o
obj/base/base_unittests/message_pump_mac_unittest_ii.o
obj/base/base_unittests/jumbo_ii.o
obj/base/base_unittests/message_pump_mac_ii.o
Quuxplusone commented 3 years ago
getAllUnwindInfos() in ld64:

  if ( atom->beginUnwind() == atom->endUnwind() ) {
    // be sure to mark that we have no unwind info for stuff in the TEXT segment without unwind info
    if ( (atom->section().type() == ld::Section::typeCode) && (atom->size() !=0) ) {
          entries.push_back(UnwindEntry(atom, address, 0, NULL, NULL, NULL, 0));
    }
  }

I think ld64's algorithm is:

- at .o parse time, assign each symbol unwind info from __eh_frame (cfiArray)
or __unwind_info (cuInfoArray) in  Parser<A>::parse

- when processing compat info, go through all symbols, and either assign dummy
unwind info or unwind info from .o files

lld currently just merges __compact_unwind from all inputs, which means things
without explicit unwind entries don't get unwind info.

Not sure if it's intentional on clang's end that it omits some unwind
information but apparently it's important to use ld64's algorithm.
Quuxplusone commented 3 years ago

Interesting finds...

I'm still puzzled over the missing LSDA for __ZSt9terminatev though. At first I'd wondering if perhaps there were multiple definitions for that function in different object files, with only some having the associated LSDA, but it seems like the only place that it's defined is in cxa_handlers.o. And we do have an entry for that function in the second-level pages, so we are finding the one CU entry that points to it.

Quuxplusone commented 3 years ago

Another idea I had was to diff xcrun unwinddump output for when I swap the order of the first few files. If I swap the 2nd and 3rd file (original order of these 2:

obj/base/base_unittests/message_pump_mac_unittest_ii.o obj/base/base_unittests/jumbo_ii.o

) then unwinddump output is basically identical except for a few offsets but the test passes. So I guess the missing entries aren't really a problem. Hmm.

I also tried building libunwind locally to add some debugging stuff there and then running the test with DYLD_LIBRARY_PATH=$HOME/src/llvm-project/out/gn/lib DYLD_PRINT_LIBRARIES=1 set, but my locally built libunwind complains libunwind: malformed __unwind_info at 0x1944E328C bad second level page (for both ld64 and lld used to built base_unittests). Not sure why. Maybe I'll debug that part a bit.

Quuxplusone commented 3 years ago

Here's something resembling a slightly more concrete theory. Part of the unwinddump output when the binary crashes:

encodingsCount=0x00000000 funcOffset=0x00000B90 (std frame:) ZN4base3mac15CallWithEHFrameEU13block_pointerFvvE funcOffset=0x00000BB0 (std frame: x19/20) _main funcOffset=0x00000DA0 (std frame: x19/20) ZL3foov funcOffset=0x00000E44 (std frame: x19/20) ZN4base8internal16BindLambdaHelperIZ4mainE3$0FvvEE3RunERKS2 funcOffset=0x00000E7C (no frame, no saved registers) ZN4base8internal7InvokerINS0_9BindStateIPFvRKZ4mainE3$_0EJS3_EEEFvvEE3RunEPNS0_13BindStateBaseE funcOffset=0x00017960 (std frame: x19/20 x21/22 x23/24) __ZN4base24MessagePumpCFRunLoopBase3RunEPNS_11MessagePump8DelegateE

Here's the same with lines 2/3 switched as mentioned in the previous comment, where there's no crash:

encodingsCount=0x00000000 funcOffset=0x00000B90 (std frame:) ZN4base3mac15CallWithEHFrameEU13block_pointerFvvE funcOffset=0x00017670 (std frame: x19/20) _main funcOffset=0x00017860 (std frame: x19/20) ZL3foov funcOffset=0x00017904 (std frame: x19/20) ZN4base8internal16BindLambdaHelperIZ4mainE3$0FvvEE3RunERKS2 funcOffset=0x0001793C (no frame, no saved registers) ZN4base8internal7InvokerINS0_9BindStateIPFvRKZ4mainE3$_0EJS3_EEEFvvEE3RunEPNS0_13BindStateBaseE funcOffset=0x00017960 (std frame: x19/20 x21/22 x23/24) __ZN4base24MessagePumpCFRunLoopBase3RunEPNS_11MessagePump8DelegateE

This is identical except for the funcOffsets.

In the first (crashing) output, there's a big hole between 0x00000E7C and 0x00017960 between second-to-last and last line.

In the second (working) output, the hole is between 0x00000B90 and 0x00017670 between first and second line.

Looking at nm -n base_unittests output, here's where the "missing" symbols from comment 11 are in both cases.

crashing:

thakis@M1BP gn % nm -n base_unittests | grep -E 'ZN4base8internal9BindStateIPFvRKZ4mainE3\$_0EJS2_EE7DestroyEPKNS0_13BindStateBaseE|__ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE|ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE|ZN4base24MessagePumpCFRunLoopBase13SetTimerSlackENS_10TimerSlackE|ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev|ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev|ZN4base24MessagePumpCFRunLoopBase16EnterExitRunLoopEm|ZNSt3110unique_ptrIN4base8internal34ScopedLazyTaskRunnerListForTestingENS_14default_deleteIS3_EEED1Ev|ZN4base4test15TaskEnvironmentD1Ev|ZN4base4test15TaskEnvironment14MockTimeDomainD1Ev|ZNK4base4test15TaskEnvironment14MockTimeDomain3NowEv|ZNK4base4test15TaskEnvironment14MockTimeDomain7GetNameEv|ZThn56_N4base4test15TaskEnvironment14MockTimeDomainD1Ev|ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv|__ZN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerD2Ev|___clang_call_terminate'

0000000100000e8c t ZN4base8internal9BindStateIPFvRKZ4mainE3$_0EJS2_EE7DestroyEPKNS0_13BindStateBaseE 0000000100000e98 t __ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE 0000000100000ea0 t ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE 0000000100017b24 t ZN4base24MessagePumpCFRunLoopBase13SetTimerSlackENS_10TimerSlackE 0000000100018170 t ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev 00000001000186b0 t ZN4base24MessagePumpCFRunLoopBase16EnterExitRunLoopEm 0000000100018dd8 t ZN4base4test15TaskEnvironmentD1Ev 0000000100018df0 t ZN4base4test15TaskEnvironment14MockTimeDomainD1Ev 0000000100018e50 t ZNK4base4test15TaskEnvironment14MockTimeDomain3NowEv 0000000100018edc t ZNK4base4test15TaskEnvironment14MockTimeDomain7GetNameEv 0000000100018ef4 t ZThn56_N4base4test15TaskEnvironment14MockTimeDomainD1Ev 0000000100018f40 t ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv 0000000100018f4c t __ZN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerD2Ev 0000000100019668 t ___clang_call_terminate

working:

% nm -n base_unittests | grep -E 'ZN4base8internal9BindStateIPFvRKZ4mainE3\$_0EJS2_EE7DestroyEPKNS0_13BindStateBaseE|__ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE|ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE|ZN4base24MessagePumpCFRunLoopBase13SetTimerSlackENS_10TimerSlackE|ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev|ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev|ZN4base24MessagePumpCFRunLoopBase16EnterExitRunLoopEm|ZNSt3110unique_ptrIN4base8internal34ScopedLazyTaskRunnerListForTestingENS_14default_deleteIS3_EEED1Ev|ZN4base4test15TaskEnvironmentD1Ev|ZN4base4test15TaskEnvironment14MockTimeDomainD1Ev|ZNK4base4test15TaskEnvironment14MockTimeDomain3NowEv|ZNK4base4test15TaskEnvironment14MockTimeDomain7GetNameEv|ZThn56_N4base4test15TaskEnvironment14MockTimeDomainD1Ev|ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv|__ZN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerD2Ev|___clang_call_terminate'

0000000100011d64 t ZN4base8internal9BindStateIPFvvEJEE7DestroyEPKNS0_13BindStateBaseE 000000010001794c t __ZN4base8internal9BindStateIPFvRKZ4mainE3$_0EJS2_EE7DestroyEPKNS0_13BindStateBaseE 0000000100017958 t ZN4base8internal7InvokerINS0_9BindStateIPFvvEJEEES3_E7RunOnceEPNS0_13BindStateBaseE 0000000100017b24 t ZN4base24MessagePumpCFRunLoopBase13SetTimerSlackENS_10TimerSlackE 0000000100018170 t ZNSt3110unique_ptrIN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerENS_14default_deleteIS3_EEED1Ev 00000001000186b0 t ZN4base24MessagePumpCFRunLoopBase16EnterExitRunLoopEm 0000000100018dd8 t ZN4base4test15TaskEnvironmentD1Ev 0000000100018df0 t ZN4base4test15TaskEnvironment14MockTimeDomainD1Ev 0000000100018e50 t ZNK4base4test15TaskEnvironment14MockTimeDomain3NowEv 0000000100018edc t ZNK4base4test15TaskEnvironment14MockTimeDomain7GetNameEv 0000000100018ef4 t ZThn56_N4base4test15TaskEnvironment14MockTimeDomainD1Ev 0000000100018f40 t ZThn56_NK4base4test15TaskEnvironment14MockTimeDomain8NowTicksEv 0000000100018f4c t __ZN4base24MessagePumpCFRunLoopBase17ScopedModeEnablerD2Ev 0000000100019668 t ___clang_call_terminate

In the first case, the first 3 entries are in the hole (assuming 100000000 is the load address). In the second case, just the first entry is.

When libunwind looks for unwind information for a function, as far as I know it picks up the unwind info from the nearest entry at a lower address. So if something is in that hole, it gets the first entry above the hole.

In the crashing case, the last entry before the hole is

funcOffset=0x00000E7C (no frame, no saved registers) __ZN4base8internal7InvokerINS0_9BindStateIPFvRKZ4mainE3$_0EJS3_EEEFvvEE3RunEPNS0_13BindStateBaseE

In the non-crashing cash, the last entry before the hole is

funcOffset=0x00000B90 (std frame:) __ZN4base3mac15CallWithEHFrameEU13block_pointerFvvE

And I guess "std frame" is what the missing entry needs to work.

This theory explains why this is dependent on .o order, and it suggests it is due to missing unwind entries.

(Why is this arm-only? Don't yet know for sure, but some ideas:

Quuxplusone commented 3 years ago

Maybe clang/llvm do that optimization to omit compact unwind info for one function if it'd be identical to the previous function too? And then if the previous function is elided (either due to it being a weak coealesced symbol or a dead symbol -- in this case the former I suppose since it repros without -dead_strip too) then it'd be important that the linker explicitly inserts the unwind kind from the previous symbol.

Quuxplusone commented 3 years ago

Hm, if I set a breakpoint in foo and run bt at that breakpoint in lldb, here's what I get with the crashing binary:

(lldb) bt

So lldb also has trouble unwinding.

With the working binary (by tweaking the rsp file):

(lldb) bt

"non-virtual thunk to base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork()" is __ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv in mangled.

"base::MessagePumpCFRunLoopBase::RunWork()" is __ZN4base24MessagePumpCFRunLoopBase7RunWorkEv in mangled.

Crashing binary:

% nm -n base_unittests | grep -E 'ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv|ZN4base24MessagePumpCFRunLoopBase7RunWorkEv'

000000010000e3a8 t ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv 00000001000182dc t ZN4base24MessagePumpCFRunLoopBase7RunWorkEv

Working binary:

thakis@M1BP gn % nm -n base_unittests | grep -E 'ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv|ZN4base24MessagePumpCFRunLoopBase7RunWorkEv'

000000010000e0ac t ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv 00000001000182dc t ZN4base24MessagePumpCFRunLoopBase7RunWorkEv

Reminder form an earlier comment:

In the crashing case, the last entry before the hole is

funcOffset=0x00000E7C (no frame, no saved registers) __ZN4base8internal7InvokerINS0_9BindStateIPFvRKZ4mainE3$_0EJS3_EEEFvvEE3RunEPNS0_13BindStateBaseE

And the first after it is

funcOffset=0x00017960 (std frame: x19/20 x21/22 x23/24) __ZN4base24MessagePumpCFRunLoopBase3RunEPNS_11MessagePump8DelegateE

In the non-crashing cash, the last entry before the hole is

funcOffset=0x00000B90 (std frame:) __ZN4base3mac15CallWithEHFrameEU13block_pointerFvvE

And the first after it is

funcOffset=0x00017670 (std frame: x19/20) _main

So __ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv is also in that hole. So I guess that's the function that has the wrong unwind info, but it usually picks up the right unwind info from a different symbol, even with ld64, since it's never appears in unwinddump output, even when things work.

Quuxplusone commented 3 years ago

__ZThn8_N4base16sequence_manager8internal35ThreadControllerWithMessagePumpImpl6DoWorkEv is in obj/base/base_unittests/jumbo_ii.o which doesn't have any unwind info, likely due to it being a cc file built with -fno-exceptions.

Quuxplusone commented 3 years ago

Here's a tiny stand-alone repro based on this theory. It successfully reproduces the issue :)

The idea is to put a "no frame, no saved registers" entry in file0.o, to not have any entries in file1.o (due to using -fno-exceptions, which omits the entries, which is supposed to give everything the default entry), and then put something from file1.o on the stack when calling the unwinder (which I called "call").

The unwinder will look for the entry for "call", but since lld doesn't write one, will find the entry at the highest lower address, which is the entry from file0.o which is "no frame, no registers" -- but "call" has a default frame. ...at least that's what I thought, but now that I compare unwinddump output for the binary linked by lld and ld64, the difference is that lld writes

encodingsCount=0x00000000 [ 0] encoding[2]=0x02000000 (no frame, no saved registers) Z5dummyv [ 1] encoding[0]=0x04000000 (std frame:) Z3foov [ 2] encoding[1]=0x02001000 (stack size=16:) __ZL7handlerP11objc_objectPv [ 3] encoding[0]=0x04000000 (std frame:) _main

while ld64 writes

encodingsCount=0x00000003 [ 0] encoding[3]=0x02000000 (no frame, no saved registers) Z5dummyv [ 1] encoding[2]=0x00000000 (no frame, no saved registers) Z4callPFvvE [ 2] encoding[0]=0x04000000 (std frame:) Z3foov [ 3] encoding[1]=0x02001000 (stack size=16:) ZL7handlerP11objc_objectPv [ 4] encoding[0]=0x04000000 (std frame:) _main

So ld64 does have an explicit entry for __Z4callPFvvE but it's also "no frame, no saved registers". The encoding is 0x00000000 instead of 0x02000000 which I suppose is the difference that matters (?)

Repro:

thakis@M1BP gn % cat file0.mm
void dummy() {}

thakis@M1BP gn % cat file1.cc void call(void(*f)()) { f(); }

thakis@M1BP gn % cat file2.mm

include <objc/objc-exception.h>

void call(void(*)());

static void handler(id a, void* b) {}

void foo() { // Anything that calls unw_getcontext/unw_init_local/unw_step should do, // objc_addExceptionHandler() does that, so calling that for convenience. objc_addExceptionHandler(&handler, 0); }

int main() { call(foo); }

thakis@M1BP gn % clang -c file0.mm file2.mm thakis@M1BP gn % clang -c -fno-exceptions file1.cc

thakis@M1BP gn % ../../third_party/llvm-build/Release+Asserts/bin/clang -fuse-ld=lld file0.o file1.o file2.o -isysroot $(xcrun -show-sdk-path) -lobjc thakis@M1BP gn % ./a.out zsh: trace trap ./a.out

thakis@M1BP gn % ../../third_party/llvm-build/Release+Asserts/bin/clang file0.o file1.o file2.o -isysroot $(xcrun -show-sdk-path) -lobjc
thakis@M1BP gn % ./a.out # Works with ld64

Quuxplusone commented 3 years ago

Nice find! I guess we could deal with this by having cuPtrVector include dummy entries for functions without unwind info, right before the sort+fold step here: https://github.com/llvm/llvm-project/blob/main/lld/MachO/UnwindInfoSection.cpp#L297

FWIW, compiling your small repro example for x86_64 generates nonzero (and identical!) unwind info for all 4 functions, including __Z4callPFvvE.

Quuxplusone commented 3 years ago

If you build the cc file with -fomit-frame-pointer, you can get that file to not include any unwind info. Locally I've also hacked up things to not emit eh_frame on x86_64 either [1], not sure that makes a difference. With that, bt in lldb doesn't show call, but that's with both lld and ld64. I guess things are better on x86_64 because everything always unwinds off rbp if it unwinds, at least for clang-generated object files (?)

It's a bit surprising that ld64 writes an encoding of 0x00000000; as far a I can tell that's not a valid encoding. I would expect that libunwind/src/CompactUnwinder.hpp,

CompactUnwinder_arm64::stepWithCompactEncoding

would hit the

_LIBUNWIND_ABORT("invalid compact unwind encoding");

but that's not what's happening. So I think I don't fully understand things yet.

[1]:

% git diff diff --git a/llvm/lib/MC/MCObjectFileInfo.cpp b/llvm/lib/MC/MCObjectFileInfo.cpp index 99526b384c6e..b4a53f9225f1 100644 --- a/llvm/lib/MC/MCObjectFileInfo.cpp +++ b/llvm/lib/MC/MCObjectFileInfo.cpp @@ -57,10 +57,11 @@ void MCObjectFileInfo::initMachOMCObjectFileInfo(const Triple &T) { MachO::S_ATTR_STRIP_STATIC_SYMS | MachO::S_ATTR_LIVE_SUPPORT, SectionKind::getReadOnly());

  • if (T.isOSDarwin()) {
  • if (T.isOSDarwin() && (T.getArch() != Triple::x86)) SupportsCompactUnwindWithoutEHFrame = true;
  • if (T.isWatchABI() || T.isOSDarwin() && T.getArch() != Triple::x86) OmitDwarfIfHaveCompactUnwind = true;
  • }

    FDECFIEncoding = dwarf::DW_EH_PE_pcrel;

we could deal with this by having cuPtrVector include dummy entries for functions without unwind info, right before the sort+fold step here:

We also need to be able to identify all functions without unwind info somehow, right?

Quuxplusone commented 3 years ago

Looks like UnwindCursor<A, R>::getInfoFromCompactEncodingSection() copies the encoding to _info.format and that being 0 causes _unwindInfoMissing to become true. So an encoding of 0 is a special case.

(However, it looks like _unwindInfoMissing true causes step() to return UNW_STEP_END which is a bit strange too?)

lldb also special-cases encoding of 0 here: https://github.com/llvm/llvm-project/blob/main/lldb/source/Symbol/CompactUnwindInfo.cpp#L180

But lldb manages to unwind fine past call() somehow…

Quuxplusone commented 3 years ago

We also need to be able to identify all functions without unwind info somehow, right?

Yeah I was thinking that we would scan the compactUnwindSection once to gather all function pointers, then scan each InputFile for symbols in __text that don't have a corresponding entry. Not exactly cheap but I guess it's unavoidable

Quuxplusone commented 3 years ago

(In reply to Jez Ng from comment #26)

Yeah I was thinking that we would scan the compactUnwindSection once to gather all function pointers, then scan each InputFile for symbols in __text that don't have a corresponding entry. Not exactly cheap but I guess it's unavoidable

It might be cheaper to give each Defined an associated unwind info range (when parsing the .o file this is cheap-ish: just walk the sorted unwind info in parallel) and then write the whole __unwind_info section based on that like ld64 does. That would also implement only writing unwind stuff for non-coalesced symbols as a side effect.

Not sure if that's better or worse.

The __eh_frame / __unwind_info stuff doesn't fit terribly neatly into the normal linker design :/

Quuxplusone commented 3 years ago

gkm said he's interested in working on the actual fix, so assigning to him. (Feel free to give back to me if that should change :) )

Quuxplusone commented 3 years ago

We have another crash bug in libunwind, this one in x86_64: https://bugs.chromium.org/p/chromium/issues/detail?id=1220175

Chances are it's due to this bug too, but I haven't verified that yet.

Quuxplusone commented 3 years ago

Attached tmp.diff (2740 bytes, text/plain): patch sketch

Quuxplusone commented 3 years ago

Uploaded that as a stop gap to here: https://reviews.llvm.org/D104681