NativeAOT: Silent unwind info corruption

filipnavara commented 1 year ago

On macOS the unwind information is stored in two encodings: the compact unwinding encoding and the DWARF EH encoding. The compact unwinding table serves as a lookup table into the DWARF section (when the whole unwinding cannot be expressed using compact codes, which NativeAOT doesn't currently produce). The "hint offset" into the DWARF table is 24-bit on both ARM64 and x64. It turns out that if the offset doesn't fit into 24 bits, it gets silently truncated and results in incorrect pointers into the DWARF section. This in turn results in unwinding not working properly and the app freezing due to a livelock between a stuck FindMethodInfo and GC suspensions.

Example stack trace:

  * frame #0: 0x0000000100120694 eM Client`libunwind::CFI_Parser<libunwind::LocalAddressSpace>::parseCIE(libunwind::LocalAddressSpace&, unsigned long, libunwind::CFI_Parser<libunwind::LocalAddressSpace>::CIE_Info*) at AddressSpace.hpp:0 [opt]
    frame #1: 0x0000000100120684 eM Client`libunwind::CFI_Parser<libunwind::LocalAddressSpace>::parseCIE(addressSpace=0x000000010e9c57f8, cie=4460851648, cieInfo=0x0000000175a21ad8) at DwarfParser.hpp:371:5 [opt]
    frame #2: 0x000000010012294c eM Client`libunwind::CFI_Parser<libunwind::LocalAddressSpace>::findFDE(addressSpace=0x000000010e9c57f8, pc=4349025824, ehSectionStart=4460851648, sectionLength=<unavailable>, fdeHint=<unavailable>, fdeInfo=0x0000000175a21b10, cieInfo=0x0000000175a21ad8) at DwarfParser.hpp:265:13 [opt]
    frame #3: 0x000000010011ea18 eM Client`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_arm64>::getInfoFromDwarfSection(this=0x0000000175a221c0, pc=4349025824, sects=0x00006000017b85d0, fdeSectionOffsetHint=851328) at UnwindCursor.hpp:1693:16 [opt]
    frame #4: 0x000000010011e508 eM Client`UnwindHelpers::GetUnwindProcInfo(pc=4349025824, uwInfoSections=0x00006000017b85d0, procInfo=0x0000000175a22578) at UnwindHelpers.cpp:867:22 [opt]
    frame #5: 0x0000000100126670 eM Client`UnixNativeCodeManager::FindMethodInfo(this=<unavailable>, ControlPC=<unavailable>, pMethodInfoOut=0x0000000175a225e0) at UnixNativeCodeManager.cpp:89:10 [opt]
    frame #6: 0x0000000100127020 eM Client`UnixNativeCodeManager::GetAssociatedData(this=<unavailable>, ControlPC=<unavailable>) at UnixNativeCodeManager.cpp:916:10 [opt]
    frame #7: 0x00000001000cedf8 eM Client`RuntimeInstance::GetTargetOfUnboxingAndInstantiatingStub(this=<unavailable>, ControlPC=<unavailable>) at RuntimeInstance.cpp:119:52 [opt]
    frame #8: 0x0000000101f807b0 eM Client`S_P_Reflection_Execution_Internal_Reflection_Execution_ExecutionEnvironmentImplementation__ComputeLdftnReverseLookup_InvokeMap + 560
    frame #9: 0x0000000101f80134 eM Client`S_P_Reflection_Execution_Internal_Reflection_Execution_ExecutionEnvironmentImplementation__GetLdFtnReverseLookups_Helper + 532
    frame #10: 0x0000000101f802e4 eM Client`S_P_Reflection_Execution_Internal_Reflection_Execution_ExecutionEnvironmentImplementation__TryGetMethodForOriginalLdFtnResult + 164
    frame #11: 0x0000000101f7cbb4 eM Client`S_P_Reflection_Execution_Internal_Reflection_Extensions_NonPortable_DelegateMethodInfoRetriever__GetDelegateMethodInfo + 388
    frame #12: 0x0000000104d5e828 eM Client`MailClient_Accounts_MailClient_Utils_EventHandlerUtils__MakeWeak<System___Canon> + 56
    frame #13: 0x0000000100db89d4 eM Client`MailClient_Accounts_MailClient_Storage_Application_Folder__RegisterPropertyChangedWeakHandler + 148
    frame #14: 0x0000000100dacf44 eM Client`MailClient_Accounts_MailClient_Storage_Application_Folder___ctor + 676
    frame #15: 0x0000000100df201c eM Client`MailClient_Accounts_MailClient_Accounts_AccountFolderCache__GetFolder_0 + 172
    frame #16: 0x0000000100df1efc eM Client`MailClient_Accounts_MailClient_Accounts_AccountFolderCache__GetFolder + 684
    frame #17: 0x0000000100e0165c eM Client`MailClient_Accounts_MailClient_Accounts_BindingAccountBase__InitializeStorage + 252
    frame #18: 0x0000000100e15160 eM Client`MailClient_Accounts_MailClient_Accounts_Mail_MailAccount__InitializeStorage + 64
    frame #19: 0x0000000100e014d8 eM Client`MailClient_Accounts_MailClient_Accounts_BindingAccountBase___ctor + 472
    frame #20: 0x0000000100df6e54 eM Client`MailClient_Accounts_MailClient_Accounts_AccountManager__get_FallbackMailAccount + 324
    frame #21: 0x0000000100df7c74 eM Client`MailClient_Accounts_MailClient_Accounts_AccountManager___ctor + 948
    frame #22: 0x00000001002e33c0 eM Client`eM_Client_MailClient_Program__InitOnBackground + 4256
    frame #23: 0x00000001002ec90c eM Client`eM_Client_MailClient_Program___c___RunInitOnBackground_b__200_0 + 44
    frame #24: 0x0000000101285964 eM Client`S_P_CoreLib_System_Threading_ExecutionContext__RunFromThreadPoolDispatchLoop + 68
    frame #25: 0x0000000101293cf4 eM Client`S_P_CoreLib_System_Threading_Tasks_Task__ExecuteWithThreadLocal + 228
    frame #26: 0x000000010128c0f0 eM Client`S_P_CoreLib_System_Threading_ThreadPoolWorkQueue__DispatchItemWithAutoreleasePool + 96
    frame #27: 0x000000010128bdf0 eM Client`S_P_CoreLib_System_Threading_ThreadPoolWorkQueue__Dispatch + 752
    frame #28: 0x000000010135f594 eM Client`S_P_CoreLib_System_Threading_PortableThreadPool_WorkerThread__WorkerThreadStart + 244
    frame #29: 0x00000001012825f8 eM Client`S_P_CoreLib_System_Threading_Thread__StartThread + 376
    frame #30: 0x0000000101282b10 eM Client`S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint + 32
    frame #31: 0x000000018c367034 libsystem_pthread.dylib`_pthread_start + 136

The fdeSectionOffsetHint=851328 is 0xCFD80. The DWARF dump is a bit too big to upload, but 0xCFD80 points into the middle of a record. There is, however, a start of a record at 0x10CFD80 and it matches the PC 0x10338DE20 (pc=4349025824) from the stack trace:

010cfd80 0000002c 010cfd84 FDE cie=00000000 pc=10338de20...10338de4c
  Format:       DWARF32
  LSDA Address: 000000010c0b4df8
  DW_CFA_advance_loc: 4
  DW_CFA_def_cfa_offset: +16
  DW_CFA_offset: W29 -16
  DW_CFA_offset: W30 -8
  DW_CFA_advance_loc: 4
  DW_CFA_def_cfa_register: W29
  DW_CFA_nop:
  DW_CFA_nop:
  DW_CFA_nop:
  DW_CFA_nop:
  DW_CFA_nop:

  0x10338de20: CFA=WSP
  0x10338de24: CFA=WSP+16: W29=[CFA-16], W30=[CFA-8]
  0x10338de28: CFA=W29+16: W29=[CFA-16], W30=[CFA-8]
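
The truncation can be checked by hand from the numbers above. A minimal sketch, assuming the hint occupies the low 24 bits of the compact unwind encoding (the name `DWARF_SECTION_OFFSET_MASK` and the helper `truncate_hint` are labels made up for illustration):

```python
# Assumption: the DWARF section offset hint is stored in the low 24 bits.
DWARF_SECTION_OFFSET_MASK = 0x00FFFFFF

def truncate_hint(fde_offset: int) -> int:
    """Return the hint as it would end up in a 24-bit field."""
    return fde_offset & DWARF_SECTION_OFFSET_MASK

real_fde_offset = 0x10CFD80               # start of the FDE in the DWARF dump
stored_hint = truncate_hint(real_fde_offset)

# The stored hint equals fdeSectionOffsetHint from frame #3 of the stack trace.
print(hex(stored_hint), stored_hint)
```

The top bit of the 25-bit offset is lost, so the hint lands exactly 16 MB (0x1000000 bytes) before the real record.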

ghost commented 1 year ago

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas See info in area-owners.md if you want to be subscribed.

Author:	filipnavara
Assignees:	-
Labels:	`os-mac-os-x`, `area-NativeAOT-coreclr`
Milestone:	-

filipnavara commented 1 year ago

I guess we have three options for dealing with it:

- Short-term: Bail out with an error if the DWARF section is too big (> 0xffffff bytes). This is tricky because the problem is the size of the section in the linked image, not in the ILCompiler output .o file.
- Check again if we can generate compact unwind codes instead of DWARF for the common cases: https://github.com/dotnet/runtime/issues/76371
- Build a different index (e.g. the Linux .eh_frame_hdr format) if the data are too big, and consume it.
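
The short-term check from the first option is trivial once the linked section size is known. A sketch only, with hypothetical names (`MAX_HINT_OFFSET`, `eh_frame_too_big`). It assumes the size passed in is that of `__eh_frame` in the final linked image, which is exactly the part that is hard to obtain at compile time:

```python
MAX_HINT_OFFSET = 0xFFFFFF  # largest value a 24-bit hint can hold

def eh_frame_too_big(linked_section_size: int) -> bool:
    """True when FDE offsets in the linked section may not fit the hint."""
    return linked_section_size > MAX_HINT_OFFSET

# __eh_frame size reported later in this thread for the affected executable.
print(eh_frame_too_big(0x1BC0E28))
# A hypothetical 8 MB section for comparison.
print(eh_frame_too_big(0x0800000))
```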

jkotas commented 1 year ago

Another option is to compensate for this issue in the libunwind implementation. We can loop over all candidates that match the hint.

filipnavara commented 1 year ago

> Another option is to compensate for this issue in the libunwind implementation. We can loop over all candidates that match the hint.

I don't think that's possible. The offset points into the middle of a stream, so it's essentially decoding garbage. Sometimes the garbage can make sense, sometimes not, but it's not easy to tell whether it's a false hit.

jkotas commented 1 year ago

Can we build our own hint table from the broken hint table? Something like:

If the DWARF stream is larger than 16 MB:

- Create a copy of the hint table with 32-bit offsets.
- Scan the hint table and fill in the top 8 bits of the offsets. We should be able to tell where the offsets wrap around.
- Use this private hint table for lookups.

This would fix our unwinder, but it would not fix other unwinders. For example, I would expect C++ EH to be still broken.
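
The reconstruction idea in the steps above can be sketched as follows. This is not libunwind code. `rebuild_offsets` is a hypothetical helper, and it assumes the real offsets increase in table order, so that every drop in the 24-bit value marks a wrap:

```python
WRAP = 1 << 24  # hints are 24-bit, so they wrap every 16 MiB

def rebuild_offsets(truncated_hints):
    """Widen 24-bit hints by adding 16 MiB at every detected wrap."""
    full = []
    high = 0          # accumulated value of the missing top bits
    prev = None
    for hint in truncated_hints:
        if prev is not None and hint < prev:
            high += WRAP  # the value went down, so assume a wrap
        full.append(high + hint)
        prev = hint
    return full

# Made-up hints; the last one is the truncated hint from the stack trace.
hints = [0x000100, 0xFFFF00, 0x0CFD80]
print([hex(o) for o in rebuild_offsets(hints)])
```

With these inputs the last entry is widened back to 0x10CFD80, the real start of the record.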

filipnavara commented 1 year ago

We can probably build our own hint table from scratch during compilation. It needs to cover only the "managed code" section and hence doesn't need much linker input, as long as the DWARF section is preserved in one piece (I think it is).

Reconstructing the hints at runtime from the linker output may be possible but at that point you get a penalty similar to not using the hints at all and just creating the cache by sequentially reading the DWARF section. That's incredibly slow even on small executables though.

> This would fix our unwinder, but it would not fix other unwinders. For example, I would expect C++ EH to be still broken.

That's a fair point. I didn't consider other unwinders. If we want other unwinders to work then we basically have to either 1) generate compact unwinding codes where possible (needs codegen changes), 2) implement "compression" for DWARF by identifying common prolog sequences and sharing their code (possible in the DWARF format but difficult to implement), or 3) fix the linker output to not use the hints when they overflow (the slow lookup is too slow for NativeAOT purposes though). I didn't check what the new Apple linker (ld-prime) produces in this case.

jkotas commented 1 year ago

> Reconstructing the hints at runtime from the linker output may be possible but at that point you get a penalty similar to not using the hints

The hint table is not big and the reconstructed hint table can be cached. I think the penalty would be fairly small.

filipnavara commented 1 year ago

> The hint table is not big and the reconstructed hint table can be cached.

In the executable from the OP the size of __unwind_info (the compact unwinding table) is 0x89beb8 bytes. The size of __eh_frame is 0x1bc0e28. So, 9 MB for the hint table may not seem like much, but it's definitely going to be noticeable. If it was done lazily then you risk running it during thread hijacking on GC suspend. That would almost certainly take long enough to cause livelocks when the threads get hammered with the "suspend all threads" hijack logic.

filipnavara commented 1 year ago

On second thought, I don't think it's even reliably possible to reconstruct the DWARF offsets solely from __unwind_info. It's sorted by function address, and hence it's not guaranteed that the DWARF offsets are in order. I suspect that in this case they would be, but it feels fragile.
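
The fragility is easy to see with made-up numbers. A sketch, using hypothetical offsets and a hypothetical `naive_rebuild` helper that assumes a hint only ever decreases at a 16 MB wrap:

```python
WRAP = 1 << 24  # 24-bit hints wrap every 16 MiB

def naive_rebuild(hints):
    """Add 16 MiB whenever a hint is smaller than its predecessor."""
    high, prev, full = 0, 0, []
    for hint in hints:
        if hint < prev:
            high += WRAP
        full.append(high + hint)
        prev = hint
    return full

# Hypothetical FDE offsets listed in function-address order, which need not
# match their order inside the DWARF section.
real = [0x0000100, 0x1000200, 0x0FFFF00]
hints = [off & (WRAP - 1) for off in real]

rebuilt = naive_rebuild(hints)
print([hex(o) for o in rebuilt])
print(rebuilt == real)
```

The second entry really lives above 16 MB, but its truncated value is larger than the previous hint, so no wrap is detected and the rebuilt table silently differs from the real one.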

filipnavara commented 1 year ago

There’s potentially an easy win in terms of the ARM64 DWARF size with folding the extremely common sequences into a DWARF CIE and referencing that. That’s a variation of the “DWARF compression” strategy mentioned earlier, just restricted to specific known sequences.

For example, the prolog for a frame with no callee-saved registers (except LR and FP) is this:

  DW_CFA_advance_loc: 4
  DW_CFA_def_cfa_offset: +16
  DW_CFA_offset: W29 -16
  DW_CFA_offset: W30 -8
  DW_CFA_advance_loc: 4
  DW_CFA_def_cfa_register: W29
  DW_CFA_nop:
  DW_CFA_nop:
  DW_CFA_nop:
  DW_CFA_nop:
  DW_CFA_nop:

The code looks like this:

stp x29, x30, [sp, #-0x10]!
mov x29, sp
…
ldp x29, x30, [sp], #0x10
ret

This repeats 20000+ times in the OP executable. Besides being foldable in the DWARF codes it’s also likely expressible as compact unwind code with no codegen changes. We would still need to implement special prolog treatment for asynchronous unwinding with the compact unwind codes though, so the DWARF way could be easier (and benefit other platforms too).
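
To get a feel for how many bytes the repeated sequence costs per method, the CFA listing above can be hand-encoded. This is a sketch: the opcode values are the standard DWARF call frame instruction encodings, while `CODE_ALIGN = 1` and `DATA_ALIGN = -8` are assumed CIE alignment factors that are not stated in the dump.

```python
DW_CFA_nop = 0x00
DW_CFA_def_cfa_register = 0x0D
DW_CFA_def_cfa_offset = 0x0E
DW_CFA_advance_loc = 0x40   # high two bits 01, delta in the low six bits
DW_CFA_offset = 0x80        # high two bits 10, register in the low six bits

CODE_ALIGN = 1    # assumed CIE code alignment factor
DATA_ALIGN = -8   # assumed CIE data alignment factor

def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def advance_loc(delta):
    return bytes([DW_CFA_advance_loc | (delta // CODE_ALIGN)])

def offset(reg, off):
    # The operand is the offset divided by the data alignment factor.
    return bytes([DW_CFA_offset | reg]) + uleb128(off // DATA_ALIGN)

prolog = (
    advance_loc(4)
    + bytes([DW_CFA_def_cfa_offset]) + uleb128(16)
    + offset(29, -16)                      # W29 (FP) saved at CFA-16
    + offset(30, -8)                       # W30 (LR) saved at CFA-8
    + advance_loc(4)
    + bytes([DW_CFA_def_cfa_register]) + uleb128(29)
)
padded = prolog + bytes([DW_CFA_nop]) * 5  # the five trailing nops

print(prolog.hex(" "), len(prolog), len(padded))
```

Under these assumptions the instructions take 10 bytes, or 15 with the nop padding, on top of the fixed FDE header that every method pays regardless.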

filipnavara commented 1 year ago

I tried to replace the empty frame DWARF sequence with compact unwinding and it saves 32% of the DWARF section size for this particular executable. Similar savings are present for an empty iOS app created from the template (dotnet new ios). It's not enough to push the DWARF size below the problematic size, but it's significant enough that it may be an option worth exploring.
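
For illustration, this is roughly what the two kinds of compact unwind entries look like on ARM64. The constant values are quoted from memory of Apple's compact_unwind_encoding.h and should be treated as assumptions to verify. The helper names are hypothetical, and this is not the code used in the experiment above.

```python
# Assumed values from compact_unwind_encoding.h (verify before relying on them).
UNWIND_ARM64_MODE_MASK = 0x0F000000
UNWIND_ARM64_MODE_DWARF = 0x03000000
UNWIND_ARM64_MODE_FRAME = 0x04000000
UNWIND_ARM64_DWARF_SECTION_OFFSET = 0x00FFFFFF

def encode_dwarf(fde_offset: int) -> int:
    """Entry that defers to DWARF; the FDE offset is cut to 24 bits."""
    return UNWIND_ARM64_MODE_DWARF | (fde_offset & UNWIND_ARM64_DWARF_SECTION_OFFSET)

def encode_empty_frame() -> int:
    """Frame-based entry with FP/LR only and no other saved register pairs."""
    return UNWIND_ARM64_MODE_FRAME

print(hex(encode_dwarf(0x10CFD80)))   # carries the truncated hint
print(hex(encode_empty_frame()))      # needs no DWARF record at all
```

A frame-mode entry carries no DWARF offset, so every method converted to it both shrinks `__eh_frame` and stops depending on the 24-bit hint.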

dotnet / runtime

NativeAOT: Silent unwind info corruption #88292