dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

[MIPS64] apply_reg_state: ip and cfa unchanged; stopping here #33335

Closed xiangzhai closed 4 years ago

xiangzhai commented 4 years ago

Hi,

Testcase: baseservices/compilerservices/RuntimeHelpers/ExecuteCodeWithGuaranteedCleanup/ExecuteCodeWithGuaranteedCleanup.exe

MIPS64 reports this error: apply_reg_state: ip and cfa unchanged; stopping here (ip=0xfff6c20098)

PAL_VirtualUnwind then sets PC to zero because curPc is unchanged. This happens in native code, not managed code (the test cases for managed-code exception handling pass), at PC = 0xfff6c20098:

(gdb) x/22i 0xfff6c20098-44
   0xfff6c2006c <CallDescrWorkerInternal+108>:  ld      t0,16(s0)
   0xfff6c20070 <CallDescrWorkerInternal+112>:  ld      t9,40(s0)
   0xfff6c20074 <CallDescrWorkerInternal+116>:  ld      a0,0(t0)
   0xfff6c20078 <CallDescrWorkerInternal+120>:  ld      a1,8(t0)
   0xfff6c2007c <CallDescrWorkerInternal+124>:  ld      a2,16(t0)
   0xfff6c20080 <CallDescrWorkerInternal+128>:  ld      a3,24(t0)
   0xfff6c20084 <CallDescrWorkerInternal+132>:  ld      a4,32(t0)
   0xfff6c20088 <CallDescrWorkerInternal+136>:  ld      a5,40(t0)
   0xfff6c2008c <CallDescrWorkerInternal+140>:  ld      a6,48(t0)
   0xfff6c20090 <CallDescrWorkerInternal+144>:  jalr    t9
   0xfff6c20094 <CallDescrWorkerInternal+148>:  ld      a7,56(t0)
=> 0xfff6c20098 <CallDescrWorkerInternal+152>:  lw      t1,32(s0)
   0xfff6c2009c <CallDescrWorkerInternal+156>:  beqz    t1,0xfff6c200e4 <CallDescrWorkerInternal+228>
   0xfff6c200a0 <CallDescrWorkerInternal+160>:  nop
   0xfff6c200a4 <CallDescrWorkerInternal+164>:  li      at,0x4
   0xfff6c200a8 <CallDescrWorkerInternal+168>:  beq     at,t1,0xfff6c200bc <CallDescrWorkerInternal+188>
   0xfff6c200ac <CallDescrWorkerInternal+172>:  nop
   0xfff6c200b0 <CallDescrWorkerInternal+176>:  li      t0,0x8
   0xfff6c200b4 <CallDescrWorkerInternal+180>:  bne     t0,t1,0xfff6c200c4 <CallDescrWorkerInternal+196>
   0xfff6c200b8 <CallDescrWorkerInternal+184>:  nop
   0xfff6c200bc <CallDescrWorkerInternal+188>:  b       0xfff6c200ec <CallDescrWorkerInternal+236>
   0xfff6c200c0 <CallDescrWorkerInternal+192>:  sdc1    $f0,56(s0)
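
To illustrate the failure mode described above: the error message amounts to noticing that a step left both ip and cfa unchanged, after which PC is zeroed so the walk stops. The following is only a minimal C sketch of that idea, not libunwind or PAL code. cursor_t, checked_step, unwind_once and both step functions are hypothetical names, and the caller address is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified cursor: only the two values the check compares. */
typedef struct {
    uint64_t ip;   /* instruction pointer of the current frame */
    uint64_t cfa;  /* canonical frame address of the current frame */
} cursor_t;

typedef void (*step_fn)(cursor_t *);

/* Stand-in for a frame that has unwind info: pretend the frame is 32 bytes
 * and the return address is known (the address below is made up). */
static void step_with_cfi(cursor_t *c) {
    c->cfa += 32;
    c->ip = 0xfff6c10000ULL;
}

/* Stand-in for a frame without usable unwind info: nothing can be updated. */
static void step_without_cfi(cursor_t *c) {
    (void)c;
}

/* Returns 1 if the step made progress, 0 if ip and cfa are both unchanged. */
static int checked_step(cursor_t *c, step_fn step) {
    assert(c != NULL && step != NULL);
    cursor_t prev = *c;
    step(c);
    return !(c->ip == prev.ip && c->cfa == prev.cfa);
}

/* Mirrors the behaviour described in the issue: when a step makes no
 * progress, the PC is set to zero so the stack walk terminates. */
static uint64_t unwind_once(cursor_t *c, step_fn step) {
    if (!checked_step(c, step)) {
        c->ip = 0;
    }
    return c->ip;
}
```

Calling unwind_once with step_without_cfi on a cursor at ip=0xfff6c20098 returns 0, which corresponds to the situation reported here for CallDescrWorkerInternal.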

As @janvorli suggested, MIPS64's log:

TID 0803: InitializeExceptionHandling(): ExceptionTracker size: 0x188 bytes
TID 0803: TrackerAllocator::Init() succeeded..
TID 0803: SetupThread  managed Thread 000000012CE33B90 Thread Id = 1
TID 0803: Debugger Thread spinning up
TID 080b: SetupThread  managed Thread 000000012CE4DBE0 Thread Id = 2
TID 0803: ******* MANAGED EXCEPTION THROWN: Object thrown: 000000FF4C011818 MT 000000FF71F14858T rethrow 0
TID 0803: Exception HRESULT = 0x80131500 Message String 0x000000FF4C011898 (db will display) InnerException 0000000000000000 MT 0000000000000000T
TID 0803: in Thread::SetLastThrownObject: obj = 000000FF4C011818
TID 0803: Processing exception at establisher=000000FFFF97FE90, ip=000000FF71EB69C4 disp->cxr: 000000FFFF97F8C0, sp: 000000FFFF97FE90, cxr @ exception: 000000012CF2FDA0
TID 0803: ..................................................................................
TID 0803: ProcessCLRException enter, sp = 0x000000FFFF97FE90, ControlPc = 0x000000FF71EB69C4
TID 0803: >>exr: 000000012CF2FFD0, code: e0434352, addr: 000000FF71EB69C4, flags: 0x00
TID 0803: >>NEW exception
TID 0803: TrackerAllocator: allocating tracker 0x000000012CE31A80, thread = 0x000000012CE33B90
TID 0803: ___________________________________________
TID 0803: creating new tracker object 0x000000012CE31A80, thread = 0x000000012CE33B90
TID 0803: StackTraceInfo::AllocateStackTrace (000000012CE31AC8)
TID 0803: CEHelper::SetupCorruptionSeverityForActiveException - Marked non-rethrow/non-nested exception as NotCorrupting.
TID 0803: CEHelper::SetupCorruptionSeverityForActiveException - Copied the corruption severity (2) to ThreadExceptionState.
TID 0803: ..ExceptionTracker::InitializeCurrentContextForCrawlFrame: DispatcherContext->ControlPC = 000000FF71EB69C4; IP in DispatcherContext->ContextRecord = 000000FF71EB6864.
TID 0803: ..ProcessCrawlFrame: PSP:  000000ff`ff97fe90  EstablisherFrame:  000000ff`ff97fe90
TID 0803: ..  A:|00| 000000012CE31A80: (FFFFFFFFFFFFFFFF 0000000000000000) 1st pass
TID 0803: ..  C:|00| 000000012CE31A80: (000000FFFF97FE90 000000FFFF97FE90) 1st pass
TID 0803: ..  [ ProcessExplicitFrame: pFrame:  000000ff`ff97fd20  pMD:  00000000`00000000  FIRST PASS ]
TID 0803: ..ExceptionTracker::InitializeCurrentContextForCrawlFrame: DispatcherContext->ControlPC = 000000FF71EB69C4; IP in DispatcherContext->ContextRecord = 000000FF71EB6864.
TID 0803: ..  [ ProcessManagedCallFrame this=000000012CE31A80, FIRST PASS ]
TID 0803: ..  [ method: g, GCD.GCD ]
TID 0803: ..  | uMethodStartPC: 000000FF71EB68A0, ControlPc at offset 120
TID 0803: CEHelper::CanMethodHandleException - Processing CorruptionSeverity: 2.
TID 0803: StackTraceInfo::SaveStackTrace (000000012CE31AC8), alloc = 1, replace = 1, skiplast = 0
TID 0803: StackTraceInfo::ClearStackTrace (000000012CE31AC8)
TID 0803: ..returning ExceptionContinueSearch
TID 0803: Processing exception at establisher=000000FFFF97FEB0, ip=000000FF71EB6864 disp->cxr: 000000FFFF97F8C0, sp: 000000FFFF97FEB0, cxr @ exception: 000000012CF2FDA0
TID 0803: ....................................................................................
TID 0803: ..ProcessCLRException enter, sp = 0x000000FFFF97FEB0, ControlPc = 0x000000FF71EB6864
TID 0803: ..>>exr: 000000012CF2FFD0, code: e0434352, addr: 000000FF71EB69C4, flags: 0x00
TID 0803: ..>>continued processing of PREVIOUS exception
TID 0803: CEHelper::SetupCorruptionSeverityForActiveException - Current tracker already has the corruption severity set.
TID 0803: ..ExceptionTracker::InitializeCurrentContextForCrawlFrame: DispatcherContext->ControlPC = 000000FF71EB6864; IP in DispatcherContext->ContextRecord = 000000FFEC1DCCC8.
TID 0803: ..ProcessCrawlFrame: PSP:  000000ff`ff97feb0  EstablisherFrame:  000000ff`ff97feb0
TID 0803: ..  A:|00| 000000012CE31A80: (000000FFFF97FE90 000000FFFF97FE90) 1st pass
TID 0803: ..  C:|00| 000000012CE31A80: (000000FFFF97FE90 000000FFFF97FEB0) 1st pass
TID 0803: ..  [ ProcessManagedCallFrame this=000000012CE31A80, FIRST PASS ]
TID 0803: ..  [ method: TryCode0, GCD.GCD ]
TID 0803: ..  | uMethodStartPC: 000000FF71EB6790, ControlPc at offset d0
TID 0803: CEHelper::CanMethodHandleException - Processing CorruptionSeverity: 2.
TID 0803: StackTraceInfo::SaveStackTrace (000000012CE31AC8), alloc = 1, replace = 0, skiplast = 0
TID 0803: StackTraceInfo::ClearStackTrace (000000012CE31AC8)
TID 0803: ..returning ExceptionContinueSearch

I also followed the bulk of @sdmaclea's ARM64/Unix patchset, but we are not sure about a few things. For example:

CFI directives for MIPS64 is not implement yet, so I deliberately commented out the .cfi_XXX directives for ARM64 to see whether ARM64 still works without them, but clang failed to build for ARM64:

clang -cc1as: fatal error: error in backend: No open frame

UPDATEREG(Fp) failed to work for MIPS64, so we just commented it.

I wonder whether the commit log message is mismatched. We are also not sure about the .cfi_personality 0x1b, C_FUNC(\Handler) // 0x1b == DW_EH_PE_pcrel | DW_EH_PE_sdata4 for MIPS64.

On the libunwind side, there is no cfa_reg_sp, cfa_reg_offset, fp_cfa_offset, ra_cfa_offset or sp_cfa_offset in MIPS's unw_tdep_frame_t.

Do we need to implement the CFI directives in CoreCLR and the tdep frame (unw_tdep_frame_t) in libunwind? Please give us a hint!

\cc @jashook @gkhanna79 @rahku @theaoqi @QiaoVanke

Thanks, Leslie Zhai

janvorli commented 4 years ago

CFI directives in the assembler helpers are necessary for unwinding the stack frames of those methods, so that unw_step can walk the stack. Looking at what e.g. GCC generates for MIPS64, I can see it uses CFI directives, so I am not sure what you mean by "CFI directives for MIPS64 is not implement yet". Did you mean that they are not being handled by libunwind itself?

Below is an example of what GCC generates for a simple function (viewed using the Compiler Explorer online tool); it contains many .cfi directives.

These directives are basically instructions to the native unwinder (libunwind in our case), telling it how to get register values at the caller site from register values in the current frame. For example, a typical function prolog subtracts some value from SP to make space for the function's frame, then stores the frame pointer somewhere in that space and sets the frame pointer to point to that location. The CFI directives for such a function then describe e.g. where the caller's stack frame is relative to the current frame's SP, which register is used as the frame pointer, where a specific callee-saved register is stored, etc.

void test()
{
    int a = 4;
    float w = 5.0;
}

  .cfi_startproc
  .set nomips16
  .set nomicromips
  .ent test()
  .type test(), @function
test():
  .frame $fp,32,$31 # vars= 16, regs= 1/0, args= 0, gp= 0
  .mask 0x40000000,-8
  .fmask 0x00000000,0
  .set noreorder
  .set nomacro
  daddiu $sp,$sp,-32
  .cfi_def_cfa_offset 32
  sd $fp,24($sp)
  .cfi_offset 30, -8
  move $fp,$sp
  .cfi_def_cfa_register 30
  lui $3,%hi(%neg(%gp_rel(test())))
  daddu $3,$3,$25
  daddiu $3,$3,%lo(%neg(%gp_rel(test())))
  .loc 1 9 0
  li $2,4 # 0x4
  sw $2,0($fp)
  .loc 1 10 0
  ld $2,%got_page(.LC0)($3)
  lwc1 $f0,%got_ofst(.LC0)($2)
  swc1 $f0,4($fp)
  .loc 1 11 0
  nop
  move $sp,$fp
  .cfi_def_cfa_register 29
  ld $fp,24($sp)
  daddiu $sp,$sp,32
  .cfi_restore 30
  .cfi_def_cfa_offset 0
  j $31
  nop

  .set macro
  .set reorder
  .end test()
  .cfi_endproc
.LFE0:
  .size test(), .-test()

A few examples of what libunwind does in the function above:

  daddiu $sp,$sp,-32
  .cfi_def_cfa_offset 32

When a function is entered and we are at its first instruction, SP always serves as the frame pointer. Our sample function makes space for the current frame by subtracting 32 from SP. If the unwinder wants to unwind to the caller when the current instruction pointer points right after the daddiu instruction, it needs to know how to get the value SP had at the caller site (which is called the CFA - canonical frame address). The CFI directive .cfi_def_cfa_offset 32 tells it to add 32 to the current value of the stack frame address (which is still in SP). Without this directive, the unwinder would not know what to do.

It is then followed by these instructions:

  sd $fp,24($sp)
  .cfi_offset 30, -8
  move $fp,$sp
  .cfi_def_cfa_register 30

This stores FP at address SP+24. Since FP is a callee-saved register, .cfi_offset 30, -8 records where it was stored so that the unwinder can restore it from there. The first parameter of the directive is the number of the frame pointer register; the second is the offset, relative to the CFA, at which the register is stored. Since the CFA is at SP+32 and we've stored FP at SP+24, the offset is -8. Finally, we change the frame pointer from SP to FP: we copy SP to FP and then use .cfi_def_cfa_register 30 to tell the unwinder that, from the address of the next instruction on, it should use FP to compute the locations of registers stored in the frame and the SP at the caller site.
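
The bookkeeping described above can be sketched in C. This is only an illustrative model of how an unwinder might track the three directives from the listing, not libunwind's actual data structures. cfi_state_t and all function names are hypothetical; the register numbers 29 (SP) and 30 (FP) are the ones used in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { REG_SP = 29, REG_FP = 30, NUM_REGS = 32 };

/* Hypothetical unwind state built up from the CFI directives seen so far. */
typedef struct {
    int     cfa_reg;                 /* register the CFA is computed from */
    int64_t cfa_offset;              /* CFA = regs[cfa_reg] + cfa_offset */
    int     saved[NUM_REGS];         /* nonzero if the register was saved */
    int64_t saved_offset[NUM_REGS];  /* save slot address = CFA + offset */
} cfi_state_t;

/* State at function entry: the CFA is simply SP, nothing saved yet. */
static void cfi_init(cfi_state_t *s) {
    assert(s != NULL);
    s->cfa_reg = REG_SP;
    s->cfa_offset = 0;
    for (int i = 0; i < NUM_REGS; i++) {
        s->saved[i] = 0;
        s->saved_offset[i] = 0;
    }
}

/* .cfi_def_cfa_offset OFF: keep the CFA register, change the offset. */
static void cfi_def_cfa_offset(cfi_state_t *s, int64_t off) {
    s->cfa_offset = off;
}

/* .cfi_def_cfa_register REG: keep the offset, change the CFA register. */
static void cfi_def_cfa_register(cfi_state_t *s, int reg) {
    s->cfa_reg = reg;
}

/* .cfi_offset REG, OFF: REG was saved at CFA + OFF. */
static void cfi_offset(cfi_state_t *s, int reg, int64_t off) {
    s->saved[reg] = 1;
    s->saved_offset[reg] = off;
}

/* The caller's SP (the CFA), computed from the current register values. */
static uint64_t compute_cfa(const cfi_state_t *s, const uint64_t *regs) {
    return regs[s->cfa_reg] + (uint64_t)s->cfa_offset;
}

/* Address from which the caller's value of a saved register is reloaded. */
static uint64_t saved_slot(const cfi_state_t *s, uint64_t cfa, int reg) {
    assert(s->saved[reg]);
    return cfa + (uint64_t)s->saved_offset[reg];
}
```

With SP equal to 0x1000 at entry, applying cfi_def_cfa_offset(32) after the daddiu gives a CFA of 0xfe0 + 32 = 0x1000, and cfi_offset(30, -8) places the FP save slot at 0xff8, which is the SP+24 address used by the sd instruction. After cfi_def_cfa_register(30), the CFA stays correct even if SP moves again, because it is now computed from FP.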

The personality routine describes a function that is called in case exception handling is walking the stack and reaches the frame of the assembler helper. Its responsibility is to decide what the exception handling should do at that point. On Unix, we only use a single kind of personality routine - the UnhandledExceptionHandlerUnix. If exception handling reaches the frame of an assembler helper with that personality routine, it reports an unhandled exception and aborts the process. The reason is that we don't allow propagating exceptions through assembler helpers. We set the personality routine only for functions that can possibly be reached by an exception. There are only four assembler helper functions that use it - UMThunkStub, TheUMEntryPrestub in the runtime and ExceptionHijack / FuncEvalHijack in the debugger libraries. The ones in the runtime are used at the boundary where native code calls managed code, and they ensure that exceptions thrown in the managed code don't get propagated to the native caller, since we know nothing about the native code or its ability to handle exceptions. So if you don't put the personality routine there, it would only affect such cases, which are always bugs in the user code.
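
As a rough illustration of the role described above, the following C sketch models a stack walk in which a frame may carry a personality routine that decides what happens next. It is a conceptual model only and does not reflect the real signature of UnhandledExceptionHandlerUnix or the runtime's dispatch code. frame_t, disposition_t, dispatch and the two routines are hypothetical, and the sketch returns a value where the real handler would abort the process.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcomes a personality routine can report. */
typedef enum { CONTINUE_SEARCH, HANDLED, UNHANDLED_ABORT } disposition_t;

typedef disposition_t (*personality_fn)(void);

/* Hypothetical frame record: personality is NULL when none is attached. */
typedef struct {
    const char    *name;
    personality_fn personality;
} frame_t;

/* Stand-in for a managed frame whose catch clause handles the exception. */
static disposition_t managed_catch(void) {
    return HANDLED;
}

/* Stand-in for the boundary helper's routine: the exception must not be
 * propagated into the native caller, so it is reported as unhandled. */
static disposition_t unhandled_at_boundary(void) {
    return UNHANDLED_ABORT;
}

/* Walks frames from the throw site outward and stops at the first frame
 * whose personality routine does not ask to continue the search. */
static disposition_t dispatch(const frame_t *frames, size_t count,
                              size_t *stopped_at) {
    assert(frames != NULL && stopped_at != NULL);
    for (size_t i = 0; i < count; i++) {
        if (frames[i].personality == NULL) {
            continue;  /* nothing to consult for this frame */
        }
        disposition_t d = frames[i].personality();
        if (d != CONTINUE_SEARCH) {
            *stopped_at = i;
            return d;
        }
    }
    *stopped_at = count;
    return CONTINUE_SEARCH;
}
```

In this model, an exception that no managed frame handles stops at the boundary helper and never reaches the native caller's frame, which is the behaviour the personality routine is there to guarantee.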

UPDATEREG(Fp) failed to work for MIPS64, so we just commented it

I believe you need to make that work. What was the problem you were hitting if it was not commented out?

xiangzhai commented 4 years ago

Hi @janvorli

Thanks for your teaching!

CFI directives for MIPS64 is not implement yet

Sorry for my poor English! We just commented out the .cfi_XXX directives in src/pal/inc/unixasmmacrosmips64.inc. Even worse, we did not use assembly macros such as PROLOG_SAVE_REG_PAIR (whose .cfi_XXX directives we had also commented out) in src/vm/mips64/asmhelpers.S and src/vm/mips64/calldescrworkermips64.S. So I think the root cause of apply_reg_state: ip and cfa unchanged; stopping here is that the CFI directives for MIPS64 are unimplemented on the CoreCLR side. It is our fault! We need to refactor src/vm/mips64/asmhelpers.S and src/vm/mips64/calldescrworkermips64.S to use the assembly macros.

The personality routine describes a function that is called in case exception handling is walking the stack and reaches the frame of the assembler helper... On Unix, we only use a single kind of personality routine - the UnhandledExceptionHandlerUnix... There are only four assembler helper functions that use it - UMThunkStub, TheUMEntryPrestub in the runtime and ExceptionHijack / FuncEvalHijack in the debugger libraries.

Sorry for my misunderstanding! I thought the personality was cfi_personality. We also need to double-check the four assembler helper functions for MIPS64.

I believe you need to make that work. What was the problem you were hitting if it was not commented out?

Yes! Commenting out UPDATEREG(Fp) is just a workaround for the test case JIT/CodeGenBringUpTests/div2_d/div2_d.exe:

#elif defined(_TARGET_MIPS64_)

    UPDATEREG(S0);
    UPDATEREG(S1);
    UPDATEREG(S2);
    UPDATEREG(S3);
    UPDATEREG(S4);
    UPDATEREG(S5);
    UPDATEREG(S6);
    UPDATEREG(S7);
    UPDATEREG(Gp);
    //UPDATEREG(Fp);
    UPDATEREG(Ra);

\cc @theaoqi @QiaoVanke

Thanks, Leslie Zhai

xiangzhai commented 4 years ago

Hi @janvorli

Thank you for pointing out our fault! There is no .cfi_def_cfa_offset OFFSET or .cfi_restore REG in the prologs/epilogs of the functions in src/vm/mips64/asmhelpers.S and similar assembler files.

But why is there no .cfi_XXX in src/pal/src/arch/arm64/context2.S?

Thanks, Leslie Zhai

janvorli commented 4 years ago

But why is there no .cfi_XXX in src/pal/src/arch/arm64/context2.S?

It should be there. The amd64 variant has them (in the push_eflags macro). But it isn't such a big problem here as this function is a leaf one (it doesn't call any other functions), so it would only cause trouble if a hardware exception occurred in this function. But we should definitely add it. The same is true for RtlCaptureContext.

xiangzhai commented 4 years ago

Fixed by @QiaoVanke 👍