clang 4.0.0 (and clang 5)/powerpc/powerpc64's _Unwind_RaiseException code generation has messed up r31 (frame pointer) save/restore code (SEGV's can result)

Quuxplusone commented 8 years ago


Bugzilla Link	PR26856
Status	NEW
Importance	P normal
Reported by	Mark Millard (marklmi26-fbsd@yahoo.com)
Reported on	2016-03-06 05:34:38 -0800
Last modified on	2020-11-04 12:11:16 -0800
Version	5.0
Hardware	Other FreeBSD
CC	dgregor@apple.com, emaste@freebsd.org, hfinkel@anl.gov, kit.barton@gmail.com, llvm-bugs@lists.llvm.org, nemanja.i.ibm@gmail.com, ulrich.weigand@de.ibm.com, uweigand@de.ibm.com, willdtz@gmail.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also	PR25780

The below causes gdb difficulties for its stack handling: more than just
exception handling is at issue. I just happened to notice it via exception
handling.

Function _Unwind_RaiseException below is from a FreeBSD "buildworld" using
clang 3.8.0.

Dump of assembler code for function _Unwind_RaiseException:
   0x41b2ab80 <+0>: mflr    r0
   0x41b2ab84 <+4>: stw     r31,-148(r1)
   0x41b2ab88 <+8>: stw     r30,-152(r1)
   0x41b2ab8c <+12>:    stw     r0,4(r1)
   0x41b2ab90 <+16>:    stwu    r1,-2992(r1)
   0x41b2ab94 <+20>:    mr      r31,r1
. . .
   0x41b2abe0 <+96>:    stw     r31,2844(r31)
(which replaces the earlier save of the old Frame pointer R31
value with a copy of r1's current  value. Note the offset
relationships with the r1 adjustment: -2992+2844=-148)
. . .
   0x41b2add0 <+592>:   lwz     r31,2844(r31)
(This restores the r1 value that resulted from the "stwu r1,-2992(r1)" into
R31.)
. . .
   0x41b2ae30 <+688>:   lwz     r31,-148(r1)
(This restores the r1 value that resulted from the "stwu r1,-2992(r1)" into
R31.)
. . .

The wrong r31 value is present when _Unwind_RaiseException returns.

But before that while _Unwind_RaiseException is active the C++ exception
handling infrastructure has been given bad r31 information for around
_Unwind_RaiseException's frame.

Quuxplusone commented 8 years ago

The problem does not occur for TARGET_ARCH=powerpc64. It is powerpc specific.

Since clang 3.8.0 violates the SVR4 ABI for TARGET_ARCH=powerpc currently by needing a "red-zone" on the stack to cover decrementing the stack pointer (r1) late and incrementing it early and . . .

A) each r31 store is on a different side of the late r1 decrement

and

B) each r31 restore is on a different side of the early r1 increment

This problem may be tied to ABI problem: part of the code for one ABI and part of the code for the other ABI.

Quuxplusone commented 8 years ago

(In reply to comment #1)

Dumb mistake on my part for comment #1. . . Too bad I can not delete the
comment to avoid confusing anyone temporarily.

> The problem does not occur for TARGET_ARCH=powerpc64. It is powerpc specific.
>
> Since clang 3.8.0 violates the SVR4 ABI for TARGET_ARCH=powerpc currently by
> needing a "red-zone" on the stack to cover decrementing the stack pointer
> (r1) late and incrementing it early and . . .
>
> A) each r31 store is on a different side of the late r1 decrement
>
> and
>
> B) each r31 restore is on a different side of the early r1 increment
>
> This problem may be tied to ABI problem: part of the code for one ABI and
> part of the code for the other ABI.

The example was not compiled by clang 3.8.0. TARGET_ARCH=powerpc64 via clang
does not buildworld yet. I switched files to look at without thinking about the
implicit context change associated.

Quuxplusone commented 8 years ago

I compiled a .o for TARGET_ARCH=powerpc64 via letting "buildworld" get as far as it could and the resultant .o produced has the same sort of r31/frame-pointer problem as powerpc for _Unwind_RaiseException: The problem DOES occur for powerpc64.

A) r31 is stored twice to the same location, with the 2nd store destroying the frame-pointer value that is supposed to be saved and restored for the caller.

B) r31 is restored twice from the same location.

The FreeBSD TARGET_ARCH=powerpc64 does officially use a stack red-zone on the low-address side with officially "late" decrement and "early" increment (AIX like). Relative to this:

A) each r31 store is on a different side of the "late" r1 decrement

and

B) each r31 restore is on a different side of the "early" r1 increment

TARGET_ARCH=powerpc gets that same relationships but the late r1 decrement and early r1 increment are SVR4 ABI violations: SVR4'sABI does not require a "red-zone" on the low-address side of the stack.

(To get as far as I have for powerpc "buildworld" I had to add signal red-zone handling to my personal FreeBSD builds.)

Quuxplusone commented 8 years ago

For __builtin_unwind_init() consequences (desired vs. actual) and
TargetFrameLowering::determineCalleeSaves behavior for at least powerpc and
powerpc64. . .

When something like a Frame Pointer is already in use in how the call standard
is implemented the code will already save the register and update it
independent of the "SavedRegs.set(Reg)" status for the register.

And, in fact, "SavedRegs.set(Reg)" on the Frame Pointer Register that gets the
save behavior after Frame Register update will replace the value that should
later be restored with the updated value that should not be what is later
restored. Such would not be a redundant store but instead an incorrect store.

That is what is actually happening for powerpc and powerpc64. (Any others?)

So for a call standard such as for powerpc64 where the Frame Pointer Register
(such as r31) is saved and updated before the (other) callee saved registers
are saved the Frame Pointer Register must not also be one of the
"SavedRegs.set(Reg)" registers.

This is true even when "MF.getMMI().callsUnwindInit()" [__builtin_unwind_init()
usage] indicates that overall all the potential callee-saved registers should
be saved and restored: The Frame Pointer already is saved and restored and
should not be "double saves/double restored".

The same is true for the plain powerpc code generation technique that is
currently in use (but that violates the SVR4 call/return ABI properties by
being more AIX-like that requires a red-zone on the low address side of the
stack). It is likely true for correct SVR4 ABI stack pointer handling/timing as
well.

For all I know there would be other machine architectures with the issue as
well.

Another "MF.getMMI().callsUnwindInit()" [__builtin_unwind_init() usage] related
issue seems to be that the .eh_frame information needs to span things like
Fields CR2, CR3, and CR4 of the condition register (CR) for powerpc being non-
volatile in SVR4's ABI: The FDE information does not seem to be recording
everything that is non-volatile for this ABI (and possibly others). (Likely the
whole CR is saved and the restore code would be the part to deal with selective
restore if the other fields of CR are to actually guarantee volatile status.
Volatile status does not need to be guaranteed from what I can tell.)

Yet another issue related to "MF.getMMI().callsUnwindInit()"
[__builtin_unwind_init() usage] for when the Itanium C++ Exception ABI is in
use is that there are 4 scratch registers in addition to the normal callee-
saves/restored registers. As stands for powerpc and powerpc64 no place to save
the scratch register values has been set up so that they can be restored for
the landing pad code --and the .eh_frame information makes no reference to the
scratch registers at all.

gcc generates save/restore code for r3, r4, r5, and r6 for the 4 scratch
registers for powerpc and powerpc64 --and explicitly records the information in
the FDE in the .eh_frame information. FreeBSD's system libgcc_s depends on
this, as an example. It gets a SEGV from trying to use address zero restore r3.

It would appear to me that TargetFrameLowering::determineCalleeSaves should
deal with the exception ABI's scratch registers (when there are such) but that
is does not explicitly do so for any machine architecture. (Is all the logic
elsewhere?)

Quuxplusone commented 8 years ago

PPCFrameLowering::determineCalleeSaves may be too late or otherwise
inappropriate for handling the "double Frame Pointer Register store" problem
for __builtin_unwind_init() contexts.

Side notes:

PPCFrameLowering::determineCalleeSaves has no code for the Itanium-like C++
Exception ABI's 4 scratch registers [r3, r4, r5, r6] for
__builtin_unwind_init() contexts.

Apparently for __builtin_unwind_init() contexts (and any others?)

  // For 32-bit SVR4, allocate the nonvolatile CR spill slot iff the
  // function uses CR 2, 3, or 4.
  if (!isPPC64 && !isDarwinABI &&
      (SavedRegs.test(PPC::CR2) ||
       SavedRegs.test(PPC::CR3) ||
       SavedRegs.test(PPC::CR4))) {
    int FrameIdx = MFI->CreateFixedObject((uint64_t)4, (int64_t)-4, true);
    FI->setCRSpillFrameIndex(FrameIdx);
  }

in PPCFrameLowering::determineCalleeSaves [in
Target/PowerPC/PPCFrameLowering.cpp] is not sufficient for the .eh_frame
information to record the save/restore of (fields of) the CR.

This leaves the C++ Exception handing as not preserving these CR fields.

(Technically volatile-status need not be preserved but non-volatile-status
needs to be preserved: In other words the whole CR could be restored and still
comply with the SVR4 ABI.)

Quuxplusone commented 8 years ago

Yes, there's indeed a couple of problems here, which affect different areas.

1) On 32-bit ppc, LLVM violates the ABI by storing below the stack pointer even
though the ABI does not provide a "red zone".  This affects every function with
a stack frame, and could in theory lead to spurious crashes when an
asynchronous signal overwrites this area.  This seems to be a known issue; the
source code contains FIXME lines:
    // FIXME: On PPC32 SVR4, we must not spill before claiming the stackframe.

2) In some scenarios, registers may be spilled/restored twice to the stack.
This happens because while most of the spilling happens in
PPCFrameLowering::spillCalleeSavedRegisters, a few selected registers are also
spilled in PPCFrameLowering::emitPrologue.  Those registers are the frame
pointer, base pointer, PIC base pointer, link register, and condition code
register.  For the latter two, code ensures that they can never be spilled in
both places (for CR, there is extra code in spillCalleeSavedRegisters; for LR,
the register is removed from SavedRegs in determineCalleeSaves).

However, for FP, BP, and PBP, nothing ensures the registers are not spilled
twice.  It is probably *rare* for this to happen, because the register
allocator will not use those registers within the function if they're needed
for their special purpose, but it can happen in rare cases.  This includes the
case of a system unwinder routine that uses __builtin_unwind_init, but could
also include other routines that clobber one of those registers, e.g. the
following case:

void func (void);

void test (void)
{
  func ();
  asm ("nop" : : : "31");
}

When it happens that a register is spilled twice, the code as such still works
correctly, but the DWARF CFI unwind info associated with the routine will be
broken, which can mess up both C++ exception handling and debugging.

3) For the specific case of system unwinder routines that use
__builtin_unwind_init and/or __builtin_eh_return, special things need to happen
in the prolog and epilog that are not required for any other routine.  This in
particular includes setting up save areas and CFI records for the EH data
registers (r3 ... r6).  [See bug #26844. ]  For the ELFv2 ABI (powerpc64le), it
also include using three separate save areas for the three caller-saved
condition register fields, so that the EH logic can overwrite their values
independently.

None of this is currently implemented in LLVM, since on Linux generally GCC is
used to build the system unwind libraries, and no other code in the system ever
needs those special constructs.

Quuxplusone commented 8 years ago

(In reply to comment #6)
> Yes, there's indeed a couple of problems here, which affect different areas.
>
> 1) On 32-bit ppc, LLVM violates the ABI by storing below the stack pointer
> even though the ABI does not provide a "red zone".  This affects every
> function with a stack frame, and could in theory lead to spurious crashes
> when an asynchronous signal overwrites this area.  This seems to be a known
> issue; the source code contains FIXME lines:
>     // FIXME: On PPC32 SVR4, we must not spill before claiming the
> stackframe.

This is bug #26519. (I made personal FreeBSD changes so that signal delivery
used a red-zone in order to allow continuing my investigations of what was
working for FreeBSD buildworld use.)

>
> 2) In some scenarios, registers may be spilled/restored twice to the stack.
> This happens because while most of the spilling happens in
> PPCFrameLowering::spillCalleeSavedRegisters, a few selected registers are
> also spilled in PPCFrameLowering::emitPrologue.  Those registers are the
> frame pointer, base pointer, PIC base pointer, link register, and condition
> code register.  For the latter two, code ensures that they can never be
> spilled in both places (for CR, there is extra code in
> spillCalleeSavedRegisters; for LR, the register is removed from SavedRegs in
> determineCalleeSaves).
>
> However, for FP, BP, and PBP, nothing ensures the registers are not spilled
> twice.  It is probably *rare* for this to happen, because the register
> allocator will not use those registers within the function if they're needed
> for their special purpose, but it can happen in rare cases.  This includes
> the case of a system unwinder routine that uses __builtin_unwind_init, but
> could also include other routines that clobber one of those registers, e.g.
> the following case:
>
> void func (void);
>
> void test (void)
> {
>   func ();
>   asm ("nop" : : : "31");
> }
>
> When it happens that a register is spilled twice, the code as such still
> works correctly, but the DWARF CFI unwind info associated with the routine
> will be broken, which can mess up both C++ exception handling and debugging.

> 3) For the specific case of system unwinder routines that use
> __builtin_unwind_init and/or __builtin_eh_return, special things need to
> happen in the prolog and epilog that are not required for any other routine.
> This in particular includes setting up save areas and CFI records for the EH
> data registers (r3 ... r6).  [See bug #26844. ]  For the ELFv2 ABI
> (powerpc64le), it also include using three separate save areas for the three
> caller-saved condition register fields, so that the EH logic can overwrite
> their values independently.

The observed behavior for the powerpc (what should be) SVR4 context is that the
save/restore logic for CR fields 2, 3, 4 is present but no .eh_frame
information is written out for the EH logic to use for CR.

> None of this is currently implemented in LLVM, since on Linux generally GCC
> is used to build the system unwind libraries, and no other code in the
> system ever needs those special constructs.

I also submitted the somewhat related bug #26761 for __builtin_dwarf_cfa() not
returning high-address side of the frame (stack pointer "on entry" from before
its adjustment for the frame. It instead returns the frame pointer value after
its adjustment. In the implementation the distinction is Frame Depth 0 vs.
Frame Depth 1 for "FRAMEADDR" from what I can tell.

So far these issues seem to be what currently blocks FreeBSD from using clang
for powerpc buildworld and as a compiler for user-space code generally. (I've
not gotten into investigating buildkernel.)

powerpc64 has more needed, such as support for soft-float.

Quuxplusone commented 8 years ago

(In reply to comment #6)

> However, for FP, BP, and PBP, nothing ensures the registers are not spilled
> twice.. . .
>
> When it happens that a register is spilled twice, the code as such still
> works correctly, but the DWARF CFI unwind info associated with the routine
> will be broken, which can mess up both C++ exception handling and debugging.

I will note that the Frame Pointer Register (r31) being saved again to the same
location but after it was adjusted to match the adjusted stack pointer in the
callee does not work correctly in that the restore of the Frame Pointer for the
return to the caller will restore the wrong pointer value.

If the caller then uses r31 without separately also restoring r31 first then it
will be addressing the wrong memory on the stack.

The observed/reported code sequence had the 2nd r31 store in the callee after
r31 had been adjusted to match the adjusted stack pointer in the callee.

So more than C++ exception handling and debugging is broken for the reported
code sequence.

Quuxplusone commented 8 years ago

(In reply to comment #8)
> (In reply to comment #6)
>
> > However, for FP, BP, and PBP, nothing ensures the registers are not spilled
> > twice.. . .
> >
> > When it happens that a register is spilled twice, the code as such still
> > works correctly, but the DWARF CFI unwind info associated with the routine
> > will be broken, which can mess up both C++ exception handling and debugging.
>
> I will note that the Frame Pointer Register (r31) being saved again to the
> same location but after it was adjusted to match the adjusted stack pointer
> in the callee does not work correctly in that the restore of the Frame
> Pointer for the return to the caller will restore the wrong pointer value.
>
> If the caller then uses r31 without separately also restoring r31 first then
> it will be addressing the wrong memory on the stack.
>
> The observed/reported code sequence had the 2nd r31 store in the callee
> after r31 had been adjusted to match the adjusted stack pointer in the
> callee.
>
> So more than C++ exception handling and debugging is broken for the reported
> code sequence.

Ah, right.  I had been under the impression that the back-end would use two
different save areas, but indeed it does use the same area, just addressed
slightly differently.  The resulting code is then just simply incorrect.

Quuxplusone commented 8 years ago

(In reply to comment #7)

> The observed behavior for the powerpc (what should be) SVR4 context is that
> the save/restore logic for CR fields 2, 3, 4 is present but no .eh_frame
> information is written out for the EH logic to use for CR.

Huh, indeed.  This is yet another bug, which also affects functions for the
powerpc (SVR4) ABI in general, not just unwind system routines.

This seems to be a logic issue here:

      // For SVR4, don't emit a move for the CR spill slot if we haven't
      // spilled CRs.
      if (isSVR4ABI && (PPC::CR2 <= Reg && Reg <= PPC::CR4)
          && !MustSaveCR)
        continue;

MustSaveCR is always false for 32-bit, it is only used on 64-bit.  This has the
effect that on 32-bit, we never get any CFI for the CRs.

Quuxplusone commented 8 years ago

(In reply to comment #6)
...
>
> 2) In some scenarios, registers may be spilled/restored twice to the stack.
> This happens because while most of the spilling happens in
> PPCFrameLowering::spillCalleeSavedRegisters, a few selected registers are
> also spilled in PPCFrameLowering::emitPrologue.  Those registers are the
> frame pointer, base pointer, PIC base pointer, link register, and condition
> code register.  For the latter two, code ensures that they can never be
> spilled in both places (for CR, there is extra code in
> spillCalleeSavedRegisters; for LR, the register is removed from SavedRegs in
> determineCalleeSaves).
>
> However, for FP, BP, and PBP, nothing ensures the registers are not spilled
> twice.  It is probably *rare* for this to happen, because the register
> allocator will not use those registers within the function if they're needed
> for their special purpose, but it can happen in rare cases.  This includes
> the case of a system unwinder routine that uses __builtin_unwind_init, but
> could also include other routines that clobber one of those registers, e.g.
> the following case:
>
> void func (void);
>
> void test (void)
> {
>   func ();
>   asm ("nop" : : : "31");
> }

r280188 should address the spilling-twice problem.

Quuxplusone commented 8 years ago

(In reply to comment #11)
> r280188 should address the spilling-twice problem.

Thanks Hal.

Dimitry Andric (dim at FreeBSD.org) has written:

> I merged the upstream fix to projects/clang390-import:
>
> https://svnweb.freebsd.org/changeset/base/305681

So FreeBSD head/12 will be adopting your changes.

As for my activity:

I'll not have access to powerpc64s/powerpcs for a few weeks yet.

Quuxplusone commented 7 years ago

(In reply to comment #12)
> (In reply to comment #11)
> > r280188 should address the spilling-twice problem.

Context powerpc64 for this note. . .

For the code that I've checked so far: code
inspection and the operation and gdb's ability to bt where
before it could not all indicate no "spilling-twice"
problems in a FreeBSD head -r309179 minor variation and
in the code generated by its FreeBSD clang 3.9.0 variant.

Thanks!

[I've not gotten into the other specific issues from the
old discussion but as I understand the above is the only
aspect expected to have been fixed.]

Note: C++ exceptions still have lots of problems and:

#include <exception>

int main(void)
{
    try { throw std::exception(); }
    catch (std::exception& e) {}
    return 0;
}

aborts. But the failure details are different now and
gdb can bt correctly after it happens. That is a nice
improvement of itself.

Quuxplusone commented 7 years ago

FreeBSD status update. . .

stable/11 from FreeBSD has for a time had as its system
compiler its variant of 3.9.1 . So the FreeBSD 11.1
release will have the powerpc64 and powerpc fixes that
have been applied plus any others that they merge in
before then. And so on for 11.x (1<x).

(Unfortunately powerpc still has stack handling problems
relative to r30 usage [26519] and floating point code and
C++ exception handling does not work for powerpc or
powerpc64.)

I'd guess that head (12) and stable/11 would continue to
eventually pick up powerpc and powerpc64 fixes that fit
in clang 3.9.1 easily.

I'll also note that projects/clang400-import has started
so at least the eventual 12.x releases will likely be based
on clang 4.0.x . I've no clue if stable/11 might pick up
such at some point.

Quuxplusone commented 7 years ago

(In reply to Mark Millard from comment #14)
> FreeBSD status update. . .

FYI: FreeBSD has merged clang/llvm 4.0 into stable/11 (-r316423)
in addition to head (12) some time ago. release/11.1.0 will be
clang/llvm 4.0 based.

TARGET_ARCH=powerpc64 and TARGET_ARCH=powerpc still does not
handle thrown exceptions in this clang version.

TARGET_ARCH=powerpc still has the R30 vs. tack handling vs.
floating point code problem: restored R30 and then generates
floating point cleanup code that expects R30 to not have
changed yet.

Quuxplusone commented 7 years ago

The unresolved parts of this bugzilla submittal still
apply.

But Comment #11's report of a partial fix does apply.

Quuxplusone commented 7 years ago

The scratch register issue for the
powerpc64 and powerpc variants of
the Itanium C++ Exception ABI has
updated notes in llbm bugzilla
29844 based on looking at the
clang 5 results.

The save/restores of r2-r6 are
not happening based where
__builtin_eh_return(offset, handler)
is involved --but it should.
That in turn means the DQW_CFA_<?>
material for such is also not
present.

Quuxplusone commented 4 years ago

(In reply to Mark Millard from comment #17)

[Note: FreeBSD head (13) now uses clang 9.0.1 and lld
for powerpc64 (ELFv2 but big endian) and clang for
32-bit powerpc. This involved switching to llvm's
libunwind.]

Bug 26844 has been closed as invalid for a libunwind context,
which FreeBSD is switching to. (It is not trying to address
if clang++ should support other unwind libraries.)

However, it is noted that the technical detail for the older
unwind handling context still applies: clang does not
write out cfi information describing the scratch registers
r3-r6 on, say, powerpc64, but the older unwind library
depends on such being there and SIGSEGV happens when the
descriptions are not there and an exception is thrown.

Basically, clang++ parses, but ignores, the builtin that
leads g++ to write out the scratch register descriptions,
at least for powerpc64 and 32-bit powerpc. See bug 26844
for details. The lack of handling of the built-in was
figured out fairly late in the comment sequence.

See also bug 8541 .

The comments in 26856 (this bug) note things about ELFv2
powerpc64 context related defects at the time that I have
not looked into the current status of. Similarly, I've not
reviewed the status of various 32-bit powerpc notes.

The R31 part of the save/restore handling has been
operational for a long time. But some of the other
issues I do not know the status of.

Quuxplusone / LLVMBugzillaTest

clang 4.0.0 (and clang 5)/powerpc/powerpc64's _Unwind_RaiseException code generation has messed up r31 (frame pointer) save/restore code (SEGV's can result) #26855