Offset in CFI directives for frameless indirect compact unwind is wrong when the prologue is not in the entry block

Quuxplusone commented 8 years ago


Bugzilla Link	PR25614
Status	NEW
Importance	P normal
Reported by	Charles Davis (cdavis5x@gmail.com)
Reported on	2015-11-23 16:02:08 -0800
Last modified on	2016-04-30 08:35:41 -0700
Version	trunk
Hardware	PC All
CC	artem.tamazov@amd.com, hfinkel@anl.gov, jsweval@arxan.com, llvm-bugs@lists.llvm.org, quentin.colombet@gmail.com, rafael@espindo.la
Fixed by commit(s)
Attachments	`pr25614.patch` (1474 bytes, text/plain)
Blocks
Blocked by
See also

The compact unwind emitters (at least, the x86 one) currently assume that the
prologue is at the beginning of the function. That way, for a frameless
function with a large stack allocation, it can figure the offset of the stack
allocation amount based solely on the CFI.

Shrink-wrapping breaks this assumption, because now the prologue may no longer
be at the beginning of the function. But, for a frameless function with a large
stack allocation, the compact unwind emitter stores the offset of the stack
allocation amount relative to the prologue, not the beginning of the function.
Thus, when the unwinder reads the unwind information for this function, it will
happily use the offset given in the unwind info... and get a bogus result,
leading to a crash in the unwinder... or worse, no crash and a bad
backtrace/unwind.

This assembly test case demonstrates the problem:

        .section        __TEXT,__text,regular,pure_instructions
        .globl  _test2
_test2:                                 ## @test2
        .cfi_startproc
## BB#0:                                ## %prologue
        movzbl  _guard(%rip), %eax
        andl    $1, %eax
        cmpl    $1, %eax
        jne     LBB0_2
## BB#1:
        retq
LBB0_2:                                 ## %.0.entry
        subq    $3232, %rsp             ## imm = 0xca0
Ltmp20:
        .cfi_def_cfa_offset 3240
        callq   _foo
        addq    $3232, %rsp             ## imm = 0xca0
        retq
        .cfi_endproc

.subsections_via_symbols

Note the code preceding the prologue, checking to see if the function should be
executed. We should produce a compact encoding of 0x03132000, but we wind up
producing 0x03032000 (i.e. stack allocation offset is 3 when it should be 19).

Unfortunately, at this point, the information we need (i.e. the offset of the
prologue) is missing. We certainly won't have this information for raw assembly.

Quuxplusone commented 8 years ago

Hi Charles,

This is not clear to me if we have to fix the unwinder, the information emitted
by the compiler (like cfa offset 0, before we get to the prologue), or shrink-
wrapping, e.g., by doing that only for nounwind function.

What are your thoughts?

Thanks,
-Quentin

Quuxplusone commented 8 years ago

(In reply to comment #1)
> Hi Charles,
>
> This is not clear to me if we have to fix the unwinder, the information
> emitted by the compiler (like cfa offset 0, before we get to the prologue),
That depends. Is the offset to the stack allocation amount in a frameless
indirect entry supposed to be relative to the beginning of the function, or the
prologue? If it's the former, we have to fix the compiler, and if it's the
latter, we have to fix the unwinder. Based on <mach-
o/compact_unwind_encoding.h> (emphasis added):

  The compact encoding contains the offset to the nnnnnnnn value *in the function*
  in UNWIND_X86_64_FRAMELESS_STACK_SIZE.

I think it's the former. (Am I wrong?)

In that case, I think the right way to fix it is to have the compact unwind
emitter scan the function for 'subl $nnnnnnnn, %esp' (i386)/'subq $nnnnnnnn,
%rsp' (x86-64) instructions. If there is exactly one of them, we'll encode the
offset of that instruction, plus 2 (i386)/3 (x86-64) (to advance past the
opcode and Mod-R/M bytes, and on x86-64, the REX byte). If there's more than
one, or none at all*, we'll bail and fall back to full DWARF CFI.

Only problem is, I don't know how to get either the instruction bytes or the
list of MCInst objects from the compact unwind emitter, and it's not entirely
clear to me how to do this from reading the source.

* It might be possible to make the emitter a little bit smarter and detect,
e.g., moving the stack pointer to another register and doing the subtraction
there, then moving the result back to SP, but I don't think that's really worth
the effort.

> or shrink-wrapping, e.g., by doing that only for nounwind function.
Honestly, I think this is the easiest way to fix this bug for now (but we'll
lose the benefits of shrink-wrapping for such functions if we do that).
Alternatively, we could attempt to detect a shrink-wrapped function in the
compact unwind emitter and bail out if we find one (but then we'll have to fall
back to full DWARF for all shrink-wrapped functions; also, this requires us to
scan the instruction bytes; see above).

Quuxplusone commented 8 years ago

(In reply to comment #2)
> (In reply to comment #1)
> > Hi Charles,
> >
...
>
> Only problem is, I don't know how to get either the instruction bytes or the
> list of MCInst objects from the compact unwind emitter, and it's not
> entirely clear to me how to do this from reading the source.
>

It does not look like you can from the current MCStreamer API. Can't you
generate a label in some appropriate place, and a corresponding symbolic
expression, in order to let the assembler work out the value for the offset?

Also, that system header defines UNWIND_IS_NOT_FUNCTION_START, is that relevant?

> ...

Quuxplusone commented 8 years ago

(In reply to comment #2)
> (In reply to comment #1)
> > Hi Charles,
> >
> > This is not clear to me if we have to fix the unwinder, the information
> > emitted by the compiler (like cfa offset 0, before we get to the prologue),
> That depends. Is the offset to the stack allocation amount in a frameless
> indirect entry supposed to be relative to the beginning of the function, or
> the prologue? If it's the former, we have to fix the compiler, and if it's
> the latter, we have to fix the unwinder. Based on
> <mach-o/compact_unwind_encoding.h> (emphasis added):
>
>   The compact encoding contains the offset to the nnnnnnnn value *in the
> function*
>   in UNWIND_X86_64_FRAMELESS_STACK_SIZE.
>
> I think it's the former. (Am I wrong?)

Based on the comment, I think it’s the former as well. However, I don’t get why
the distinction you are pointing out would make a difference on where to fix
the problem and what we should fix in the compiler.

> > or shrink-wrapping, e.g., by doing that only for nounwind function.
> Honestly, I think this is the easiest way to fix this bug for now (but we'll
> lose the benefits of shrink-wrapping for such functions if we do that).

Do you happen to have a test case to reproduce the problem?
As far as I can tell, the unwinder did not get any trouble to reconstruct the
backtrace on my system.

Quuxplusone commented 8 years ago

(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > Hi Charles,
> > >
> ...
> >
> > Only problem is, I don't know how to get either the instruction bytes or the
> > list of MCInst objects from the compact unwind emitter, and it's not
> > entirely clear to me how to do this from reading the source.
> >
>
> It does not look like you can from the current MCStreamer API. Can't you
> generate a label in some appropriate place, and a corresponding symbolic
> expression, in order to let the assembler work out the value for the offset?
That's an interesting idea. We could mark the 'sub $nn, %sp' instruction with
an MCSymbol while building up the CFI, just like we mark the .cfi_startproc
location; then when it's time to emit the compact unwind data, we could pass an
MCExpr for the difference between this symbol and the Begin symbol. Is that
what you have in mind?

>
> Also, that system header defines UNWIND_IS_NOT_FUNCTION_START, is that
> relevant?
It's not used anywhere in libunwind. It's not documented in that header,
either. I'm not even sure what it's for; if I had to guess, I'd say it's for
when the .cfi_startproc doesn't line up with the actual beginning of the
function (i.e. the entry in the symtab). I wonder if we could fix the unwinder
to account for this, and then have the compiler stick the .cfi_startproc at the
Save point (where the shrink-wrapping pass puts the prologue). Then if the
start of the function and the location of the .cfi_startproc don't match, we'll
set this bit in the compact unwind data. I don't know, though. Someone who
knows more about this than I do should really comment on that.

(In reply to comment #4)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > Hi Charles,
> > >
> > > This is not clear to me if we have to fix the unwinder, the information
> > > emitted by the compiler (like cfa offset 0, before we get to the
prologue),
> > That depends. Is the offset to the stack allocation amount in a frameless
> > indirect entry supposed to be relative to the beginning of the function, or
> > the prologue? If it's the former, we have to fix the compiler, and if it's
> > the latter, we have to fix the unwinder. Based on
> > <mach-o/compact_unwind_encoding.h> (emphasis added):
> >
> >   The compact encoding contains the offset to the nnnnnnnn value *in the
> > function*
> >   in UNWIND_X86_64_FRAMELESS_STACK_SIZE.
> >
> > I think it's the former. (Am I wrong?)
>
> Based on the comment, I think it’s the former as well. However, I don’t get
> why the distinction you are pointing out would make a difference on where to
> fix the problem and what we should fix in the compiler.
I guess we could fix the unwinder to scan the instruction bytes. But if we're
going to do that, then why even encode the offset to the stack allocation at
all? Or did you have something else in mind?

>
> > > or shrink-wrapping, e.g., by doing that only for nounwind function.
> > Honestly, I think this is the easiest way to fix this bug for now (but we'll
> > lose the benefits of shrink-wrapping for such functions if we do that).
>
> Do you happen to have a test case to reproduce the problem?
I had an LLVM IR file; I think I can regenerate that for you.

The original context was in getting llgo up and running on OS X. That assembly
test case was compiled from an LLVM IR test case that I reduced from the output
of llgo. I have the object file With shrink-wrapping on, I get weird crashes
trying to run some programs that were built with llgo. (Unfortunately, you need
a whole bunch of patches to do this right now; I've attached them to a radar
bug I filed against ld64 for a completely different compact unwind problem, but
if you want, I can send them to you, too. Oh, and unfortunately, you also need
a fairly recent gcc installed; that's something I forgot to mention in that
radar. I know Apple doesn't particularly like the GPLv3, so I don't know how
you feel about installing gcc on your system.)

> As far as I can tell, the unwinder did not get any trouble to reconstruct
> the backtrace on my system.
Huh. Since there's no frame pointer in a frameless function, I would've figured
that the bogus stack offset would've completely thrown off the unwinder.

Quuxplusone commented 8 years ago

...
> >
> > It does not look like you can from the current MCStreamer API. Can't you
> > generate a label in some appropriate place, and a corresponding symbolic
> > expression, in order to let the assembler work out the value for the offset?
> That's an interesting idea. We could mark the 'sub $nn, %sp' instruction
> with an MCSymbol while building up the CFI, just like we mark the
> .cfi_startproc location; then when it's time to emit the compact unwind
> data, we could pass an MCExpr for the difference between this symbol and the
> Begin symbol. Is that what you have in mind?

Yes, something along those lines.

> ...

Quuxplusone commented 8 years ago

> > As far as I can tell, the unwinder did not get any trouble to reconstruct
> > the backtrace on my system.
> Huh. Since there's no frame pointer in a frameless function, I would've
> figured that the bogus stack offset would've completely thrown off the
> unwinder.

Actually, I haven’t checked without a frame-pointer. That would explain why I
am not seeing it :).

Quuxplusone commented 8 years ago

I had a quick look at the CFI directives emission and indeed they are wrong.
The offset is expressed from the beginning of the basic block and there are no
way to provide the correct information in the MC layer for now AFAICT.

Thus, I’ll go with a work around that disable shrink-wrapping for function
without nounwind and that are frameless.

Charles, I’ll send a patch for you to try.

Quuxplusone commented 8 years ago

Attached pr25614.patch (1474 bytes, text/plain): Proposed workaround

Quuxplusone commented 8 years ago

Workaround landed:
Committed revision 255175.

Quuxplusone / LLVMBugzillaTest

Offset in CFI directives for frameless indirect compact unwind is wrong when the prologue is not in the entry block #25613