Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

clang corrupts rsp when used as output constraint (Linux kernel >= 4.6 inline asm) #32885

Open Quuxplusone opened 7 years ago

Quuxplusone commented 7 years ago
Bugzilla Link PR33913
Status NEW
Importance P enhancement
Reported by Manoj Gupta (manojgupta@google.com)
Reported on 2017-07-24 11:59:47 -0700
Last modified on 2017-09-04 03:23:47 -0700
Version trunk
Hardware PC Linux
CC coby.tayree@intel.com, dvyukov@google.com, efriedma@quicinc.com, glider@google.com, llvm-bugs@lists.llvm.org, rengolin@gmail.com, rnk@google.com, zvirack@gmail.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
Reported by mka@chromium.org.
chromium bug: https://bugs.chromium.org/p/chromium/issues/detail?id=737659

The following upstream kernel commit intends to forces a stack frame before
inline assembly code if it doesn't already exist:

commit f05058c4d652b619adfda6c78d8f5b341169c264
Author: Chris J Arges <chris.j.arges@canonical.com>
Date:   Thu Jan 21 16:49:25 2016 -0600

    x86/uaccess: Add stack frame output operand in get_user() inline asm

    Numerous 'call without frame pointer save/setup' warnings are introduced
    by stacktool because of functions using the get_user() macro. Bad stack
    traces could occur due to lack of or misplacement of stack frame setup
    code.

    This patch forces a stack frame to be created before the inline asm code
    if CONFIG_FRAME_POINTER is enabled by listing the stack pointer as an
    output operand for the get_user() inline assembly statement.

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index a4a30e4b2d34..9bbb3b2d0372 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -179,10 +179,11 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL),
0ULL, 0UL))
 ({                                                                     \
        int __ret_gu;                                                   \
        register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX);            \
+       register void *__sp asm(_ASM_SP);                               \
        __chk_user_ptr(ptr);                                            \
        might_fault();                                                  \
-       asm volatile("call __get_user_%P3"                              \
-                    : "=a" (__ret_gu), "=r" (__val_gu)                 \
+       asm volatile("call __get_user_%P4"                              \
+                    : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp)    \
                     : "0" (ptr), "i" (sizeof(*(ptr))));                \

This inline asm causes double fault when compiled with clang.

Analysis by Josh Poimboeuf:

  Here's the reason for the double fault.  First it puts zero on the stack
  at offset -0x58:

  > ffffffff81367616:     31 c0                   xor    %eax,%eax
  > ffffffff81367618:     48 89 45 c8             mov    %rax,-0x38(%rbp)
  > ffffffff8136761c:     45 31 ff                xor    %r15d,%r15d
  > ffffffff8136761f:     48 89 45 a8             mov    %rax,-0x58(%rbp)

  Then, later, it copies that zeroed word from the stack to RSP:

  > ffffffff81367874:     48 8b 65 a8             mov    -0x58(%rbp),%rsp

  Then it double faults because the call instruction tries to write RIP on
  the stack, but RSP is zero:

  > ffffffff81367878:     e8 73 26 f1 ff          callq  ffffffff81279ef0 <__get_user_4>

  Then clang tries to put RSP's value on the stack, at the same stack slot
  where the original zero was stored (though it never reaches this point):

  > ffffffff8136787d:     49 89 d4                mov    %rdx,%r12
  > ffffffff81367880:     48 89 65 a8             mov    %rsp,-0x58(%rbp)

  The panic is consistent with the above.  RIP points to the call
  instruction, RSP is zero:

  > [    3.798722] PANIC: double fault, error_code: 0x0
  > [    3.807387] CPU: 1 PID: 605 Comm: frecon Not tainted 4.12.0-00023-g711d82c128ff #107
  > [    3.816040] Hardware name: GOOGLE Squawks, BIOS Google_Squawks.5216.152.76 03/04/2016
  > [    3.824792] task: ffff880075b92f00 task.stack: ffffc90000d6c000
  > [    3.829599] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,data=ordered
  > [    3.839092] RIP: 0010:drm_mode_setcrtc+0x328/0x51f
  > [    3.844443] RSP: 0018:0000000000000000 EFLAGS: 00010206
  > [    3.850280] RAX: 0000559e707c4d60 RBX: 0000000000000000 RCX: 0000000000000008
  > [    3.858253] RDX: 0000000000000001 RSI: ffffc90000d6fcc8 RDI: ffffffff81367805
  > [    3.866225] RBP: ffffc90000d6fd90 R08: 00000000014000c0 R09: 0000000000000308
  > [    3.874199] R10: 0000000000000300 R11: 0000000000000556 R12: 0000000000000000
  > [    3.882163] R13: ffff880077a25000 R14: ffffc90000d6fdd0 R15: 0000000000000000
  > [    3.890136] FS:  00007fb2dbd62740(0000) GS:ffff88007ad00000(0000) knlGS:0000000000000000
  > [    3.899177] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  > [    3.905595] CR2: fffffffffffffff8 CR3: 000000007596b000 CR4: 00000000001006e0
  > [    3.913568] Call Trace:
  > [    3.916296] Code: b8 48 8d b5 38 ff ff ff 41 8b 56 08 39 d3 0f 83 85 00 00 00 4c 63 fb 4a 83 24 f8 00 4a 8d 04 bd 00 00 00 00 49 03 06 48 8b 65 a8 <e8> 73 26 f1 ff 49 89 d4 48 89 65 a8 85 c0 0f 85 a0 00 00 00 ba
  > [    3.937440] Kernel panic - not syncing: Machine halted.
  > [    3.943268] CPU: 1 PID: 605 Comm: frecon Not tainted 4.12.0-00023-g711d82c128ff #107
  > [    3.951921] Hardware name: GOOGLE Squawks, BIOS Google_Squawks.5216.152.76 03/04/2016
  > [    3.960671] Call Trace:
  > [    3.963398]  <#DF>
  > [    3.965637]  __dump_stack+0x19/0x1b
  > [    3.969531]  dump_stack+0x42/0x60

  clang is obviously getting confused by the RSP output constraint.  I
  think it tries to take the constraint literally, since it takes RSP as
  an output from the inline asm and stores it on the stack.  However, that
  behavior doesn't really make sense for a "register" variable.  It also
  doesn't explain why it's zeroing the register out first.

Link with the discussion: https://patchwork.kernel.org/patch/9837437/

More info:
 there are two separate issues here.

  1) The first issue is whether it's supported behavior to specify RSP as
     an output constraint in order to force GCC to create a stack frame.
     As far as I know, this is a quirk of GCC, and not really considered
     defined behavior.

     However, the idea was suggested by some GCC developers:

       https://gcc.gnu.org/ml/gcc/2015-07/msg00079.html

     So at least it seems to be endorsed by GCC to some degree.  If you
     need details on why it works, that thread has the details.

  2) The second issue is whether clang should corrupt RSP.  I don't see a
     reason for clang to do that.  IMO, when using a local register
     variable as an input or output to inline asm, the compiler should
     leave the contents of the register alone.

     FWIW, my reading of the GCC manual seems to support that:

       https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables
Quuxplusone commented 7 years ago
Consider the following:

void a() {
  register int a asm("sp") = 0;
  asm volatile("nop":"+r"(a));
}

In this case, both gcc and clang zero out "sp".

If you don't initialize the variable, you're basically asking the compiler to
put uninitialized data into rsp.  If you're lucky, the compiler realizes that
putting uninitialized data into rsp is a no-op, and therefore does nothing...
but if you're unlucky, the compiler shoves some other unrelated value into rsp,
and it explodes (which is what is happening here).

I think the right approach here is to propose some well-defined mechanism for
getting the result you want... and then maybe add a hack to clang to map this
particular construct to the same mechanism.
Quuxplusone commented 7 years ago

According to what

Quuxplusone commented 7 years ago
(Sorry, accidentally sent a truncated message.)

According to what Renato wrote here:
https://lists.linuxfoundation.org/pipermail/llvmlinux/2014-May/000946.html, GCC
doesn't seem to always handle local register variables correctly either (I've
just checked this is also true for x86_64), e.g. it may drop a store to such a
variable.

The easiest way to fix the crashes is to move __sp to the global scope:

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index a969ae6..6adc0a7 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -174,11 +174,13 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL),
0ULL, 0UL))
  * Clang/LLVM cares about the size of the register, but still wants
  * the base register for something that ends up being a pair.
  */
+
+register unsigned long int __sp asm(_ASM_SP);
+
 #define get_user(x, ptr)                       \
 ({                                 \
    int __ret_gu;                           \
    register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX);        \
-   register void *__sp asm(_ASM_SP);               \
    __chk_user_ptr(ptr);                        \
    might_fault();                          \
    asm volatile("call __get_user_%P4"              \
Quuxplusone commented 7 years ago

Actually, the linked thread (https://lkml.org/lkml/2017/7/12/555) contains a deeper analysis by Josh Poimboeuf who also notes that simply making __sp a global variable leads to a kernel .text size regression under GCC.

Quuxplusone commented 7 years ago

My reading from that thread is that both clang and gcc treat the __sp variable different and each has its own benefits/problems. Since this is undefined and largely undocumented behaviour, I find it hard to believe either side will be convinced to change.

However, there is one hint in that thread that may bring the final solution. Just add SP directly to the clobber list. It should work on both compilers and have the intended effect without additional movs.

Quuxplusone commented 7 years ago
> My reading from that thread is that both clang and gcc treat the __sp variable
> different and each has its own benefits/problems. Since this is undefined and
> largely undocumented behaviour, I find it hard to believe either side will be
> convinced to change.
Agreed.

> However, there is one hint in that thread that may bring the final solution.
> Just add SP directly to the clobber list. It should work on both compilers and
> have the intended effect without additional movs.

Quoting https://lkml.org/lkml/2017/7/19/1144:

"""
> > IIRC, clobbering SP does at least force the stack frame on GCC, though I
> > need to double check that.  I can try to work up an official patch in
> > the next week or so (need to do some testing first).
>
> Sounds great.
>
> Thanks again for looking into this and coming up with a solution!

After doing some testing, I don't think this approach is going to work
after all.  In addition to forcing the stack frame, it also causes GCC
to add an unnecessary extra instruction to the epilogue of each affected
function:

  lea    -0x10(%rbp),%rsp
"""
, so a patch that clobbers SP is unlikely to be accepted upstream (although it
makes Clang build work :))

Josh is currently working on a more intrusive kernel patch that's likely to
solve the problem:
https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=ASM_CALL
Quuxplusone commented 7 years ago
(In reply to Alexander Potapenko from comment #6)
> After doing some testing, I don't think this approach is going to work
> after all.  In addition to forcing the stack frame, it also causes GCC
> to add an unnecessary extra instruction to the epilogue of each affected
> function:

Right, that's not good either. :(

> Josh is currently working on a more intrusive kernel patch that's likely to
> solve the problem:
> https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/
> ?h=ASM_CALL

Looks hairy, but mostly mechanical.
Quuxplusone commented 7 years ago
Josh uploaded a new patch for this (https://lkml.org/lkml/2017/8/31/513). But
there are some questions raised in particular by Linus Torvalds
(https://lkml.org/lkml/2017/8/31/627):

On Thu, Aug 31, 2017 at 09:11:54AM -0700, Linus Torvalds wrote:

On the whole, I'm not entirely sure this is the right approach. I
think we should

 (a) approach clang about their obvious bug (a compiler that clobbers
%rsp because we mark it as in/out is clearly buggy)

 (b) ask gcc people if there's some other alternative that would work
with clang as-is rather than the "mark %rsp register as clobbered"

I couldn't actually find the %rsp trick in any docs, I assume it came
from discussions with gcc developers directly. Maybe there is
something else we could do that doesn't upset clang?

Perhaps we can mark the frame pointer as an input, for example? Inputs
also have the advantage that appending to the input list doesn't
change the argument numbering, so we don't need to worry about
numbered arguments (not that I mind the naming of arguments, but I
kind of hate having to do it as part of this series).

Hmm?

              Linus
Quuxplusone commented 7 years ago

Do we have a short repro for this problem that doesn't require building the whole kernel?