Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

PPC32 fails to extend i1 in stack arguments #37634

Closed Quuxplusone closed 6 years ago

Quuxplusone commented 6 years ago
Bugzilla Link PR38661
Status RESOLVED FIXED
Importance P normal
Reported by Josh Stone (jistone@redhat.com)
Reported on 2018-08-21 10:05:09 -0700
Last modified on 2018-09-14 16:03:24 -0700
Version trunk
Hardware Other Linux
CC dje.gcc@gmail.com, glaubitz@physik.fu-berlin.de, hfinkel@anl.gov, lion@aosc.io, llvm-bugs@lists.llvm.org, nemanja.i.ibm@gmail.com, simonas+llvm.org@kazlauskas.me, tstellar@redhat.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also https://github.com/rust-lang/rust/issues/50960
GitHub user @LionNatsu managed to reduce rust#50960 to a simple LLVM-IR test:

https://github.com/rust-lang/rust/issues/50960#issuecomment-414482775

    target datalayout = "E-m:e-p:32:32-i64:64-n32"
    target triple = "powerpc-unknown-linux-gnu"

    declare zeroext i1 @a_strange_function(i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext)

    define i32 @main(i32, i8**) {
    top:
      %2 = call zeroext i1 @a_strange_function(i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true)
      ret i32 0
    }

The call is generated like this:

    li 3, 1
    li 4, 1
    li 5, 1
    li 6, 1
    stb 3, 12(1)
    stb 3, 8(1)
    li 3, 1
    li 7, 1
    li 8, 1
    li 9, 1
    li 10, 1
    bl a_strange_function@PLT

Those "stb" should be "stw" to properly fill the "i1 zeroext" argument to i32,
as ABI requires.  Otherwise we've only written the most-significant byte of the
full argument that the callee will read.

As it happens, Rust's callee was compiled with GCC, but we can also see a
problem in the function definition if compiled with LLVM.  Something like:

    define zeroext i1 @a_strange_function(i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext) {
      %result = and i1 %8, %9
      ret i1 %result
    }

This produces:

    lbz 3, 12(1)
    lbz 4, 8(1)
    and 3, 4, 3
    clrlwi  3, 3, 31
    blr

Those "lbz" are bug-compatible with what the caller did above, but still wrong
for the ABI.  It should either read the whole word, or just read the least-
significant bytes at 15(1) and 11(1).

These problems do not arise with "llc -O0" or "-O1", which also means these two
won't be bug-compatible if they're compiled with different optimization levels.
Of course they should both stick to ABI.
Quuxplusone commented 6 years ago
I bisected this back to r202451, which makes it a regression from LLVM 3.4 to
3.5.  At that commit, the testcase throws a hard error:

    Impossible reg-to-reg copy
    UNREACHABLE executed at ../lib/Target/PowerPC/PPCInstrInfo.cpp:672!

This much was fixed soon after by r203041, and that gave me a clue.  Moving
that fix a few lines earlier also corrects the issue here in call stack args:

diff --git a/lib/Target/PowerPC/PPCISelLowering.cpp
b/lib/Target/PowerPC/PPCISelLowering.cpp
index 037c4b5de9d1..a903f431ccee 100644
--- a/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -5463,10 +5463,11 @@ SDValue PPCTargetLowering::LowerCall_32SVR4(
       Arg = PtrOff;
     }

-    if (VA.isRegLoc()) {
-      if (Arg.getValueType() == MVT::i1)
-        Arg = DAG.getNode(ISD::ZERO_EXTEND, dl, MVT::i32, Arg);
+    // Promote booleans to 32-bit values.
+    if (Arg.getValueType() == MVT::i1)
+      Arg = DAG.getNode(ISD::ZERO_EXTEND, dl, MVT::i32, Arg);

+    if (VA.isRegLoc()) {
       seenFloatArg |= VA.getLocVT().isFloatingPoint();
       // Put argument in a physical register.
       RegsToPass.push_back(std::make_pair(VA.getLocReg(), Arg));

I haven't looked deeper at the problem in the function definition yet, but I
believe that will be in PPCTargetLowering::LowerFormalArguments_32SVR4.
Quuxplusone commented 6 years ago
I am here. (waving)

For convenience, I will attach the C++ PoC here:

Library:

    #include <iostream>
    extern "C" void strange_function(
        bool r3, bool r4, bool r5, bool r6, bool r7, bool r8, bool r9, bool r10,
        bool s1, bool s2
    ) {
        std::cout << r3 << r4 << r5 << r6 << r7 << r8 << r9 << r10 << s1 << s2 << std::endl;
    };

Main program:

    extern "C" void strange_function(
        bool r3, bool r4, bool r5, bool r6, bool r7, bool r8, bool r9, bool r10,
        bool s1, bool s2
    );
    int main() {
        strange_function(true, true, true, true, true, true, true, true, true, true);
    }

The expected result is '1111111111', but it printed '1111111100' on PPC, if
these two parts are compiled with -O0/-O1 vs. -O2/-O3.

The expected calling convention: Passing first 8 integer arguments by register
from r3 to r10, the stack part:

 |  XX XX XX XX <- r1 (a pointer to the previous frame, big-endian)
 |  00 00 00 01       (0x01 extended to 0x00000001 and stored in big-endian)
 V  00 00 00 01       (0x01 extended to 0x00000001 and stored in big-endian)
higher

But now the behaviour of LLVM is:

 |  XX XX XX XX <- r1 (a pointer to the previous frame, big-endian)
 |  01 ?? ?? ??       (stored 0x01 to r1+0x8)
 V  01 ?? ?? ??       (stored 0x01 to r1+0xC)
higher

It doesn't even clean the stack object and ignores 'zeroext' convention.

If a callee is using correct calling convention, it will read '0x??' and extend
to 32 bits, thus the program logic turns out to be an unknown behaviour.

Say we have a library for some Internet services, compiled with GCC.
start_server(Context *ctx, Address *addr, bool use_tls);

A frontend compiled with LLVM and -02(which is very common in release build)
called: start_server(.., .., true);
On callee side, use_tls can be true or false, depends on the background garbage
in the stack.

I checked i8 and i16, they don't have such issue.

> These problems do not arise with "llc -O0" or "-O1", which also means these
> two won't be bug-compatible if they're compiled with different optimization
> levels.

Yes and all conditions are as follow:
1. there are more than 8 integer arguments
2. using bool(i8) type in one of the 9th, 10th... arguments
3. optimisation level is >= 2
Quuxplusone commented 6 years ago
Minimum test:
#=======================================================================
target triple = "powerpc-unknown-linux-gnu"
define zeroext i1 @check_callee(
    i1 zeroext %r3, i1 zeroext %r4, i1 zeroext %r5, i1 zeroext %r6,
    i1 zeroext %r7, i1 zeroext %r8, i1 zeroext %r9, i1 zeroext %r10,
    i1 zeroext %s1, i1 zeroext %s2
) {
  call void @check_caller(
    i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true,
    i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true,
    i1 zeroext %s1, i1 zeroext %s2)
  ret i1 true
}
declare void @check_caller(
    i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext,
    i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext,
    i1 zeroext, i1 zeroext
)
#=======================================================================
$ llc test.ll -o /dev/stdout -ppc-asm-full-reg-names -mattr=-crbits
  => Good
$ llc test.ll -o /dev/stdout -ppc-asm-full-reg-names -mattr=+crbits
  => Bad
Quuxplusone commented 6 years ago

After checking the whole PPCISelLowering.cpp I wrote this patch and submitted a revision at here: https://reviews.llvm.org/D51108.

Quuxplusone commented 6 years ago

Fix committed on trunk: https://reviews.llvm.org/rL342288