Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

PowerPC - miscompile yields infinite looping when using -code-model=large #32519

Open Quuxplusone opened 7 years ago

Quuxplusone commented 7 years ago
Bugzilla Link PR33547
Status CONFIRMED
Importance P enhancement
Reported by Eric Schweitz (eschweitz@nvidia.com)
Reported on 2017-06-21 09:52:51 -0700
Last modified on 2018-02-23 07:22:02 -0800
Version 4.0
Hardware PC Windows NT
CC hfinkel@anl.gov, joerg@NetBSD.org, kit.barton@gmail.com, llvm-bugs@lists.llvm.org, nemanja.i.ibm@gmail.com
Fixed by commit(s)
Attachments bug.bc (1828 bytes, application/octet-stream)
whittled.bc (1888 bytes, application/octet-stream)
Blocks
Blocked by
See also
Created attachment 18684
reproducer reduced from larger program

This bug appears in the release_40 branch on PowerPC.

To reproduce:

% opt -O2 bug.bc -o bug.opt.bc
% llc bug.opt.bc -mcpu=native -code-model=large -o bug.s

In the prologue for z0042(), there should be the following sequence

# BB#0:                                 # %L.entry
        addis 3, 2, .LC0@toc@ha
        bl .L0$pb
.L0$pb:
        ld 4, .LC0@toc@l(3)
        mflr 3
        lwz 4, 0(4)
        cmpwi    4, 1
        bltlr 0

which is a call from the prologue to the next instruction. This call sends the
program into an infinite loop.

The reproducer has reduced from a much larger code to isolate this specific
code gen pattern.

The SVN commit that results in this failure was 287059. Reverting that
changeset eliminates the infinite loop and corrects the test application.
Quuxplusone commented 7 years ago

Attached bug.bc (1828 bytes, application/octet-stream): reproducer reduced from larger program

Quuxplusone commented 7 years ago

FYI: r287059 - Always use relative jump table encodings on PowerPC64.

Quuxplusone commented 7 years ago

Well, is it looping in the prologue or really in the epilogue? I don't understand how it could do the former, but there might be a clobber issue for the latter.

Quuxplusone commented 7 years ago
It's right at the top of the function. It is after the basic block #0 comment.
So, replace "in the prologue" with "immediately after the prologue".

.Lfunc_toc0:                            # @z0042
        .quad   .TOC.-.Lfunc_gep0
z0042:
.Lfunc_begin0:
.Lfunc_gep0:
        ld 2, .Lfunc_toc0-.Lfunc_gep0(12)
        add 2, 2, 12
.Lfunc_lep0:
        .localentry     z0042, .Lfunc_lep0-.Lfunc_gep0
# BB#0:                                 # %L.entry
        addis 3, 2, .LC0@toc@ha
        bl .L0$pb
.L0$pb:
Quuxplusone commented 7 years ago

By the way, the test provided is executable. When assembled and linked, the a.out file will run forever.

Quuxplusone commented 7 years ago
% llc --version
LLVM (http://llvm.org/):
  LLVM version 4.0.1
  DEBUG build with assertions.
  Default target: powerpc64le-unknown-linux-gnu
  Host CPU: pwr8

  Registered Targets:
    ppc32   - PowerPC 32
    ppc64   - PowerPC 64
    ppc64le - PowerPC 64 LE
Quuxplusone commented 6 years ago

Does this still reproduce for you on ToT? I can't seem to reproduce it. In fact, there is no jump table in the produced assembly for the switch.

Furthermore, I'm not sure I really understand what the claimed problem is/was. Was it that the contents of the link register are incorrect so the bltlr branches to the wrong place? Or was it that the branch wasn't taken when it should have been and then we loop forever in the loop within the test case? Or something entirely different?

Quuxplusone commented 6 years ago
Second question:

The code sequence

        bl .L0$pb
.L0$pb:
        ld 4, .LC0@toc@l(3)

loads the link register with the address of the ld instruction.

The code sequence

        bltlr

branches to the link register if the condition is less than. The link register
that is 4 instructions previous. This forms a conditional loop.

If we look at the body of the loop, there are no instructions that can change
the condition, so if it is less than, it will always be less than in future
iterations.

First question:

I am not sure if this bug was fixed after release 4.0. It is possible.
Quuxplusone commented 6 years ago
A similar code gen issue happens in release_60.  In the test application, we
have a call sequence as follows.

   0x0000000010000ccc <+436>:   bl      0x10000dc0
=> 0x0000000010000cd0 <+440>:   nop

The function called (0x10000dc0) is:

   0x0000000010000db8 <+0>:     ld      r2,-8(r12)
   0x0000000010000dbc <+4>:     add     r2,r2,r12
=> 0x0000000010000dc0 <+8>:     bl      0x10000dc4
   0x0000000010000dc4 <+12>:    mflr    r6

So, we have a bl instruction jumping to another bl instruction, which will
immediately overwrite the link register from the first call. After that, the
application fails.

As before, reverting 883332301 fixes the bug.
Quuxplusone commented 6 years ago

(In reply to Eric Schweitz from comment #8)

A similar code gen issue happens in release_60. In the test application, we have a call sequence as follows.

0x0000000010000ccc <+436>: bl 0x10000dc0 => 0x0000000010000cd0 <+440>: nop

The function called (0x10000dc0) is:

0x0000000010000db8 <+0>: ld r2,-8(r12) 0x0000000010000dbc <+4>: add r2,r2,r12 => 0x0000000010000dc0 <+8>: bl 0x10000dc4 0x0000000010000dc4 <+12>: mflr r6

So, we have a bl instruction jumping to another bl instruction, which will immediately overwrite the link register from the first call. After that, the application fails.

Oh wow. There's something really wrong if the first instruction in the local entry point is an unconditional branch and link instruction. The loop makes sense to me now - we set the link register to the beginning of the prologue (0x0000000010000dc4), save it and presumably restore it so the blr brings control back to 0x0000000010000dc4.

I'll try to reproduce this.

As before, reverting 883332301 fixes the bug. What does this number mean? Is this the same revision (relative jump table encodings)?

Quuxplusone commented 6 years ago

Unfortunately, I'm still unable to reproduce this. We should try to look into what the differences are between your environment and mine.

Quuxplusone commented 6 years ago
Just updated to r325786 on release_60 this morning, rebuilt, and get the same
bug.  The build is on a Power8 machine wiht gcc 5.4.0 and the test is run on a
Power8 machine.  There are no mods to the LLVM sources at all in this build.

% git remote -v
origin  http://llvm.org/git/llvm.git (fetch)
origin  http://llvm.org/git/llvm.git (push)
% git checkout release_60
% git pull origin release_60

% cmake ReleaseSixSourceDir -DLLVM_TARGETS_TO_BUILD=PowerPC -
DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_TERMINFO=OFF -DCMAKE_EXE_LINKER_FLAGS="-
static-libgcc -static-libstdc++"
% make
% make install

Note again that this bug requires the large code model. If I remove that, the
test application will pass/run to correct completion.

Here are the updated LLVM tool commands.

% opt -O2 ./test.ll -S -o ./test.llvm
% llc ./test.llvm -mcpu=pwr8 -O2 -code-model=large -o ./test.s
% /usr/bin/as ./test.s -mpower8 -many -o test.o

PS: The number in the previous post is the git version hash that corresponds to
r287059 (see, e.g. Hal's comment, above).

PPS: Community editions of the PGI compilers are available for download.
Quuxplusone commented 6 years ago

It does look like I need a new reproducer, since the characteristics of the original have been tickled somehow over time.

Quuxplusone commented 6 years ago

Attached whittled.bc (1888 bytes, application/octet-stream): reproducer for release_60

Quuxplusone commented 6 years ago

Ok, this time I can reproduce the failure. The problem is that shrink-wrapping doesn't see MovePCtoLR8 as a def of a CSR. It then freely inserts the prologue in a successor to entry. As a result, we end up clobbering the LR before we've saved it in the prologue, which is obviously bad.

I'm testing a fix and will update this PR tomorrow with some results.

Quuxplusone commented 6 years ago

Fix in https://reviews.llvm.org/D43677.