Open Quuxplusone opened 7 years ago
Attached bug.bc
(1828 bytes, application/octet-stream): reproducer reduced from larger program
FYI: r287059 - Always use relative jump table encodings on PowerPC64.
Well, is it looping in the prologue or really in the epilogue? I don't understand how it could do the former, but there might be a clobber issue for the latter.
It's right at the top of the function. It is after the basic block #0 comment.
So, replace "in the prologue" with "immediately after the prologue".
.Lfunc_toc0: # @z0042
.quad .TOC.-.Lfunc_gep0
z0042:
.Lfunc_begin0:
.Lfunc_gep0:
ld 2, .Lfunc_toc0-.Lfunc_gep0(12)
add 2, 2, 12
.Lfunc_lep0:
.localentry z0042, .Lfunc_lep0-.Lfunc_gep0
# BB#0: # %L.entry
addis 3, 2, .LC0@toc@ha
bl .L0$pb
.L0$pb:
By the way, the test provided is executable. When assembled and linked, the a.out file will run forever.
% llc --version
LLVM (http://llvm.org/):
LLVM version 4.0.1
DEBUG build with assertions.
Default target: powerpc64le-unknown-linux-gnu
Host CPU: pwr8
Registered Targets:
ppc32 - PowerPC 32
ppc64 - PowerPC 64
ppc64le - PowerPC 64 LE
Does this still reproduce for you on ToT? I can't seem to reproduce it. In fact, there is no jump table in the produced assembly for the switch.
Furthermore, I'm not sure I really understand what the claimed problem is/was. Was it that the contents of the link register are incorrect so the bltlr
branches to the wrong place? Or was it that the branch wasn't taken when it should have been and then we loop forever in the loop within the test case? Or something entirely different?
Second question:
The code sequence
bl .L0$pb
.L0$pb:
ld 4, .LC0@toc@l(3)
loads the link register with the address of the ld instruction.
The code sequence
bltlr
branches to the link register if the condition is less than. The link register
that is 4 instructions previous. This forms a conditional loop.
If we look at the body of the loop, there are no instructions that can change
the condition, so if it is less than, it will always be less than in future
iterations.
First question:
I am not sure if this bug was fixed after release 4.0. It is possible.
A similar code gen issue happens in release_60. In the test application, we
have a call sequence as follows.
0x0000000010000ccc <+436>: bl 0x10000dc0
=> 0x0000000010000cd0 <+440>: nop
The function called (0x10000dc0) is:
0x0000000010000db8 <+0>: ld r2,-8(r12)
0x0000000010000dbc <+4>: add r2,r2,r12
=> 0x0000000010000dc0 <+8>: bl 0x10000dc4
0x0000000010000dc4 <+12>: mflr r6
So, we have a bl instruction jumping to another bl instruction, which will
immediately overwrite the link register from the first call. After that, the
application fails.
As before, reverting 883332301 fixes the bug.
(In reply to Eric Schweitz from comment #8)
A similar code gen issue happens in release_60. In the test application, we have a call sequence as follows.
0x0000000010000ccc <+436>: bl 0x10000dc0 => 0x0000000010000cd0 <+440>: nop
The function called (0x10000dc0) is:
0x0000000010000db8 <+0>: ld r2,-8(r12) 0x0000000010000dbc <+4>: add r2,r2,r12 => 0x0000000010000dc0 <+8>: bl 0x10000dc4 0x0000000010000dc4 <+12>: mflr r6
So, we have a bl instruction jumping to another bl instruction, which will immediately overwrite the link register from the first call. After that, the application fails.
Oh wow. There's something really wrong if the first instruction in the local entry point is an unconditional branch and link instruction. The loop makes sense to me now - we set the link register to the beginning of the prologue (0x0000000010000dc4), save it and presumably restore it so the blr
brings control back to 0x0000000010000dc4.
I'll try to reproduce this.
As before, reverting 883332301 fixes the bug. What does this number mean? Is this the same revision (relative jump table encodings)?
Unfortunately, I'm still unable to reproduce this. We should try to look into what the differences are between your environment and mine.
Just updated to r325786 on release_60 this morning, rebuilt, and get the same
bug. The build is on a Power8 machine wiht gcc 5.4.0 and the test is run on a
Power8 machine. There are no mods to the LLVM sources at all in this build.
% git remote -v
origin http://llvm.org/git/llvm.git (fetch)
origin http://llvm.org/git/llvm.git (push)
% git checkout release_60
% git pull origin release_60
% cmake ReleaseSixSourceDir -DLLVM_TARGETS_TO_BUILD=PowerPC -
DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_TERMINFO=OFF -DCMAKE_EXE_LINKER_FLAGS="-
static-libgcc -static-libstdc++"
% make
% make install
Note again that this bug requires the large code model. If I remove that, the
test application will pass/run to correct completion.
Here are the updated LLVM tool commands.
% opt -O2 ./test.ll -S -o ./test.llvm
% llc ./test.llvm -mcpu=pwr8 -O2 -code-model=large -o ./test.s
% /usr/bin/as ./test.s -mpower8 -many -o test.o
PS: The number in the previous post is the git version hash that corresponds to
r287059 (see, e.g. Hal's comment, above).
PPS: Community editions of the PGI compilers are available for download.
It does look like I need a new reproducer, since the characteristics of the original have been tickled somehow over time.
Attached whittled.bc
(1888 bytes, application/octet-stream): reproducer for release_60
Ok, this time I can reproduce the failure. The problem is that shrink-wrapping doesn't see MovePCtoLR8
as a def of a CSR. It then freely inserts the prologue in a successor to entry. As a result, we end up clobbering the LR before we've saved it in the prologue, which is obviously bad.
I'm testing a fix and will update this PR tomorrow with some results.
Fix in https://reviews.llvm.org/D43677.
bug.bc
(1828 bytes, application/octet-stream)whittled.bc
(1888 bytes, application/octet-stream)