Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

[IPRA] wrong code #35560

Closed Quuxplusone closed 5 years ago

Quuxplusone commented 6 years ago
Bugzilla Link PR36587
Status RESOLVED FIXED
Importance P enhancement
Reported by Jonas Paulsson (paulsson@linux.vnet.ibm.com)
Reported on 2018-03-03 02:03:47 -0800
Last modified on 2018-12-05 01:07:18 -0800
Version trunk
Hardware PC Linux
CC francisvm@yahoo.com, hfinkel@anl.gov, llvm-bugs@lists.llvm.org, paulsson@linux.vnet.ibm.com, quentin.colombet@gmail.com, uweigand@de.ibm.com, vivekvpandya@gmail.com
Fixed by commit(s)
Attachments tc_ipra_reduced.ll (26682 bytes, text/plain)
Blocks
Blocked by
See also
Created attachment 19992
reduced testcase

It seems that enabling interprocedural register allocation with (reduced)
csmith program gives wrong code:

bin/llc -mtriple=s390x-linux-gnu -mcpu=z13  tc_ipra_reduced.ll -o tc.s -enable-
ipra
bin/clang -O3 -march=z13 tc.s -o a.out -w; ./a.out
checksum = 0

bin/llc -mtriple=s390x-linux-gnu -mcpu=z13  tc_ipra_reduced.ll -o tc.s
bin/clang -O3 -march=z13 tc.s -o a.out -w; ./a.out
checksum = 5
(checksum = 5 also with gcc -O0 and more)

At first glance, it seems that while fn2() actually uses %r0 (which is caller
saved), ipra (right) seems to result in %r0 being live accross the call to
fn2():

                                        IPRA

        lgrl    %r0, q       |          llihl   %r0, 0
        lgrl    %r1, o       |          stgrl   %r1, n
        larl    %r13, a      |          stg     %r1, 0(%r2)
        lhi     %r12, 0      <
        stgrl   %r0, n       <
        stg     %r0, 0(%r1)  <
        brasl   %r14, fn2@PL            brasl   %r14, fn2@PL
        larl    %r1, f                  larl    %r1, f
        chsi    0(%r1), 7               chsi    0(%r1), 7
        lgrl    %r1, m                  lgrl    %r1, m
        lochie  %r12, 1      |          lochhie %r0, 1
        st      %r12, 0(%r1) |          stfh    %r0, 0(%r1)

To the left, %r12 is used by first setting it to 0, then after fn2()
conditionally set it to 1, and then store it to @m.

To the right, %r0 is used the exact same way, which is incorrect since fn2()
changes its value.
Quuxplusone commented 6 years ago

Attached tc_ipra_reduced.ll (26682 bytes, text/plain): reduced testcase

Quuxplusone commented 6 years ago
The clobber information looks right on the collector side:
 -------------------- Register Usage Information Collector Pass --------------------
Function Name : fn2
Clobbered Registers: $cc $r0d $r1d $r2d $r3d $r4d $r0h $r1h $r2h $r3h $r4h $r0l
$r1l $r2l $r3l $r4l $r0q $r2q $r4q

On the propagation side, the output is truncated so that’s hard to tell:

Call Instruction Before Register Usage Info Propagation :
CallBRASL @fn2, <regmask $f8d $f9d $f10d $f11d $f12d $f13d $f14d $f15d $f8q
$f9q $f12q $f13q $f8s $f9s $f10s $f11s $f12s $f13s $f14s $f15s $r6d $r7d $r8d
$r9d $r10d $r11d $r12d $r13d $r14d $r15d $r6h $r7h $r8h and 22 more...>,
implicit-def dead $r14d, implicit-def dead $cc, implicit-def $r2d

Call Instruction After Register Usage Info Propagation : CallBRASL @fn2,
<regmask $noreg $a0 $a1 $a2 $a3 $a4 $a5 $a6 $a7 $a8 $a9 $a10 $a11 $a12 $a13
$a14 $a15 $c0 $c1 $c2 $c3 $c4 $c5 $c6 $c7 $c8 $c9 $c10 $c11 $c12 $c13 $c14 $c15
and 142 more...>, implicit-def dead $r14d, implicit-def dead $cc, implicit-def
$r2d
Quuxplusone commented 6 years ago
Printing the full preserved list for fn2 looks correct as well:
Call Instruction After Register Usage Info Propagation : CallBRASL @fn2,
<regmask $noreg $a0 $a1 $a2 $a3 $a4 $a5 $a6 $a7 $a8 $a9 $a10 $a11 $a12 $a13
$a14 $a15 $c0 $c1 $c2 $c3 $c4 $c5 $c6 $c7 $c8 $c9 $c10 $c11 $c12 $c13 $c14 $c15
$v0 $v1 $v2 $v3 $v4 $v5 $v6 $v7 $v8 $v9 $v10 $v11 $v12 $v13 $v14 $v15 $v16 $v17
$v18 $v19 $v20 $v21 $v22 $v23 $v24 $v25 $v26 $v27 $v28 $v29 $v30 $v31 $f0d $f1d
$f2d $f3d $f4d $f5d $f6d $f7d $f8d $f9d $f10d $f11d $f12d $f13d $f14d $f15d
$f16d $f17d $f18d $f19d $f20d $f21d $f22d $f23d $f24d $f25d $f26d $f27d $f28d
$f29d $f30d $f31d $f0q $f1q $f4q $f5q $f8q $f9q $f12q $f13q $f0s $f1s $f2s $f3s
$f4s $f5s $f6s $f7s $f8s $f9s $f10s $f11s $f12s $f13s $f14s $f15s $f16s $f17s
$f18s $f19s $f20s $f21s $f22s $f23s $f24s $f25s $f26s $f27s $f28s $f29s $f30s
$f31s $r5d $r6d $r7d $r8d $r9d $r10d $r11d $r12d $r13d $r14d $r15d $r5h $r6h
$r7h $r8h $r9h $r10h $r11h $r12h $r13h $r14h $r15h $r5l $r6l $r7l $r8l $r9l
$r10l $r11l $r12l $r13l $r14l $r15l $r6q $r8q $r10q $r12q $r14q>, implicit-def
dead $r14d, implicit-def dead $cc, implicit-def $r2d

Indeed, r0 is not in the list of the preserved registers as far as I can tell.
Quuxplusone commented 6 years ago
Scratch that!
The output I gave was actually with a fix I was trying in the collector code.
Basically, it looks like one of the assumption made in that code is not true
for SystemZ:
In lib/CodeGen/RegUsageInfoCollector.cpp one can read:
  // Scan all the physical registers. When a register is defined in the current
  // function set it and all the aliasing registers as defined in the regmask.
  for (unsigned PReg = 1, PRegE = TRI->getNumRegs(); PReg < PRegE; ++PReg) {
    // If a register is in the UsedPhysRegsMask set then mark it as defined.
    // All it's aliases will also be in the set, so we can skip setting
    // as defined all the aliases here.
    if (UsedPhysRegsMask.test(PReg)) {
      SetRegAsDefined(PReg);
      continue;
    }
[...]

The continue in the if block should be removed if the getUsedPhysRegsMask
doesn't cover all the aliases of PReg and it looks like this is what is
happening here.
Quuxplusone commented 6 years ago
Looking at the comment on MachineRegisterInfo, it seems the mask must have the
aliases set:
  /// UsedPhysRegMask - Additional used physregs including aliases.
  /// This bit vector represents all the registers clobbered by function calls.

Jonas, looks like something is not updating the UsedPhysRegMask correctly.
I.e., r0d is marked as clobbered, so as r0h but not r0l.
Could you double check on your side?

Side note: would be nice to catch that in the MachineVerifier so that it is
obvious when things go wrong!
Quuxplusone commented 6 years ago
(In reply to Quentin Colombet from comment #4)
> Looking at the comment on MachineRegisterInfo, it seems the mask must have
> the aliases set:
>   /// UsedPhysRegMask - Additional used physregs including aliases.
>   /// This bit vector represents all the registers clobbered by function
> calls.
>
> Jonas, looks like something is not updating the UsedPhysRegMask correctly.
> I.e., r0d is marked as clobbered, so as r0h but not r0l.
> Could you double check on your side?
>

I believe the error is in RegUsageInfoCollector.cpp. With this test case:

@fn1 (no calls)   : defines R0L.
                  : SetRegAsDefined(*AI):  R0L, R0D, R0Q
@fn2 (calls @fn1) : defines R0L and R0D
                  : SetRegAsDefined(PReg): R0D.  The R0H alias is skipped!
main (calls @fn2) : has a live-range across call to @fn2 that gets assigned R0H

It is not true that UsedPhysRegMask will contain all aliases of all registers
of the defined register, and in this case R0L adds R0D, but *not R0H* (correct
for @fn1). While processing @fn2, R0D is already in the list, but R0H is not.
Therefore R0H must be added, which is currently assumed it must not (wrong).

Proposed patch: https://reviews.llvm.org/D45157

> Side note: would be nice to catch that in the MachineVerifier so that it is
> obvious when things go wrong!

I certainly agree :-)
Proposed patch: https://reviews.llvm.org/D45156
Quuxplusone commented 5 years ago

@Jonas, is this fixed by r331509?

Quuxplusone commented 5 years ago
(In reply to Francis Visoiu Mistrih from comment #6)
> @Jonas, is this fixed by r331509?

Hi Francis,

as I remember it, r331509 should have fixed one of the issues I found, while
not fixing everything related to IPRA...

I reran the test case and at least it is giving the right checksum, so I think
we can close this one now.

Thanks,

Jonas