davidgiven / ack

The Amsterdam Compiler Kit
http://tack.sf.net
Other
420 stars 59 forks source link

Optimiser crash for -O3 #203

Closed davidgiven closed 4 years ago

davidgiven commented 5 years ago

MRTC:

void output(int* J)
{
    int i;
    for (; i; i=1)
    {
        while (J[i] == J[i+1])
            i++;
    }
}

Run with ack -mlinuxppc -O3 -c diff.c. This happens on all architectures I've found.

It looks like ego is producing bogus code, causing opt2 to crash:

...
 sli 4
 stl -20
 bra *3
 end 0
 bra *3
 ret 0

Having em instructions appear after an end can cause Very Bad Things™ to happen. Sometimes I've seen ego/ca crash (this converts from ego's intermediate form into compact bytecode). The code generated by opt looks absolutely fine.

davidgiven commented 5 years ago

For reference, opt's output is:

 mes 2,4,4
 exp $output
 pro $output,4
 mes 3,0,4,2,2
 mes 3,-4,4,0,6
 mes 3
 mes 9,4
6
 lol -4
 zeq *1
8
 lol 0
 lol -4
 inc 
 loc 2
 sli 4
 ads 4
 loi 4
 lol 0
 lol -4
 loc 2
 sli 4
 ads 4
 loi 4
 bne *4
 inl -4
 bra *8
4
 loc 1
 stl -4
 bra *6
1
 ret 0
 end 4
 mes 4,9,'diff.i\000'

and ego's output is:

 mes 2,4,4
 exp $output
 pro $output,0
 mes 3,-28,4,0,10000
 mes 3,-24,4,0,10000
 mes 3,-16,4,0,10000
 mes 3,-20,4,0,10000
 mes 3,-12,4,0,0
 mes 3,-8,4,0,0
 mes 3,0,4,2,0
 mes 3,-4,4,0,0
 mes 3
 mes 9,4
 lol 0
 stl -24
1
 loc 1
 zeq *2
 bra *5
6
 inl -28
 lol -20
 loc 4
 adi 4
 stl -20
 lol -16
 loc 4
 adi 4
 stl -16
3
 lol -24
 lol -16
 ads 4
 loi 4
 lol -24
 lol -20
 ads 4
 loi 4
 beq *6
4
 loc 1
 stl -28
 bra *1
2
5
 loc 1
 inc 
 loc 2
 sli 4
 stl -16
 loc 1
 loc 2
 sli 4
 stl -20
 bra *3
 end 0
 bra *3
 ret 0
davidgiven commented 5 years ago

Whatever's happening is happening in the sr pass, which does strength reduction. What's particularly interesting is that the code above is completely mangled --- there is no way out of that loop.

My immediate gut feeling is that it's getting confused by there being two loops which use the same induction variable (i in this case)... but this code hasn't been touched for years; surely someone else should have hit this by now?

kernigh commented 4 years ago

This example removes the pointer J but still crashes:

void output(int i)
{
    while (i = 1)
    {
        while ((3 * i) & (5 * i))
            i++;
    }
}

I edited O3phases in util/ego/em_ego/em_ego.c so -O3 only runs the SR phase, and I added -DTRACE to cflags in both util/ego/build.lua and util/ego/share/build.lua. I put ACKM=linuxppc in the environment. Then the command ack -Rego-V -O3 -t diff.c reproduces the crash, while also showing extra output (trace and verbose) and keeping intermediate files.

This diff turns off any actual strength reduction, but doesn't prevent the crash:

diff --git a/util/ego/sr/sr_reduce.c b/util/ego/sr/sr_reduce.c
index c78e456a1..1c43c390a 100644
--- a/util/ego/sr/sr_reduce.c
+++ b/util/ego/sr/sr_reduce.c
@@ -517,6 +517,7 @@ STATIC void reduce(code,vars)
        remcode(code);
    } else {
        make_header(code->co_loop);
+#if 0
        /* make sure there's a header block */
        tmp = tmplocal(curproc,(offset) code->co_tmpsize);
        code->co_temp = tmp;
@@ -538,6 +539,7 @@ STATIC void reduce(code,vars)
        incr_code(code,tmp); /* emit code to increment temp. local */
        OUTTRACE("emitted increment code",0);
        Ladd(code,&avail);
+#endif
        fix_header(code->co_loop);
    }
 }

With this diff, ego only renumbers some labels and tries to insert an extra basic block (but the wrong insertion causes the bug). I show this by decoding and comparing diff.m (before ego) with diff.gk (after ego). The input is the diff.c using (3 * i) & (5 * i).

$ ack -Rego-V -O3 -t diff.c
$ ack -c.e -o diff1.e diff.m
$ ack -c.e -o diff2.e diff.gk
$ diff -u diff[12].e
--- diff1.e     Wed Nov  6 22:20:17 2019
+++ diff2.e     Wed Nov  6 22:20:22 2019
@@ -4,12 +4,13 @@
  mes 3,0,4,0,5
  mes 3
  mes 9,4
-4
+1
  loc 1
  dup 4
  stl 0
- zeq *1
-7
+ zeq *2
+ bra *4
+3
  lol 0
  loc 3
  mli 4
@@ -17,10 +18,12 @@
  loc 5
  mli 4
  and 4
- zeq *4
+ zeq *1
  inl 0
- bra *7
-1
- ret 0
+ bra *3
+2
+4
+ bra *3
  end 0
- mes 4,8,'diff.i\000'
+ bra *3
+ ret 0

The lab 4 bra *3 near the bottom is a new basic block. It would run from bra *4 lab 3 to begin each iteration of the while loop. The code around the new block is a mess; it should have lab 2 ret 0, but wrongly splits the return from the label.

If I disable the calls to make_header() and fix_header(), then it prevent the crash. The bug might be in make_header() or fix_header().

kernigh commented 4 years ago

I wanted to check some assertions in fix_header(), but when I enabled assertions (with -DDEBUG in the same 2 places as -DDTRACE), I got

assertion "sizeof(int) == sizeof(offset)" failed: file "util/ego/share/put.c", l
ine 116, function "outint"
ego: /home/kernigh/park/ack-build/staging/lib/ack/ego/cf got a unix signal
kernigh commented 4 years ago

I changed the outer while loop to an if; it still crashes:

void output(int i)
{
    if (i) {
        while ((3 * i) & (5 * i))
            i++;
    }
}
$ ack -O3 -t diff.c                                                  
opt2: error on line 44: This is not allowed outside a procedure
ack: /home/kernigh/park/ack-build/staging/lib/ack/em_opt2 died with signal 6

This ack is from default branch (9cee18f), with no local changes except in my Makefile configuration, so -O3 is running its usual phases (not only SR), and tracing is off.

There is a problem util/ego/sr/sr_reduce.c fix_header(). The code should move the end pseudo to the new last basic block, but it may do so more than once, so it may wrongly move another line to after the end. The wrong move should fail assert(INSTR(e) == ps_end), but assertions are disabled.

The strength reduction (SR) phase looks for left shifts or multiplications using a loop variable. J[i] in C becomes something like *(int i)((char *)J + (i << 2) in EM, so SR may find the left shift. The crash happens after SR finds 2 different left shifts or multiplications for the same variable in the same loop. J[i] and J[i + 1]would use 2 different left shiftsi << 2and(i + 1) << 2`.

SR would optimize left shifts or multiplications:

                |   int j = 3 * i;
                |   int k = 5 * i;
while ((3 * i) & (5 * i))   |   while (j & k) {
    i++;            |       i++;
                |       j += 3;
                |       k += 5;
                |   }

This optimization needs a new "header" to initilialize j = 3 i and k = 5 i. The header is a new basic block. make_header() appends it to the current procedure. fix_header() needs to add a bra (unconditional branch) from the header to the beginning of the loop, and move the end pseudo from the previous basic block to the header.

With 2 different multiplications (3 i and 5 i), SR runs make_header() and fix_header() twice. A loop can only have one header, so make_header() doesn't make a second header, but fix_header() tries to move the end a second time, so it moves something else after end.