bagel99 / llvm-my66000

This is a fork of the LLVM project. The code in branch my66000 supports Mitch Alsup's MY66000. The code in branch mcore supports the Motorola MCore.
http://llvm.org

Hoisting with pointer expressions #45

Open tkoenig1 opened 1 year ago

tkoenig1 commented 1 year ago

Here's something I just noticed, a possible enhancement (so it won't be forgotten).

The functions foo and bar are equivalent, but are translated quite differently:

void foo (double a[], int i, int j, int n)
{
  int k;
  for (k=0; k<n; k++)
    a[i+k] = a[j+k] + 10.;
}

void bar (double a[], int i, int j, int n)
{
  int k;
  double *ai, *aj;

  ai = a + i;
  aj = a + j;
  for (k=0; k<n; k++)
    ai[k] = aj[k] + 10.;
}

foo is

foo:                                    ; @foo
; %bb.0:                                ; %entry
        ble0    r4,.LBB0_3
; %bb.1:                                ; %for.body.preheader
        srl     r4,r4,<32:0>
        mov     r5,#0
.LBB0_2:                                ; %for.body
                                        ; =>This Inner Loop Header: Depth=1
        vec     r6,{}
        add     r7,r5,r3
        ldd     r7,[r1,r7<<3,0]
        fadd    r7,r7,#10
        add     r8,r5,r2
        std     r7,[r1,r8<<3,0]
        loop    ne,r5,r4,#1
.LBB0_3:                                ; %for.end
        ret

and bar is

bar:                                    ; @bar
; %bb.0:                                ; %entry
        ble0    r4,.LBB1_3
; %bb.1:                                ; %for.body.preheader
        la      r2,[r1,r2<<3,0]
        la      r1,[r1,r3<<3,0]
        srl     r3,r4,<32:0>
        mov     r4,#0
.LBB1_2:                                ; %for.body
                                        ; =>This Inner Loop Header: Depth=1
        vec     r5,{}
        ldd     r6,[r1,r4<<3,0]
        fadd    r6,r6,#10
        std     r6,[r2,r4<<3,0]
        loop    ne,r4,r3,#1
.LBB1_3:                                ; %for.end
        ret

The issue appears to be recognizing the addresses of a[i] and a[j] as loop-invariant expressions which can be hoisted out of the loop.

Compile script is

a=${1%%.[ci]}
b=${a}_opt
clang -fverbose-asm -c --target=my66000 -O3 -fno-vectorize -fno-slp-vectorize  -emit-llvm -fno-unroll-loops -fomit-frame-pointer $1
opt  -disable-loop-unrolling -O3  --march=my66000 --frame-pointer=none --enable-vvm $a.bc  > $b.bc
llc -O2 -enable-remove-range-check --disable-lsr --enable-predication --enable-predication2 --enable-carry-generation --early-carry-coalesce --enable-vvm -march=my66000 $b.bc

bagel99 commented 1 year ago

This has to do with disabling loop strength reduction (LSR). If LSR is enabled, one gets equivalent code for both examples. We need to find a way to enable some of LSR, but not the part that ruins VVM optimizations.
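
For readers less familiar with the pass, here is a rough C-level sketch of the kind of rewrite loop strength reduction typically performs on foo. The name foo_lsr is hypothetical, and this is a source-level illustration only, not output from this compiler. The indexed accesses a[i+k] and a[j+k] become two pointers that are set up once before the loop (the hoisting asked for above) and then advanced every iteration. Which part of this rewrite interferes with VVM is not spelled out in this thread.

```c
/* Original form from the report: addresses are formed from i+k and j+k
   on every iteration. */
void foo(double a[], int i, int j, int n)
{
    int k;
    for (k = 0; k < n; k++)
        a[i + k] = a[j + k] + 10.;
}

/* Hypothetical strength-reduced form (illustration only). */
void foo_lsr(double a[], int i, int j, int n)
{
    double *dst, *src, *end;

    if (n <= 0)          /* same early exit as the ble0 in the generated code */
        return;

    dst = a + i;         /* hoisted: computed once, outside the loop */
    src = a + j;         /* hoisted: computed once, outside the loop */
    end = src + n;

    while (src != end)   /* induction variable k replaced by a pointer */
        *dst++ = *src++ + 10.;
}
```

Both functions read and write the elements in the same order, so they give the same result even when the two ranges overlap.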

tkoenig1 commented 1 year ago

> This has to do with disabling loop strength reduction (LSR). If LSR is enabled, one gets equivalent code for both examples. We need to find a way to enable some of LSR, but not the part that ruins VVM optimizations.

Hm, a vague idea (but I don't know about LLVM, so...)

It could be beneficial to run the VEC pass quite early, to detect opportunities. The vectorized loops could then be annotated so they can be excluded from optimizations like loop unrolling, which are detrimental for VEC/LOOP. Unrolling of an outer loop could still be done, though. Strength reduction within a VEC loop would then also be OK, as would all the normal strategies for loops that cannot be vectorized.

Does this sound at all reasonable?
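
As a point of comparison for the annotation idea: Clang already allows a single loop to be marked by hand so that the unroller leaves it alone. The sketch below assumes hand annotation is acceptable as an experiment. The name foo_nounroll is hypothetical. This is not the automatic early-VEC annotation proposed above, and whether the my66000 back end would consult this marker is an open question here.

```c
/* Same loop as foo, with a per-loop hint attached.
   "#pragma clang loop unroll(disable)" is an existing Clang pragma;
   other compilers will usually just ignore it (possibly with a warning). */
void foo_nounroll(double a[], int i, int j, int n)
{
    int k;
#pragma clang loop unroll(disable)
    for (k = 0; k < n; k++)
        a[i + k] = a[j + k] + 10.;
}
```

Clang records the hint as loop metadata in the IR, so it affects only this loop, unlike the global -fno-unroll-loops and -disable-loop-unrolling flags in the compile script.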