Differences between `base+width*i` and `base+=width`

Quuxplusone commented 4 years ago


Bugzilla Link	PR43835
Status	NEW
Importance	P enhancement
Reported by	Roman Lebedev (lebedev.ri@gmail.com)
Reported on	2019-10-29 05:42:17 -0700
Last modified on	2021-08-16 12:24:27 -0700
Version	trunk
Hardware	PC Linux
CC	efriedma@quicinc.com, florian_hahn@apple.com, hfinkel@anl.gov, johannes@jdoerfert.de, listmail@philipreames.com, llvm-bugs@lists.llvm.org, max.kazantsev@azul.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also	PR43836

https://godbolt.org/z/TR-s8A

#include <array>
#include <algorithm>

void sink(int* row);

void v0(int* data, int len, int width) {
    for(int i = 0; i < len; i++) {
        int *row = data + width*i;
        sink(row);
    }
}
void v1(int* row, int len, int width) {
    for(int i = 0; i < len; i++, row+=width) {
        sink(row);
    }
}

In first case we get

9:                                                ; preds = %9, %5
  %10 = phi i64 [ 0, %5 ], [ %13, %9 ]
  %11 = mul nsw i64 %10, %6
  %12 = getelementptr inbounds i32, i32* %0, i64 %11
  tail call void @_Z4sinkPi(i32* %12)
  %13 = add nuw nsw i64 %10, 1

but in second:

  %9 = phi i32 [ 0, %5 ], [ %11, %8 ]
  %10 = phi i32* [ %0, %5 ], [ %12, %8 ]
  tail call void @_Z4sinkPi(i32* %10)
  %11 = add nuw nsw i32 %9, 1
  %12 = getelementptr inbounds i32, i32* %10, i64 %6

I'd guess the second variant is better?
The difference is interesting.

Quuxplusone commented 4 years ago

Or more globally, this was the original pattern i was looking at:
https://godbolt.org/z/Bzn5Lz

#include <array>
#include <algorithm>

void sink(int* row, int* predRow);

void bad(int* data, int len, int width) {
    for(int i = 1; i < len; i++) {
        int *row = data + width*i;
        int *predRow = data + width*(i-1);
        sink(row, predRow);
    }
}
void good(int* data, int len, int width) {
    int* predRow = data;
    for(int i = 1; i < len; i++, data+=width, predRow+=width) {
        sink(data, predRow);
    }
}

In first case there are two gep's, in second there is an extra phi
and only one gep.

Quuxplusone commented 4 years ago

Interesting. I suppose that IndVarSimplify doesn't catch this. Which form is actually better, in some general sense, is target dependent (depending on whethr a target has various pre/post increment operations), but LSR won't create new PHIs, and that's another complication in this space.