GEP scaling not combined with explicit scaling


Bugzilla Link	PR14173
Status	NEW
Importance	P enhancement
Reported by	Duncan Sands (baldrick@free.fr)
Reported on	2012-10-25 03:26:42 -0700
Last modified on	2012-10-26 02:51:39 -0700
Version	trunk
Hardware	PC Linux
CC	geek4civic@gmail.com, llvm-bugs@lists.llvm.org, pawel@32bitmicro.com, rafael@espindo.la
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also

Consider compiling this on x86-64 with clang:

int foo(int *A, int N) {
  return A[2*N];
}

It multiplies N by 8 and offsets the pointer by that many bytes.  The
offset consists of an explicit multiplication by 2, and an implicit
multiplication by 4 (aka sizeof(int)) via the array access.  However
these two multiplications aren't combined into one in the final assembler:

The LLVM IR is

define i32 @foo(i32* nocapture %A, i32 %N) nounwind uwtable readonly {
entry:
  %mul = shl nsw i32 %N, 1
  %idxprom = sext i32 %mul to i64
  %arrayidx = getelementptr inbounds i32* %A, i64 %idxprom
  %0 = load i32* %arrayidx, align 4
  ret i32 %0
}

which codegens to

        addl    %esi, %esi           ; multiply by 2
        movslq  %esi, %rax
        movl    (%rdi,%rax,4), %eax  ; multiply by 4
        ret

The LLVM IR could be turned into

define i32 @foo(i32* nocapture %A, i32 %N) nounwind uwtable readonly {
entry:
  %Nprom = sext i32 %N to i64
  %mul = shl nsw i64 %Nprom, 1
  %arrayidx = getelementptr inbounds i32* %A, i64 %mul
  %0 = load i32* %arrayidx, align 4
  ret i32 %0
}

(the transform is correct because of the nsw flag on the multiplication) which
codegens to

        movslq  %esi, %rax
        movl    (%rdi,%rax,8), %eax ; multiply by 8
        ret

There are several possible strategies for fixing this.

(1) At the IR level, when a computation is feeding a GEP as an offset, could try
to promote the entire expression to intptrtype, producing the second example of
LLVM IR above.
(2) At the IR level, when an expression is feeding a GEP, could try to pull any
*constant* factors through casts like the sext, rather than arbitrary factors.
(3) Same kind of thing as (1) or (2) but during CodeGenPrepare.
(4) Same kind of thing as (1) or (2) but as a DAG combine.

Currently I'm leaning towards the following solution: instcombine pulls
expressions feeding a GEP offset through any sign extensions, as long as
it doesn't increase the number of sign extensions.  This assumes that
multiplying in a pointer-sized integer (eg i64) is just as efficient as
multiplying in a smaller sized integer (eg i32).  Is this true?

For example:

%m = mul nsw i32 %x, %y
%e = sext i32 %m to i64
GEP..., i64 %e

In this case pulling the mul through the sign extension would give

%xe = sext i32 %x to i64
%ye = sext i32 %y to i64
%m = mul nsw i64 %xe, %ye
GEP..., i64 %m

increasing the number of sign extensions, so wouldn't be done.

On the other hand consider

%m = mul nsw i32 %x, 42
%e = sext i32 %m to i64
GEP..., i64 %e

In this case pulling the mul through the sign extension would give

%xe = sext i32 %x to i64
%m = mul nsw i64 %xe, 42
GEP..., i64 %m

with no additional sign extensions, so would be considered good to go.

Quuxplusone / LLVMBugzillaTest

GEP scaling not combined with explicit scaling #14202