SLP vectorizer cost model should handle divide by power-of-two constants

Quuxplusone commented 10 years ago


Bugzilla Link	PR20714
Status	RESOLVED FIXED
Importance	P normal
Reported by	Jim Grosbach (grosbach@apple.com)
Reported on	2014-08-20 15:28:36 -0700
Last modified on	2014-09-29 09:00:51 -0700
Version	trunk
Hardware	PC All
CC	hfinkel@anl.gov, llvm-bugs@lists.llvm.org, mcrosier@codeaurora.org
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also

Inspired by the equivalent X86 problem being handled in
http://reviews.llvm.org/D4971

Consider:

void f(int *restrict a, int *restrict b, int *restrict c) {
  a[0] = (b[0] + c[0]) / 2;
  a[1] = (b[1] + c[1]) / 2;
  a[2] = (b[2] + c[2]) / 2;
  a[3] = (b[3] + c[3]) / 2;
}
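For reference, here is roughly what the SLP vectorizer should turn the four statements into. It is written by hand with the GCC/Clang `vector_size` extension. That is an assumption for illustration: it is not how the vectorizer represents the bundle internally. `f_scalar` is a renamed copy of `f`; `f_vec` and `v4si` are illustrative names. C's `/` truncates toward zero, so the lane-wise divide must round negative sums the same way the scalar code does.

```c
#include <assert.h>
#include <string.h>

/* Four 32-bit ints in one 128-bit vector (GCC/Clang vector extension). */
typedef int v4si __attribute__((vector_size(16)));

/* Renamed copy of f() from the report, used as the scalar reference. */
static void f_scalar(int *restrict a, int *restrict b, int *restrict c) {
  a[0] = (b[0] + c[0]) / 2;
  a[1] = (b[1] + c[1]) / 2;
  a[2] = (b[2] + c[2]) / 2;
  a[3] = (b[3] + c[3]) / 2;
}

/* Same computation as one 4-lane operation. */
static void f_vec(int *restrict a, const int *restrict b, const int *restrict c) {
  v4si vb, vc;
  memcpy(&vb, b, sizeof vb);                  /* 128-bit load, no alignment assumed */
  memcpy(&vc, c, sizeof vc);
  v4si va = (vb + vc) / (v4si){2, 2, 2, 2};   /* lane-wise signed divide, truncating */
  memcpy(a, &va, sizeof va);                  /* 128-bit store */
}
```

With `b = {3, -3, 7, -8}` and `c = {0, 0, 0, 1}`, both functions produce `{1, -1, 3, -3}`.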

On ARM64, all four lanes can be handled by a single vector sequence:

    ldr     q0, [x1]
    ldr     q1, [x2]
    add.4s  v0, v0, v1
    usra.4s v0, v0, #31
    sshr.4s v0, v0, #1
    str     q0, [x0]
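Per lane, the vector sequence implements C's truncating division by 2. `usra #31` adds the sign bit (1 for a negative value, 0 otherwise), and `sshr #1` then shifts right arithmetically. The `cmp`/`cinc ... lt` pair in the scalarized code plays the same role. The sketch below models this in portable C and generalizes it to 2^k by adding a bias of 2^k - 1 to negative dividends. That is a standard lowering, assumed here rather than taken from the patch. `sdiv_pow2` is a hypothetical name. The code also assumes `>>` on a negative `int32_t` is an arithmetic shift, which is what GCC and Clang do; the C standard leaves it implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Signed x / 2^k for 1 <= k <= 31, rounding toward zero like C's '/'. */
static int32_t sdiv_pow2(int32_t x, unsigned k) {
  assert(k >= 1 && k <= 31);
  /* All ones if x < 0, else zero (arithmetic shift assumed). */
  uint32_t sign_mask = (uint32_t)(x >> 31);
  /* 2^k - 1 for negative x, else 0. For k == 1 this is just the sign bit,
     i.e. what `usra.4s v0, v0, #31` accumulates into each lane. */
  int32_t bias = (int32_t)(sign_mask >> (32 - k));
  /* x + bias cannot overflow: the bias is only nonzero when x < 0.
     For k == 1 this shift corresponds to `sshr.4s v0, v0, #1`. */
  return (x + bias) >> k;
}
```

For example, `sdiv_pow2(-7, 1)` gives -3, matching `-7 / 2` in C. A plain `-7 >> 1` would give -4.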

Instead, we currently generate this long scalarized sequence:

    ldp   w8, w9, [x1]
    ldp   w10, w11, [x2]
    add   w8, w10, w8
    cmp   w8, #0               ; =0
    cinc  w8, w8, lt
    asr   w8, w8, #1
    str   w8, [x0]
    add   w8, w11, w9
    cmp   w8, #0               ; =0
    cinc  w8, w8, lt
    asr   w8, w8, #1
    str   w8, [x0, #4]
    ldp   w8, w9, [x1, #8]
    ldp   w10, w11, [x2, #8]
    add   w8, w10, w8
    cmp   w8, #0               ; =0
    cinc  w8, w8, lt
    asr   w8, w8, #1
    str   w8, [x0, #8]
    add   w8, w11, w9
    cmp   w8, #0               ; =0
    cinc  w8, w8, lt
    asr   w8, w8, #1
    str   w8, [x0, #12]
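As the title says, the fix belongs in the SLP cost model. It needs to recognize that every lane divides by the same power-of-two constant. In that case the vector divide lowers to a short shift sequence like the `usra`/`sshr` pair rather than an expensive divide. Below is a minimal, hypothetical sketch of that predicate in plain C. `is_pow2` and `uniform_pow2_divisor` are illustrative names, not LLVM's `TargetTransformInfo` API. The sketch deliberately covers only the uniform-constant case shown in this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if v is a positive power of two (1, 2, 4, ...). */
static bool is_pow2(int32_t v) {
  return v > 0 && (v & (v - 1)) == 0;
}

/* Hypothetical check: do all n lanes of a bundled sdiv use the same
   power-of-two constant divisor? If so, the divide can be costed as a
   few vector shifts/adds instead of a real division. */
static bool uniform_pow2_divisor(const int32_t *divs, size_t n) {
  if (n == 0 || !is_pow2(divs[0]))
    return false;
  for (size_t i = 1; i < n; ++i)
    if (divs[i] != divs[0])
      return false;
  return true;
}
```

For the function in this report, the divisors are `{2, 2, 2, 2}`, so the check succeeds. A bundle such as `{2, 2, 4, 2}` or `{3, 3, 3, 3}` would fall back to whatever the cost model charges for a general vector divide.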

Quuxplusone commented 10 years ago

Should have a patch up for review shortly.

Quuxplusone commented 10 years ago

http://reviews.llvm.org/D5469

Quuxplusone commented 10 years ago

Committed r218607.

SLP vectorizer cost model should handle divide by power-of-two constants #20713