Quuxplusone / LLVMBugzillaTest


[SVE] Don't generate whilelo instruction to enhance the loop condition #51353

Open Quuxplusone opened 3 years ago

Quuxplusone commented 3 years ago
Bugzilla Link PR52386
Status NEW
Importance P enhancement
Reported by zhongyunde (zhongyunde@huawei.com)
Reported on 2021-11-03 08:58:59 -0700
Last modified on 2021-11-15 11:05:43 -0800
Version trunk
Hardware PC Windows NT
CC arnaud.degrandmaison@arm.com, efriedma@quicinc.com, llvm-bugs@lists.llvm.org, smithp352@googlemail.com, Ties.Stuij@arm.com
This case is taken from https://reviews.llvm.org/D91077. For this simple loop,
the newest clang does not generate a whilelo instruction to control the SVE
loop condition (https://godbolt.org/z/hTf5bnhxW uses a normal compare
instruction), while gcc does (https://godbolt.org/z/xq4vdEYPY uses whilelo).

void loop(int N, double *a, double *b) {
  #pragma clang loop vectorize_width(4, scalable)
  for (int i = 0; i < N; i++) {
    a[i] = b[i] + 1.0;
  }
}
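For comparison, the whilelo-controlled (tail-folded) loop that gcc emits can be modeled in plain C. This is a sketch under stated assumptions, not compiler output: `VL` is a hypothetical fixed lane count standing in for the runtime SVE vector length, and `whilelo`/`loop_whilelo` are illustrative names. Each iteration recomputes a predicate in which lane j is active only while i + j < N, and the loop exits once no lane is active, so no scalar remainder loop is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lane count: a stand-in for the runtime SVE vector length
   (64-bit elements per vector). Real hardware fixes this at run time. */
#define VL 4

/* Model of WHILELO: lane j is active iff base + j < n (unsigned compare).
   Returns true if any lane is active, mimicking the flag test that the
   loop's back-edge branch performs. */
static bool whilelo(bool pred[VL], unsigned long base, unsigned long n) {
    bool any = false;
    for (int j = 0; j < VL; j++) {
        pred[j] = base + (unsigned long)j < n;
        any = any || pred[j];
    }
    return any;
}

/* Tail-folded model of loop(): every vector iteration is governed by a
   WHILELO predicate, so there is no scalar remainder loop. */
void loop_whilelo(int N, double *a, const double *b) {
    assert(N <= 0 || (a != NULL && b != NULL));
    if (N <= 0)                        /* signed N; WHILELO compares unsigned */
        return;
    bool pred[VL];
    unsigned long i = 0;
    bool active = whilelo(pred, i, (unsigned long)N);
    while (active) {
        for (int j = 0; j < VL; j++)   /* predicated load, add, store */
            if (pred[j])
                a[i + j] = b[i + j] + 1.0;
        i += VL;                       /* advance by one vector's worth */
        active = whilelo(pred, i, (unsigned long)N);
    }
}
```

On the last iteration the predicate masks off the lanes past N, which is what lets the loop run without a separate scalar tail.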
Quuxplusone commented 3 years ago

The vectorized loop currently doesn't use predication at all.

Not sure we really need a bug report to track the implementation of predicated loop bodies.

Quuxplusone commented 3 years ago
But I see that the kernel loop body in https://godbolt.org/z/hTf5bnhxW already
uses predication. Am I missing something?

.LBB0_9:                                // =>This Inner Loop Header: Depth=1
        ld1w    { z0.s }, p0/z, [x2, x11, lsl #2]
        add     z0.s, z0.s, #1                  // =0x1
        st1w    { z0.s }, p0, [x1, x11, lsl #2]
Quuxplusone commented 3 years ago
Oh, I see.

There's a branch to test whether the remainder loop needs to be executed, so
the vector loop body doesn't use predication.

.LBB0_9:                                // =>This Inner Loop Header: Depth=1
        ld1w    { z0.s }, p0/z, [x2, x11, lsl #2]
        add     z0.s, z0.s, #1                  // =0x1
        st1w    { z0.s }, p0, [x1, x11, lsl #2]
        add     x11, x11, x9
        cmp     x11, x10
        b.ne    .LBB0_9
        cbnz    x12, .LBB0_5                    // test the remainder loop count
        b       .LBB0_7
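The structure clang emits instead (an unpredicated vector body followed by a check for leftover iterations) can be sketched the same way. Again `VL` is a hypothetical fixed lane count and `loop_with_remainder` is an illustrative name; the correspondence to the listing (the add/cmp/b.ne back edge, then the `cbnz` guard on the remainder) is approximate, read off the assembly rather than taken from the vectorizer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lane count: a stand-in for the runtime SVE vector length. */
#define VL 4

/* Model of the unpredicated codegen: the predicate is all-true (ptrue), so
   the vector body only handles whole vectors and a scalar loop handles the
   leftover N % VL elements. */
void loop_with_remainder(int N, double *a, const double *b) {
    assert(N <= 0 || (a != NULL && b != NULL));
    if (N <= 0)
        return;
    size_t n = (size_t)N;
    size_t n_vec = n - n % VL;          /* elements covered by full vectors */
    size_t i = 0;

    /* Vector body: roughly the add/cmp/b.ne loop in the listing. */
    for (; i != n_vec; i += VL)
        for (size_t j = 0; j < VL; j++) /* all lanes active */
            a[i + j] = b[i + j] + 1.0;

    /* Roughly the cbnz check: enter the scalar loop only if elements remain. */
    if (n != n_vec)
        for (; i < n; i++)
            a[i] = b[i] + 1.0;
}
```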
Quuxplusone commented 3 years ago

A bunch of SVE instructions unconditionally reference a predicate register. But the loop isn't really "predicated"; p0 is just an all-ones value generated by ptrue.
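The difference between the two predicates can be shown with a small bitmask model (bit j set means lane j is active). The function names and the `vl` lane count are hypothetical, and real SVE predicate registers hold one bit per byte of the vector rather than one per element; this only models which lanes are enabled. `ptrue` with its default ALL pattern enables every lane regardless of N, while whilelo enables only the lanes whose index is still below N.

```c
#include <assert.h>

/* PTRUE (default ALL pattern): every one of the vl lanes is active. */
unsigned ptrue_mask(unsigned vl) {
    assert(vl >= 1 && vl < 32);        /* keep the shift well-defined */
    return (1u << vl) - 1u;
}

/* WHILELO base, n: lane j is active iff base + j < n (unsigned compare). */
unsigned whilelo_mask(unsigned vl, unsigned long base, unsigned long n) {
    assert(vl >= 1 && vl < 32);
    unsigned m = 0;
    for (unsigned j = 0; j < vl; j++)
        if (base + j < n)
            m |= 1u << j;
    return m;
}
```

With vl = 4 and N = 7, whilelo yields 0xF for the first vector and 0x7 for the second, while ptrue always yields 0xF; that is why a ptrue-governed loop still needs the separate remainder check.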

vfdff commented 2 years ago

This is a very early-stage concept, not a complete solution: https://reviews.llvm.org/D99750

Quuxplusone commented 2 years ago

@vfdff: You meant to comment on https://github.com/llvm/llvm-project/issues/51728, not here! (I'll lock this repo to prevent such mistakes in the future.)