Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

[aarch64] llvm.experimental.reduce.{and, any} don't lower properly for boolean vectors #40605

Open Quuxplusone opened 5 years ago

Quuxplusone commented 5 years ago
Bugzilla Link PR41635
Status NEW
Importance P enhancement
Reported by Simon Pilgrim (llvm-dev@redking.me.uk)
Reported on 2019-04-28 03:28:42 -0700
Last modified on 2021-07-17 01:40:46 -0700
Version trunk
Hardware PC Windows NT
CC a.bataev@hotmail.com, arnaud.degrandmaison@arm.com, gonzalo.gadeschi@gmail.com, llvm-bugs@lists.llvm.org, nikita.ppv@gmail.com, smithp352@googlemail.com, Ties.Stuij@arm.com
Fixed by commit(s) rL360054
Attachments
Blocks
Blocked by
See also PR36702, PR38841, PR51122
Split off from [Bug #36702], AARCH64 generates poor code for boolean reduction
from generic IR - either with the llvm.experimental.vector.reduce intrinsics
(which expand to a shuffle reduction chain) or with bitcasts of the comparison
result mask:

https://godbolt.org/z/wchsJ6

e.g.

## AArch64

LLVM6:

all_8x8:
 ldr     d0, [x0]
 umov    w8, v0.b[0]
 umov    w9, v0.b[1]
 tst     w8, #0xff
 umov    w10, v0.b[2]
 cset    w8, ne
 tst     w9, #0xff
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[3]
 and     w8, w8, w9
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[4]
 and     w8, w9, w8
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[5]
 and     w8, w9, w8
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[6]
 and     w8, w9, w8
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[7]
 and     w8, w9, w8
 cset    w9, ne
 tst     w10, #0xff
 and     w8, w9, w8
 cset    w9, ne
 and     w0, w9, w8
 ret
any_8x8:
 ldr     d0, [x0]
 umov    w8, v0.b[0]
 umov    w9, v0.b[1]
 orr     w8, w8, w9
 umov    w9, v0.b[2]
 orr     w8, w8, w9
 umov    w9, v0.b[3]
 orr     w8, w8, w9
 umov    w9, v0.b[4]
 orr     w8, w8, w9
 umov    w9, v0.b[5]
 orr     w8, w8, w9
 umov    w9, v0.b[6]
 orr     w8, w8, w9
 umov    w9, v0.b[7]
 orr     w8, w8, w9
 tst     w8, #0xff
 cset    w0, ne
 ret

Manually generated:

all_8x8:
 ldr     d0, [x0]
 mov     v0.d[1], v0.d[0]
 uminv   b0, v0.16b
 fmov    w8, s0
 tst     w8, #0xff
 cset    w0, ne
 ret
any_8x8:
 ldr     d0, [x0]
 mov     v0.d[1], v0.d[0]
 umaxv   b0, v0.16b
 fmov    w8, s0
 tst     w8, #0xff
 cset    w0, ne
 ret
Quuxplusone commented 5 years ago

As a target-independent transform, we should convert i1 VECREDUCE_AND and VECREDUCE_OR into VECREDUCE_UMIN and VECREDUCE_UMAX respectively, if it's favorable legality-wise.

Not sure where the right place to do that would be though.

Quuxplusone commented 5 years ago

Candidate Patch: https://reviews.llvm.org/D61398

Quuxplusone commented 5 years ago

The vector.reduce cases are fixed by https://reviews.llvm.org/rL360054. The bitcast part is still open.

Quuxplusone commented 3 years ago

Ping, there is https://reviews.llvm.org/D97961, which changes the default cost for the logical reductions.