Quuxplusone / LLVMBugzillaTest

[arm][neon] llvm.experimental.vector.reduce.{and, or} don't lower properly for boolean vectors #40606

Open Quuxplusone opened 5 years ago

Quuxplusone commented 5 years ago
| Field | Value |
| --- | --- |
| Bugzilla Link | PR41636 |
| Status | NEW |
| Importance | P enhancement |
| Reported by | Simon Pilgrim (llvm-dev@redking.me.uk) |
| Reported on | 2019-04-28 03:30:48 -0700 |
| Last modified on | 2021-03-19 10:43:11 -0700 |
| Version | trunk |
| Hardware | PC Windows NT |
| CC | a.bataev@hotmail.com, gonzalo.gadeschi@gmail.com, llvm-bugs@lists.llvm.org, smithp352@googlemail.com, Ties.Stuij@arm.com |
| Fixed by commit(s) | |
| Attachments | |
| Blocks | |
| Blocked by | |
| See also | PR36702, PR38842 |

Split off from [Bug #36702]. ARMv7-A generates poor code for boolean
reductions from generic IR, whether they are written with the
llvm.experimental.vector.reduce intrinsics (which expand to a shuffle
reduction chain) or with bitcasts of the comparison result mask:

https://godbolt.org/z/U7C4n4

e.g.

## ARMv7+NEON

LLVM6:

all_8x8:
 vmov.i8 d0, #0x1
 vldr    d1, [r0]
 vtst.8  d0, d1, d0
 vext.8  d1, d0, d0, #4
 vand    d0, d0, d1
 vext.8  d1, d0, d0, #2
 vand    d0, d0, d1
 vdup.8  d1, d0[1]
 vand    d0, d0, d1
 vmov.u8 r0, d0[0]
 and     r0, r0, #1
 bx      lr
any_8x8:
 vmov.i8 d0, #0x1
 vldr    d1, [r0]
 vtst.8  d0, d1, d0
 vext.8  d1, d0, d0, #4
 vorr    d0, d0, d1
 vext.8  d1, d0, d0, #2
 vorr    d0, d0, d1
 vdup.8  d1, d0[1]
 vorr    d0, d0, d1
 vmov.u8 r0, d0[0]
 and     r0, r0, #1
 bx      lr
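
To make the cost of the shuffle chain easier to see, here is a minimal scalar sketch in C of what the LLVM 6 listing above computes for one 64-bit vector. It is a model written for illustration, not compiler output; the function names (`all_8x8_model`, `any_8x8_model`, and the helpers) are hypothetical and do not come from the Godbolt link.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of the LLVM 6 sequence; names are illustrative only. */

/* vtst.8 against a splat of 1: a lane becomes 0xFF if bit 0 is set, else 0. */
static void test_bit0(uint8_t v[8]) {
    for (int i = 0; i < 8; ++i)
        v[i] = (v[i] & 1) ? 0xFF : 0x00;
}

/* vext.8 d1, d0, d0, #n then vand/vorr: combine lane i with lane (i+n)%8. */
static void rotate_and_combine(uint8_t v[8], int n, int is_and) {
    uint8_t rot[8];
    for (int i = 0; i < 8; ++i)
        rot[i] = v[(i + n) % 8];
    for (int i = 0; i < 8; ++i)
        v[i] = is_and ? (uint8_t)(v[i] & rot[i]) : (uint8_t)(v[i] | rot[i]);
}

/* vdup.8 d1, d0[1] then vand/vorr: combine every lane with lane 1. */
static void splat1_and_combine(uint8_t v[8], int is_and) {
    uint8_t lane1 = v[1];
    for (int i = 0; i < 8; ++i)
        v[i] = is_and ? (uint8_t)(v[i] & lane1) : (uint8_t)(v[i] | lane1);
}

static int reduce_8x8(const uint8_t *p, int is_and) {
    uint8_t v[8];
    memcpy(v, p, 8);                  /* vldr d1, [r0] */
    test_bit0(v);                     /* vmov.i8 + vtst.8 */
    rotate_and_combine(v, 4, is_and); /* vext #4 + vand/vorr */
    rotate_and_combine(v, 2, is_and); /* vext #2 + vand/vorr */
    splat1_and_combine(v, is_and);    /* vdup lane 1 + vand/vorr */
    return v[0] & 1;                  /* vmov.u8 r0, d0[0]; and r0, r0, #1 */
}

int all_8x8_model(const uint8_t *p) { return reduce_8x8(p, 1); }
int any_8x8_model(const uint8_t *p) { return reduce_8x8(p, 0); }
```

Each of the three reduction steps costs a shuffle plus a logical op, which is the pattern the hand-written version replaces with a single pairwise instruction per step.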

Manually generated:

all_8x8:
 vldr    d0, [r0]
 vpmin.u8 d16, d0, d16
 vpmin.u8 d16, d16, d16
 vpmin.u8 d0, d16, d16
 vmov.u8 r0, d0[0]
 bx      lr

any_8x8:
 vldr    d0, [r0]
 vpmax.u8 d16, d0, d16
 vpmax.u8 d16, d16, d16
 vpmax.u8 d0, d16, d16
 vmov.u8 r0, d0[0]
 bx      lr
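
The pairwise approach can be sketched in scalar C as follows. This is an illustrative model under stated assumptions, not the author's code: the function names are hypothetical, and the model feeds the same vector to both operands of every step, whereas the listing passes d16 as the second operand of the first `vpmin`/`vpmax`. Only lane 0 of the final result is read, and that lane depends only on the first operand of the first step, so the simplification should not change the returned value. Unlike the LLVM 6 listing, this version returns the whole byte of lane 0, so it appears to assume the input lanes are already 0x00 or 0xFF masks.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of vpmin.u8 / vpmax.u8 Dd, Dn, Dm (illustrative names):
 * result lanes 0-3 come from adjacent pairs of Dn, lanes 4-7 from Dm. */
static void pairwise(uint8_t d[8], const uint8_t n[8], const uint8_t m[8],
                     int is_min) {
    uint8_t out[8];
    for (int i = 0; i < 4; ++i) {
        uint8_t a = n[2 * i], b = n[2 * i + 1];
        uint8_t c = m[2 * i], e = m[2 * i + 1];
        out[i]     = is_min ? (a < b ? a : b) : (a > b ? a : b);
        out[i + 4] = is_min ? (c < e ? c : e) : (c > e ? c : e);
    }
    memcpy(d, out, 8); /* written last, so d may alias n or m */
}

static uint8_t reduce_pairwise(const uint8_t *p, int is_min) {
    uint8_t v[8];
    memcpy(v, p, 8);               /* vldr d0, [r0] */
    for (int step = 0; step < 3; ++step)
        pairwise(v, v, v, is_min); /* 8 -> 4 -> 2 -> 1 meaningful lanes */
    return v[0];                   /* vmov.u8 r0, d0[0] */
}

/* Minimum over the lanes: 0xFF only if every mask lane is 0xFF. */
uint8_t all_8x8_pairwise(const uint8_t *p) { return reduce_pairwise(p, 1); }

/* Maximum over the lanes: 0xFF if at least one mask lane is 0xFF. */
uint8_t any_8x8_pairwise(const uint8_t *p) { return reduce_pairwise(p, 0); }
```

Three pairwise steps suffice because each one halves the number of lanes that still matter, so an 8-lane vector needs log2(8) = 3 instructions and no separate shuffles.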

Quuxplusone commented 3 years ago

Ping: https://reviews.llvm.org/D97961 changes the default cost for the logical reductions.