Open Quuxplusone opened 3 years ago
Bugzilla Link | PR49577 |
Status | NEW |
Importance | P enhancement |
Reported by | Danila Kutenin (kutdanila@yandex.ru) |
Reported on | 2021-03-13 05:12:19 -0800 |
Last modified on | 2021-05-18 16:18:47 -0700 |
Version | trunk |
Hardware | PC Linux |
CC | arnaud.degrandmaison@arm.com, david.green@arm.com, llvm-bugs@lists.llvm.org, smithp352@googlemail.com, Ties.Stuij@arm.com |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also |
Those instructions do indeed look redundant. I've put up https://reviews.llvm.org/D98599 to improve the lowering.
The ushr+orr I haven't looked at in any depth, but llvm is likely being helpful by turning the Add into an Or. That needs to be turned back using something like IsOrAdd or AddLikeOrOp used in other backends (although known bits may be needed for the aarch64 shifts to make that work).
Let me bump this once again, new zstd 1.5.0 lost performance with the new update because of this
https://github.com/facebook/zstd/pull/2494#issuecomment-829631487
I can try to fix on my own, just need some guidance probably
Okay, it is even more interesting
Removing first shrq_n_u8 turns the code into the good one
int MoveMask(uint8x16_t input) { uint32x4_t paired16 = vreinterpretq_u32_u16(vsraq_n_u16(input, input, 7)); uint64x2_t paired32 = vreinterpretq_u64_u32(vsraq_n_u32(paired16, paired16, 14)); uint8x16_t paired64 = vreinterpretq_u8_u64(vsraq_n_u64(paired32, paired32, 28)); return vgetq_lane_u8(paired64, 0) | ((int) vgetq_lane_u8(paired64, 8) << 8); }
usra v0.8h, v0.8h, #7
usra v0.4s, v0.4s, #14
usra v0.2d, v0.2d, #28
umov w0, v0.b[0]
umov w8, v0.b[8]
bfi w0, w8, #8, #8
ret
And actually there are tests on that in AArch64/arm64-vsra.ll to check that. I believe it comes from DAGCombiner::visitAdd and it can prove that + is the same as or in
if ((!LegalOperations || TLI.isOperationLegal(ISD::OR, VT)) && DAG.haveNoCommonBitsSet(N0, N1)) return DAG.getNode(ISD::OR, DL, VT, N0, N1);
Yeah, we probably need something like AddLikeOrOp