Open Quuxplusone opened 6 years ago
Bugzilla Link | PR37417 |
Status | NEW |
Importance | P enhancement |
Reported by | Sanjay Patel (spatel+llvm@rotateright.com) |
Reported on | 2018-05-11 09:04:30 -0700 |
Last modified on | 2020-10-25 11:26:31 -0700 |
Version | trunk |
Hardware | PC All |
CC | lebedev.ri@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, oneill+llvmbugs@cs.hmc.edu |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | PR46896 |
See also | PR37387, PR38152, PR39624, PR34924, PR15880, PR20750, PR35155 |
If we add a rotate intrinsic:
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123292.html
...then we shouldn't need this chunk in CGP.
Please also consider the UB avoiding pattern:
(x << (n % width)) | (x >> (-n % width))
And how does this interact with vectorization?
_Bug 20750 has been marked as a duplicate of this bug._
Apologies for not updating rotate bugs with current status. We added more
general funnel shift intrinsics to LLVM:
https://reviews.llvm.org/D49242
These can be used to represent rotates in IR and will become canonical form
(hopefully not too far away).
Given that we have these new intrinsics, I'm going to morph this bug into a
request to canonicalize the IR shown here to the intrinsics. (There's no reason
to pursue a backend hack in CGP to accomplish the same thing.)
(In reply to Trass3r from comment #2)
> And how does this interact with vectorization?
We have to enhance the vectorizers to recognize the funnel shift intrinsics to
avoid regressions:
https://reviews.llvm.org/rL346661
(In reply to Sanjay Patel from comment #5)
> (In reply to Trass3r from comment #2)
> > And how does this interact with vectorization?
>
> We have to enhance the vectorizers to recognize the funnel shift intrinsics
> to avoid regressions:
> https://reviews.llvm.org/rL346661
Getting the cost models in shape:
https://reviews.llvm.org/rL346683
https://reviews.llvm.org/rL346688
Converting narrow-able rotate patterns to minimal shift+logic ops:
https://reviews.llvm.org/rL346713
https://reviews.llvm.org/rL346716
Bug 34924 shows a rotate variant that includes compare+branch, so we'll need to
handle that as well as a 'select' variant of that pattern:
define i32 @rot_select(i32 %x, i32 %shamt){
%cmp = icmp eq i32 %shamt, 0
%shr = lshr i32 %x, %shamt
%sub = sub i32 32, %shamt
%shl = shl i32 %x, %sub
%or = or i32 %shl, %shr
%r = select i1 %cmp, i32 %x, i32 %or
ret i32 %r
}
https://rise4fun.com/Alive/yMQ
AFAIK, this is the last step that was needed before we add the transform to
produce intrinsics in instcombine:
https://reviews.llvm.org/rL346854
We should now canonicalize the pattern that includes a select into the expected
{sub/and/and/shl/lshr/or} form with this change:
https://reviews.llvm.org/rL346807
And we have better support to reduce the IR once it is in intrinsic form with:
https://reviews.llvm.org/rL346814
Am I misreading the LangRef (and thus the IR is invalid),
or is the naive funnel shift IR not being canonicalized?
https://godbolt.org/z/W6WxrQ
(In reply to Roman Lebedev from comment #9)
> Am I misreading the LangRef (and thus the IR is invalid),
> or is the naive funnel shift IR not being canonicalized?
> https://godbolt.org/z/W6WxrQ
IR looks good. We don't have these alternate DAG funnel expansions:
// fshl: (X << (Z % BW)) | (Y >> (BW - (Z % BW)))
// fshr: (X << (BW - (Z % BW))) | (Y >> (Z % BW))
...or even the simpler rotate patterns canonicalized in IR yet. I keep thinking
we're close to done with this, but I've been perpetually too optimistic.
(In reply to Sanjay Patel from comment #10)
> (In reply to Roman Lebedev from comment #9)
> > Am I misreading the LangRef (and thus the IR is invalid),
> > or is the naive funnel shift IR not being canonicalized?
> > https://godbolt.org/z/W6WxrQ
>
> IR looks good.
Ok, thank you for checking, that is reassuring.
> We don't have these alternate DAG funnel expansions:
> // fshl: (X << (Z % BW)) | (Y >> (BW - (Z % BW)))
> // fshr: (X << (BW - (Z % BW))) | (Y >> (Z % BW))
>
> ...or even the simpler rotate patterns canonicalized in IR yet. I keep
> thinking we're close to done with this, but I've been perpetually too
> optimistic.
Copying the IR from the godbolt link here for reference. I'm guessing there are
a number of variations when the shift amount is not the same width as the other
operands (similar to the simpler rotate patterns), so this is going to require
some fairly large pattern matching.
Any help in writing tests/patches is appreciated; I'm not sure when I'll get to
these. The tricky part is that we can't canonicalize simpler UB-unsafe
sequences to the intrinsics, because the DAG expansion might result in worse
performance. This is discussed in
https://reviews.llvm.org/D55604?id=177858#inline-492396 and bug 34924.
define i32 @fshl_pow2_bitwidth_using_wide_type(i32 %a, i32 %b, i32 %c) {
%widea = zext i32 %a to i64
%wideb = zext i32 %b to i64
%high = shl nuw i64 %widea, 32
%wide = or i64 %high, %wideb
%cmodwidth = and i32 %c, 31
%cmodwidthwide = zext i32 %cmodwidth to i64
%shifted = shl i64 %wide, %cmodwidthwide
%skiplowhalf = lshr i64 %shifted, 32
%trunc = trunc i64 %skiplowhalf to i32
ret i32 %trunc
}
define i32 @fshr_pow2_bitwidth_using_wide_type(i32 %a, i32 %b, i32 %c) {
%widea = zext i32 %a to i64
%wideb = zext i32 %b to i64
%high = shl nuw i64 %widea, 32
%wide = or i64 %high, %wideb
%cmodwidth = and i32 %c, 31
%cmodwidthwide = zext i32 %cmodwidth to i64
%shifted = lshr i64 %wide, %cmodwidthwide
%trunc = trunc i64 %shifted to i32
ret i32 %trunc
}
These are the canonicalizations we have so far:
1. UB-safe shift+logic, matching types
https://reviews.llvm.org/rL350672
2. Various flavors of shift+logic with mismatched types
https://reviews.llvm.org/rL350419
3. Semi-UB-safe shift+logic+select, matching types
https://reviews.llvm.org/rL350199
4. Semi-UB-safe shift+logic with guard branch, matching types:
https://reviews.llvm.org/rL349396