Open Quuxplusone opened 3 years ago
Bugzilla Link | PR49763 |
Status | NEW |
Importance | P enhancement |
Reported by | Jeremy R. (llvm@rifkin.dev) |
Reported on | 2021-03-29 13:13:28 -0700 |
Last modified on | 2021-04-03 07:02:53 -0700 |
Version | trunk |
Hardware | All All |
CC | llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, mstaveleytaylor@gmail.com, spatel+llvm@rotateright.com |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also |
I think I was mistaken when I speculated about a problem with ignoring poison values from the absolute value calculation. It may be as simple as not optimizing abs(x) & 1 -> x & 1 and abs(x & 1) -> x & 1.
This should help the examples with "abs":
https://reviews.llvm.org/rGc2ebad8d55bd
The examples with "srem" are trickier. Clearly, we are missing these basic
transforms in IR:
https://alive2.llvm.org/ce/z/BagXUu
https://alive2.llvm.org/ce/z/aapvrz
But we would have to bend our normal value usage rules to get those most basic
folds to fire on the example with 2 uses of the srem result.
That might be easier to justify as a backend (codegen) optimization.
Here's another missed fold in IR (not sure if there's a way to generalize this
any further):
https://alive2.llvm.org/ce/z/xEHdTv
Here's a potential generalization of the above:
https://alive2.llvm.org/ce/z/Cav8Pv
; replace abs-of-srem-by-positive-divisor with urem-of-abs
define i5 @src(i5 %x, i5 %y) {
%d = call i5 @llvm.abs.i5(i5 %y, i1 false) ; int min arg doesn't matter
%s = srem i5 %x, %d
%r = call i5 @llvm.abs.i5(i5 %s, i1 false) ; int min arg doesn't matter
ret i5 %r
}
define i5 @tgt(i5 %x, i5 %y) {
%d = call i5 @llvm.abs.i5(i5 %y, i1 false)
%a = call i5 @llvm.abs.i5(i5 %x, i1 false)
%r = urem i5 %a, %d
ret i5 %r
}
-------------------------------------------------------------------------------
As with many folds, there's a trade-off that we have to limit the power of the
transform based on number of uses the more that we generalize (because we need
to create extra instructions to keep the transform valid).
If "x % 2" is the special and common case, we might as well just try to get
that unless/until it is shown that we need to try harder.
(In reply to Sanjay Patel from comment #3)
> Here's another missed fold in IR (not sure if there's a way to generalize
> this any further):
> https://alive2.llvm.org/ce/z/xEHdTv
Committed as:
https://reviews.llvm.org/rG1462bdf1b985
So if I'm seeing it correctly, that leaves just the "is_odd_3" example as sub-
optimal. That needs something like what I suggested in comment 2.
(In reply to Sanjay Patel from comment #5)
> (In reply to Sanjay Patel from comment #3)
> > Here's another missed fold in IR (not sure if there's a way to generalize
> > this any further):
> > https://alive2.llvm.org/ce/z/xEHdTv
>
> Committed as:
> https://reviews.llvm.org/rG1462bdf1b985
>
>
> So if I'm seeing it correctly, that leaves just the "is_odd_3" example as
> sub-optimal. That needs something like what I suggested in comment 2.
If I'm not mistaken, the abs (srem X, 2) --> and X, 1 transformation can be
made more generally as abs (srem X, n) --> urem X, n. The urem X, 2 special
case already has a transformation --> and X, 1.
The abs (srem X, n) --> urem X, n transformation makes most sense to me though I understand your point regarding limiting the power of the transformation. I don't have any examples right off-hand that would benefit from this more powerful transform.
(In reply to Jeremy Morgan Rifkin from comment #6)
> If I'm not mistaken, the abs (srem X, 2) --> and X, 1 transformation can be
> made more generally as abs (srem X, n) --> urem X, n.
That's almost the same as what I was suggesting in comment 4, but the transform
is only valid if we can guarantee that X is positive.
This might make it clearer than my earlier example:
https://alive2.llvm.org/ce/z/ASha97
If you comment out the 'assume' line, the transform doesn't hold.