Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

[InstCombine] fold multiply and equality compare with constant #51369

Open Quuxplusone opened 2 years ago

Quuxplusone commented 2 years ago
Bugzilla Link PR52402
Status CONFIRMED
Importance P enhancement
Reported by Sanjay Patel (spatel+llvm@rotateright.com)
Reported on 2021-11-04 07:01:56 -0700
Last modified on 2021-11-05 12:45:00 -0700
Version trunk
Hardware PC All
CC anton.a.afanasyev@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also PR52289, PR40493
Forking this off from bug 52289 - I don't know what the generalization is, but
we're missing some kind of overflowing/shifting/multiplying magic:

define i1 @src(i32 %x) {
  %m = mul i32 %x, 1355350016 ; 0x50c90000
  %r = icmp eq i32 %m, 65536  ; 0x00010000
  ret i1 %r
}

define i1 @tgt(i32 %x) {
  %t = trunc i32 %x to i16
  %r = icmp eq i16 %t, 51577 ; 0xc979
  ret i1 %r
}

https://alive2.llvm.org/ce/z/7Ft6jA
Quuxplusone commented 2 years ago
Here is the sample of similar nature, showing how we miss full folding at -O3:

define i1 @src(i32 %ta) {
  %t3 = trunc i32 %ta to i8
  %t4 = and i8 %t3, 8
  %t6.not = icmp eq i8 %t4, 0
  %t7 = select i1 %t6.not, i8 7, i8 0
  %t8 = shl i8 %t4, %t7
  %t10 = sext i8 %t8 to i32
  %t12 = mul i32 %t10, 1355350016
  %t13 = icmp eq i32 %t12, 65536
  ret i1 %t13
}

define i1 @tgt(i32 %ta) {
  ret i1 false
}

https://alive2.llvm.org/ce/z/pbFLS2
https://gcc.godbolt.org/z/5ooM4f4rr
Quuxplusone commented 2 years ago
(In reply to Anton Afanasyev from comment #1)
> Here is the sample of similar nature, showing how we miss full folding at
> -O3:
>
> define i1 @src(i32 %ta) {
>   %t3 = trunc i32 %ta to i8
>   %t4 = and i8 %t3, 8
>   %t6.not = icmp eq i8 %t4, 0
>   %t7 = select i1 %t6.not, i8 7, i8 0
>   %t8 = shl i8 %t4, %t7
>   %t10 = sext i8 %t8 to i32
>   %t12 = mul i32 %t10, 1355350016
>   %t13 = icmp eq i32 %t12, 65536
>   ret i1 %t13
> }
>
> define i1 @tgt(i32 %ta) {
>   ret i1 false
> }
>
> https://alive2.llvm.org/ce/z/pbFLS2
> https://gcc.godbolt.org/z/5ooM4f4rr

The multiply fold might unlock this case, but there's another problem here that
might be easier to solve - we don't thread the binop (shl in this case) back
through the select-of-constants to see if that reduces:
https://alive2.llvm.org/ce/z/zw2px6
Quuxplusone commented 2 years ago
(In reply to Sanjay Patel from comment #2)
> The multiply fold might unlock this case, but there's another problem here
> that might be easier to solve - we don't thread the binop (shl in this case)
> back through the select-of-constants to see if that reduces:
> https://alive2.llvm.org/ce/z/zw2px6

This should be a separate bug, so filed as bug 52406.
Quuxplusone commented 2 years ago

The generalization can be formalized in terms of modular arithmetic here (https://en.wikipedia.org/wiki/Modular_arithmetic). Optimizer is capable already of solving ordinary lineary equation (3x=9 <=> x=3) to get rid of multiplication, but can't do the same for modular linear equation (7x=5 (mod 16) <=> x=3 (mod 16) (https://gcc.godbolt.org/z/3WjKcxGhf).

The equation ax=b (mod c) can be solved (with certain constrains on constants) for any modulo, but c = 2^n case is of particular interest. We can try to solve this for patterns like icmp(and(mul(...))), using Euclidean algorithm (https://en.wikipedia.org/wiki/Euclidean_algorithm), though I'm not sure if this special case is worth it.