Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Integer promotion quirk prevents efficient power of 2 division #42208

Open Quuxplusone opened 5 years ago

Quuxplusone commented 5 years ago
Bugzilla Link PR43238
Status NEW
Importance P enhancement
Reported by Kim Nilsson (kim.nilsson@effnet.com)
Reported on 2019-09-06 05:17:37 -0700
Last modified on 2019-09-07 04:45:01 -0700
Version 8.0
Hardware PC Linux
CC david.bolvansky@gmail.com, efriedma@quicinc.com, llvm-bugs@lists.llvm.org, neeilans@live.com, nunoplopes@sapo.pt, richard-llvm@metafoo.co.uk
Fixed by commit(s)
Attachments integers.ll (1433 bytes, text/plain)
Blocks
Blocked by
See also
Created attachment 22476
Example LLVM IR

For the function,

uint_fast32_t foo_a(uint_fast8_t x) {
    const uint_fast32_t q = 1 << x;
    return 256 / q;
}

GCC-{8.*,9.*} generates inefficient division instructions for both x86_64 and
ARM. Since 'q' is a power of 2, the result should be a single right shift.
Indeed, if the function is changed to,

uint_fast32_t foo_b(uint_fast8_t x) {
    return 256 / (1 << x); // This is, of course, equivalent to (256 >> x)
},

the expected instructions are generated. Counter-intuitively, if the original
function is instead changed slightly to,

uint_fast32_t foo_c(uint_fast8_t x) {
    const uint_fast32_t q = (uint_fast32_t)1 << x;
    return 256 / q;
},

the expected instructions are once again generated.

See https://godbolt.org/z/X-It3b for examples.

(FWIW, GCC also has this behavior and I've reported the same thing there
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91680)
Quuxplusone commented 5 years ago

Attached integers.ll (1433 bytes, text/plain): Example LLVM IR

Quuxplusone commented 5 years ago
"GCC-{8.*,9.*}" should read "Clang-8" of course. Damn Copy/Paste ghosts.
Quuxplusone commented 5 years ago
https://rise4fun.com/Alive/PbSX describes the proposed transform:

Name: PR43238_sext
Pre: C < 2147483648 && C >= 0
%x = shl i32 1, %s
%xx = sext i32 %x to i64
%a = udiv i64 C, %xx
  =>
%ss = sext i32 %s to i64
%a = lshr i64 C, %ss
Quuxplusone commented 5 years ago
Precondition is C < TYPE_MAX-1, this works:
Name: PR43238_sext
Pre: C < 18446744073709551615 && C >= 0
%x = shl i32 1, %s
%xx = sext i32 %x to i64
%a = udiv i64 C, %xx
  =>
%ss = sext i32 %s to i64
%a = lshr i64 C, %ss

Optimization: PR43238_sext
Done: 1
Optimization is correct!

Also zext:
Name: PR43238_zext
Pre: C < 4294967295 && C >= 0
%x = shl i8 1, %s
%xx = zext i8 %x to i32
%a = udiv i32 C, %xx
  =>
%ss = zext i8 %s to i32
%a = lshr i32 C, %ss

Optimization: PR43238_zext
Done: 1
Optimization is correct!
Quuxplusone commented 5 years ago
(In reply to David Bolvansky from comment #3)
> Precondition is C < TYPE_MAX-1

That's wrong; try, for example, 4294967296.

Alive is somehow getting confused by your "Pre" line.  Try, for example:

Name: Impossible
Pre: C < 18446744073709551615 && C >= 0
%x = C
  =>
%x = C+1
Quuxplusone commented 5 years ago

Ah, sorry!

Quuxplusone commented 5 years ago

Nuno Lopes, can you take a look at this Alive case?

Quuxplusone commented 5 years ago
Well, 18446744073709551615 is negative in i64, so 'C < 0 && C >= 0' means the
precondition is always false.
I've now added a check to Alive to print a warning if the precondition is
always false.