Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Regression with builtin_expect over multiple predicates #32308

Open Quuxplusone opened 7 years ago

Quuxplusone commented 7 years ago
Bugzilla Link PR33336
Status NEW
Importance P normal
Reported by Zachary Amsden (zamsden@cloudera.com)
Reported on 2017-06-06 17:52:07 -0700
Last modified on 2017-06-08 10:40:23 -0700
Version trunk
Hardware PC Linux
CC llvm-bugs@lists.llvm.org, t.p.northover@gmail.com, zamsden@cloudera.com
Fixed by commit(s)
Attachments file_33336.txt (459 bytes, text/plain)
file_33336.txt (671 bytes, text/plain)
Blocks
Blocked by
See also
Created attachment 18584
Reproducing case

Testing using an UNLIKELY macro built around builtin_expect found a regression
from 3.7 -> 3.8 that is still present in the 4.0 and likely trunk (untested)
branches of llvm.  The unlikely condition is not propagated to both sub-
expressions when combined with an OR clause.  Using multiple invocations of
such a macro gives the expected result.

Apparently the branch probabilities are no longer being propagated correctly to
child expressions.

I bisected llvm and clang in parallel on release_37 and release_38 branches to
find the responsible change and found this change to be responsible.  Seems
entirely plausible as it introduces a new branch:

    [JumpThreading] Split select that has constant conditions coming from the PHI node

    Look for PHI/Select in the same BB of the form

    bb:
      %p = phi [false, %bb1], [true, %bb2], [false, %bb3], [true, %bb4], ...
      %s = select p, trueval, falseval

    And expand the select into a branch structure. This later enables
    jump-threading over bb in this pass.

    Using the similar approach of SimplifyCFG::FoldCondBranchOnPHI(), unfold
    select if the associated PHI has at least one constant.  If the unfolded
    select is not jump-threaded, it will be folded again in the later
    optimizations.

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257198 91177308-0d34-0410-b5e6-96231b3b80d

Attached code which generates a reproduction - you can see the sub-optimal
branch generated in the asm output.  Let me know if you want IR output as well,
but this reproduces quite easily.  I copied my reproducing case verbatim, there
is almost certainly a more minimal case that doesn't involve floating point,
but this works reliably - Godbolt link: https://godbolt.org/g/vBwwQU
Quuxplusone commented 7 years ago

Attached file_33336.txt (459 bytes, text/plain): Reproducing case

Quuxplusone commented 7 years ago
It looks like the problem happens after CFG simplification.  We change the
control flow for the last block to:

lor.end:                                          ; preds = %lor.rhs, %entry
  %0 = phi i1 [ true, %entry ], [ %cmp, %lor.rhs ]
  %. = select i1 %0, i1 true, i1 false
  ret i1 %.

This in turn gets jump-threaded, which does not seem advantageous, since the
target in this case is simply a return.
Quuxplusone commented 7 years ago

Attached file_33336.txt (671 bytes, text/plain): Minimal fix