Open Quuxplusone opened 3 years ago
Bugzilla Link | PR50125 |
Status | NEW |
Importance | P enhancement |
Reported by | Hans Wennborg (hans@chromium.org) |
Reported on | 2021-04-26 06:58:43 -0700 |
Last modified on | 2021-10-16 10:43:28 -0700 |
Version | trunk |
Hardware | PC Linux |
CC | craig.topper@gmail.com, david.bolvansky@gmail.com, jhaberman@gmail.com, llvm-bugs@lists.llvm.org, ndesaulniers@google.com, nok.raven@gmail.com, yuanfang.chen@sony.com |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also | PR50130 |
https://godbolt.org/z/znGnaaWMn
It fails to fire when you have some instructions between jcc and label.
upb_pf32_1bt: # @upb_pf32_1bt
test r9b, r9b
jne .LBB0_1
inc rsi
jmp dispatch # TAILCALL
.LBB0_1:
jmp fallback # TAILCALL
Remove ptr += 1 and you have:
upb_pf32_1bt: # @upb_pf32_1bt
test r9b, r9b
je dispatch # TAILCALL
jmp fallback # TAILCALL
https://github.com/llvm/llvm-project-staging/blob/8e505ae71e3a493593884c692544f788e9c884ce/llvm/lib/Target/X86/X86InstrInfo.cpp#L2912
So maybe this is place to handle this case?
cc Craig.
A comment on HN (https://news.ycombinator.com/item?id=26934616) suggests that it is related to the following code:
On the other hand, the example above indicates that this is not optimizing even in -Os, so maybe this is a red herring.
PredTBB == MBB is false for case in OP.
if I dump PredTBB after analyseBranch:
bb.2.if.end:
; predecessors: %bb.0
liveins: $rdi
renamable $rdi = nuw INC64r killed renamable $rdi(tied-def 0), implicit-def dead $eflags
TCRETURNdi64 @stuff, 0, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi
.globl foo # -- Begin function foo
.type foo,@function
foo: # @foo
.cfi_startproc
# %bb.0: # %entry
testb $1, (%rdi)
jne .LBB0_1
# %bb.2: # %if.end
incq %rdi
jmp stuff # TAILCALL
.LBB0_1: # %if.then
jmp fallback # TAILCALL
.Lfunc_end0:
.size foo, .Lfunc_end0-foo
.cfi_endproc
# -- End function
.ident "clang version 13.0.0 (https://github.com/llvm/llvm-project 942d2e19e11d6d65bf147ecab8513b7e8bd5d7e4)"
.section ".note.GNU-stack","",@progbits
.addrsig
If I disable this check (too conversative?) I get:
foo: # @foo
.cfi_startproc
# %bb.0: # %entry
testb $1, (%rdi)
je fallback # TAILCALL
# %bb.1: # %if.end
incq %rdi
jmp stuff # TAILCALL
Looks good.
... but it should be jne fallback. So something bad happens in analyzeBranch
_Bug 50130 has been marked as a duplicate of this bug._
I dug into this a bit. The full function prior to the branch folding pass is:
---
Function:
# Machine code for function foo: NoPHIs, TracksLiveness, NoVRegs,
TiedOpsRewritten
Function Live Ins: $rdi
bb.0.entry:
successors: %bb.2(0x7fef9fcb), %bb.1(0x00106035); %bb.2(99.95%), %bb.1(0.05%)
liveins: $rdi
TEST8mi renamable $rdi, 1, $noreg, 0, $noreg, 1, implicit-def $eflags :: (load (s8) from %ir.data, !tbaa !3)
JCC_1 %bb.2, 4, implicit killed $eflags
bb.1.if.then:
; predecessors: %bb.0
liveins: $rdi
TCRETURNdi64 @fallback, 0, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi
bb.2.if.end:
; predecessors: %bb.0
liveins: $rdi
renamable $rdi = nuw ADD64ri8 killed renamable $rdi(tied-def 0), 1, implicit-def dead $eflags
TCRETURNdi64 @stuff, 0, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi
---
At this point, the code is arranged such that "fallback" is the fallthrough,
corresponding to something like:
# bb.0:
test BYTE PTR [rdi],0x1
je bb.2
# bb.1
jmp fallback
bb.2:
add rdi,0x1
jmp stuff
The branch folding optimization at https://github.com/llvm/llvm-
project/blob/e439a463a30833f1c7d366ed722f0f12d1682638/llvm/lib/CodeGen/BranchFolding.cpp#L1520-
L1554 is looking for a MBB that fits these criteria:
1. MBB has only a single predecessor
2. MBB's first (non-debug) instruction is an unconditional tail call
3. MBB is the target of a conditional branch.
Both bb.1 and bb.2 fail this check:
- bb.1 is not the target of a conditional branch, failing (3)
- bb.2 is not a basic block consisting of a single tail call, failing (2)
Removing the "ptr += 1" makes bb.2 satisfy (2), so the optimization is
performed.
Removing the "PredTBB == MBB" check removes condition (3), which will make bb.1
satisfy the criteria as the fallthrough successor. But this makes the
optimization incorrect because the code gets rewritten to:
# bb.0:
test BYTE PTR [rdi],0x1
je fallback
# bb.2:
add rdi,0x1
jmp stuff
I'm not sure the best solution. One option would be to rewrite as:
# bb.0:
test BYTE PTR [rdi],0x1
jne fallback
# In the general case we need this, since we aren't
# guaranteed that the target is the fallthrough. But
# if we know it's the fallthrough, we can omit it.
jmp bb.2
bb.2:
add rdi,0x1
jmp stuff
Or maybe branch folding needs to run later? ie. after whatever pass is
rewriting from:
test BYTE PTR [rdi],0x1
je bb.2
jmp fallback
bb.2:
add rdi,0x1
jmp stuff
to the code we see in the output:
test BYTE PTR [rdi],0x1
jne fb
add rdi,0x1
jmp stuff
fb:
jmp fallback
When I use -print-after-all, it appears that "block-placement" is performing the rewrite I mentioned in my last comment: https://godbolt.org/z/nbhcj5hGd
But "block-placement" is running after "branch-folder", so the latter never sees the conditional branch in the form that it wants.
Should branch-folder run later? Or run twice? I'm new to LLVM so I don't have a sense for what the right fix is here.
Here is a better goldbolt link that filters out all the extraneous info: https://godbolt.org/z/ennsKWdTK