Open Quuxplusone opened 4 years ago
Could we handle this after isel, in TailDup?
(In reply to Eli Friedman from comment #1)
> Could we handle this after isel, in TailDup?
It's possible, but as I read the code, it could still hit the same issues as
handling it in ISel.
Currently I think tail duplication is structured to clone instructions from a
common tail block into the preceding blocks. Currently it is restricted from
doing this if the predecessor has any EHPad successors. If we remove that
restriction, we may run into the problems I hit with the ISel solution, where
downstream passes get upset when they see a BB with exactly one exceptional
successor. If we fix those issues, then the ISel solution seems better to me
(fewer MBBs earlier).
We could teach tail duplication to create new MBBs, but that would be a larger
change. It might be generally useful, though.
Oh, I see, we expect two successors, so we can't do conventional taildup.
Taildup that keeps the cloned code in a separate BB might be useful in other situations.
Consider this example:
$ cat t.cpp struct HasCleanup { ~HasCleanup(); }; extern "C" void mythrow1(); extern "C" void mythrow2(); extern "C" void mythrow3(); extern "C" int multi_throw(bool c1, bool c2, bool c3) { HasCleanup obj; if (c1) { mythrow1(); goto unreachable; } if (c2) { mythrow2(); goto unreachable; } if (c3) { mythrow3(); goto unreachable; } return 0; unreachable: __builtin_unreachable(); }
Clang emits extra branches after each call to mythrowN: $ clang -S -O1 t.cpp -o - | grep -A3 mythrow callq mythrow1 .Ltmp5: jmp .LBB0_7 .LBB0_4: # %if.then4
.Ltmp3: jmp .LBB0_7 .LBB0_6: # %if.then8
.Ltmp1: .LBB0_7: # %unreachable .LBB0_2: # %lpad
I encountered this issue because Clang uses the same IR pattern to emit calls to C++ throw expressions. See the code that uses
getUnreachableBlock
here: https://github.com/llvm/llvm-project/blob/master/clang/lib/CodeGen/CGCall.cpp#L3794C++ source & asm:
$ cat b.cpp struct HasCleanup { ~HasCleanup(); }; extern "C" int multi_throw(bool c1, bool c2, bool c3) { HasCleanup obj; if (c1) throw 1; if (c2) throw 2; if (c3) throw 3; return 0; }
$ clang -S -O1 b.cpp -o - | grep -A3 call.*throw callq __cxa_throw .Ltmp5: jmp .LBB0_7 .LBB0_4: # %if.then4
.Ltmp3: jmp .LBB0_7 .LBB0_6: # %if.then8
.Ltmp1: .LBB0_7: # %unreachable .LBB0_2: # %lpad
We could teach clang not to use this IR pattern, but the first example shows that this is an LLVM issue too.
There are a few possible places to fix this:
Clang: Most users probably don't write code like example 1, more like example 2, so we could change the generated IR and declare victory.
SimplifyCFG: This pass already does a lot of unreachable handling. However, this would increase the BB count and instruction count, so it could be considered non-canonical. Then again, we don't canonicalize in the opposite direction by tail merging in the IR. I did have a patch for this (https://reviews.llvm.org/D29428), but the results were not promising.
CodeGenPrepare: We do some CFG adjustment here already, but we'd need a new loop over all BBs just for this. It doesn't fit into the existing transforms. This would be limited to, duplicate unreachable BBs after invokes. We'd have to think about dbg.values as well.
ISel: We already have handling for switch () { default: unreachable; }, so we could handle invoke unreachable. I tried this locally, but it seems to cause bugs in machine CFG passes like branch folding. We end up with an MBB with one successor, but no terminator instruction, and passes try to merge the landing pad into the throwing block. Also, it would need to be done in SDISel and GlobalISel. Maybe these issues are related to https://llvm.org/pr27659.
See also https://llvm.org/PR45064#c4, where this code pattern caused issues with the win64 unwinder.