Inliner incorrectly combines cleanup and catch landing pads

Quuxplusone commented 12 years ago


Bugzilla Link	PR14116
Status	NEW
Importance	P enhancement
Reported by	David Chisnall (csdavec@swan.ac.uk)
Reported on	2012-10-18 07:05:58 -0700
Last modified on	2012-11-07 17:46:08 -0800
Version	trunk
Hardware	PC All
CC	baldrick@free.fr, llvm-bugs@lists.llvm.org, pawel@32bitmicro.com, rafael@espindo.la, rjmccall@apple.com, wendling@apple.com
Fixed by commit(s)
Attachments	`e.ll` (4559 bytes, application/octet-stream) `lp.ll` (3644 bytes, text/plain)
Blocks	PR13893
Blocked by
See also

The LLVM inliner combines catch and cleanups into a single landing pad and then
merges their tails.  This is problematic, because some personality functions
(e.g. GNU Objective-C) return different values in exception register 0
depending on whether the landing pad is a catch or a cleanup.  Consider the
following example:

define i32 @x() uwtable {
entry:
  %retval = alloca i32, align 4
  %x = alloca i32, align 4
  %exn.slot = alloca i8*
  %ehselector.slot = alloca i32
  invoke void @objc_exception_throw(i8* null)
          to label %invoke.cont unwind label %lpad

invoke.cont:                                      ; preds = %entry
  unreachable

lpad:                                             ; preds = %entry
  %0 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__gnu_objc_personality_v0 to i8*)
          cleanup
  %1 = extractvalue { i8*, i32 } %0, 0
  store i8* %1, i8** %exn.slot
  %2 = extractvalue { i8*, i32 } %0, 1
  store i32 %2, i32* %ehselector.slot
  %3 = bitcast i32* %x to i8*
  invoke void @foo(i8* %3)
          to label %invoke.cont1 unwind label %terminate.lpad

invoke.cont1:                                     ; preds = %lpad
  br label %eh.resume

return:                                           ; No predecessors!
  %4 = load i32* %retval
  ret i32 %4

eh.resume:                                        ; preds = %invoke.cont1
  %exn = load i8** %exn.slot
  %sel = load i32* %ehselector.slot
  %lpad.val = insertvalue { i8*, i32 } undef, i8* %exn, 0
  %lpad.val2 = insertvalue { i8*, i32 } %lpad.val, i32 %sel, 1
  resume { i8*, i32 } %lpad.val2

terminate.lpad:                                   ; preds = %lpad
  %5 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__gnu_objc_personality_v0 to i8*)
          catch i8* null
  call void @abort() noreturn nounwind
  unreachable
}

declare void @foo(i8*)

declare void @objc_exception_throw(i8*)

declare i32 @__gnu_objc_personality_v0(...)

declare void @abort()

define i32 @y() uwtable {
entry:
  %finally.exn = alloca i8*
  %finally.for-eh = alloca i1
  %exn.slot = alloca i8*
  %ehselector.slot = alloca i32
  %cleanup.dest.slot = alloca i32
  store i1 false, i1* %finally.for-eh
  %call = invoke i32 @x()
          to label %invoke.cont unwind label %lpad

invoke.cont:                                      ; preds = %entry
  store i32 0, i32* %cleanup.dest.slot
  br label %cleanup

cleanup:                                          ; preds = %invoke.cont,
%finally.catchall
  %cleanup.dest.saved = load i32* %cleanup.dest.slot
  %0 = load i32* @z, align 4
  %inc = add nsw i32 %0, 1
  store i32 %inc, i32* @z, align 4
  %finally.shouldthrow = load i1* %finally.for-eh
  br i1 %finally.shouldthrow, label %finally.rethrow, label %finally.cont

finally.rethrow:                                  ; preds = %cleanup
  %1 = load i8** %finally.exn
  call void @objc_exception_throw(i8* %1)
  unreachable

finally.cont:                                     ; preds = %cleanup
  store i32 %cleanup.dest.saved, i32* %cleanup.dest.slot
  %cleanup.dest = load i32* %cleanup.dest.slot
  switch i32 %cleanup.dest, label %unreachable [
    i32 0, label %cleanup.cont
    i32 2, label %unreachable
  ]

cleanup.cont:                                     ; preds = %finally.cont
  ret i32 1

lpad:                                             ; preds = %entry
  %2 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__gnu_objc_personality_v0 to i8*)
          catch i8* null
  %3 = extractvalue { i8*, i32 } %2, 0
  store i8* %3, i8** %exn.slot
  %4 = extractvalue { i8*, i32 } %2, 1
  store i32 %4, i32* %ehselector.slot
  br label %finally.catchall

finally.catchall:                                 ; preds = %lpad
  %exn = load i8** %exn.slot
  store i8* %exn, i8** %finally.exn
  store i1 true, i1* %finally.for-eh
  store i32 2, i32* %cleanup.dest.slot
  br label %cleanup

unreachable:                                      ; preds = %finally.cont,
%finally.cont
  unreachable
}

The inner function, x(), has a cleanup.  The outer has a catch.  The cleanup
rethrows the exception with a resume instruction.  The outer function, y(), has
a catchall.  This rethrows the exception with objc_exception_throw().

When the inliner has run, the following version of y() is produced:

define i32 @y() uwtable optsize {
entry:
  %x.i = alloca i32, align 4
  %0 = bitcast i32* %x.i to i8*
  call void @llvm.lifetime.start(i64 -1, i8* %0)
  invoke void @objc_exception_throw(i8* null)
          to label %invoke.cont.i unwind label %lpad.i

invoke.cont.i:                                    ; preds = %entry
  unreachable

lpad.i:                                           ; preds = %entry
  %1 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__gnu_objc_personality_v0 to i8*)
          cleanup
          catch i8* null
  invoke void @foo(i8* %0) optsize
          to label %finally.rethrow unwind label %terminate.lpad.i

terminate.lpad.i:                                 ; preds = %lpad.i
  %2 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__gnu_objc_personality_v0 to i8*)
          catch i8* null
  call void @abort() noreturn nounwind
  unreachable

finally.rethrow:                                  ; preds = %lpad.i
  %3 = extractvalue { i8*, i32 } %1, 0
  %4 = load i32* @z, align 4, !tbaa !0
  %inc = add nsw i32 %4, 1
  store i32 %inc, i32* @z, align 4, !tbaa !0
  call void @objc_exception_throw(i8* %3)
  unreachable
}

x() has been inlined into the first basic block, so the invoke of the function
throwing the exception (objc_exception_throw()) becomes the terminator for that
block.  So far, so good.  The landing pad for this now has a cleanup and then a
catchall, which is a bit redundant because either of these will always be
matched.  The call to foo() in the landing pad is the cleanup code from x(),
which is correct.  This then branches to finally.rethrow, however, which then
calls objc_exception_throw().  This call is only valid if we landed with the
catchall, but because a cleanup will always be matched first we actually landed
with the cleanup.

The resume instruction has been completely removed.  The correct behaviour in
this case would be simply to remove the cleanup - there is a catchall and so
the cleanup is completely redundant.  In the more general case, if we have
multiple rethrow blocks with different mechanisms then we should be wrapping
them in a branch.  The finally.rethrow block in this case should look like
something like this:

finally.rethrow:                                  ; preds = %lpad.i
  %3 = extractvalue { i8*, i32 } %1, 0
  %4 = load i32* @z, align 4, !tbaa !0
  %inc = add nsw i32 %4, 1
  store i32 %inc, i32* @z, align 4, !tbaa !0
  %5 = extractvalue { i8*, i32 } %1, 1
  switch i32 %5, label %unreachable [
    i32 0, label %cleanup.cont
    i32 1, label %rethrow
  ]

rethrow:
  call void @objc_exception_throw(i8* %3)
  unreachable

cleanup.cont:
  resume { i8*, i32 } %1

Quuxplusone commented 12 years ago

About the "there's a catch-all so the cleanup is redundant", check out
RecognizePersonality and isCatchAll in InstructionCombining.cpp.  They
seem to expect __objc_personality_v0 not __gnu_objc_personality_v0.

As for the rest of the bug report, at a glance the IR is wrong since it doesn't
conform to the http://llvm.org/docs/ExceptionHandling.html#restrictions, namely
this bit:

"In order for inlining to behave correctly, landing pads must be prepared to
handle selector results that they did not originally advertise. Suppose that a
function catches exceptions of type A, and it’s inlined into a function that
catches exceptions of type B. The inliner will update the landingpad
instruction for the inlined landing pad to include the fact that B is also
caught. If that landing pad assumes that it will only be entered to catch an A,
it’s in for a rude awakening. Consequently, landing pads must test for the
selector results they understand and then resume exception propagation with the
resume instruction if none of the conditions match."

Quuxplusone commented 12 years ago

This should not be required for something that is a catchall - by definition,
it catches every kind of exception and should not resume.  I have a slightly
more worrying example of error here:

define i32 @finally() noreturn uwtable {
entry:
  %0 = load i8** @except, align 8, !tbaa !1
  invoke void @objc_exception_throw(i8* %0) noreturn
          to label %invoke.cont unwind label %lpad

invoke.cont:                                      ; preds = %entry
  unreachable

lpad:                                             ; preds = %entry
  %1 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__gnu_objc_personality_v0 to i8*)
          cleanup
  tail call void @runCleanup(i8* undef)
  resume { i8*, i32 } %1
}

declare void @objc_exception_throw(i8*)

declare i32 @__gnu_objc_personality_v0(...)

define i32 @main() nounwind uwtable {
entry:
  %call = invoke i32 @finally()
          to label %return unwind label %lpad

lpad:                                             ; preds = %entry
  %0 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__gnu_objc_personality_v0 to i8*)
          catch i8* null
  br label %return

return:                                           ; preds = %entry, %lpad
  ret i32 0
}

The inner function has a cleanup, the outer function a catchall.  The correct
behaviour for this program is to exit with a status 0.  After optimisation, we
get:

define i32 @main() nounwind uwtable {
entry:
  %0 = load i8** @except, align 8, !tbaa !1
  invoke void @objc_exception_throw(i8* %0) noreturn
          to label %invoke.cont.i unwind label %lpad.i

invoke.cont.i:                                    ; preds = %entry
  unreachable

lpad.i:                                           ; preds = %entry
  %1 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__gnu_objc_personality_v0 to i8*)
          cleanup
  store i32 1, i32* @cleanupRun, align 4, !tbaa !0
  resume { i8*, i32 } %1
}

The cleanup landing pad has been moved to the outer function, and the catchall
is gone completely.  This program now aborts at run time because the unwinder
finds only cleanups before falling off the end of the stack.

Quuxplusone commented 12 years ago

"This should not be required for something that is a catchall - by definition,
it catches every kind of exception and should not resume".  You misunderstood
what the docs are saying.  The docs say that the catch-all handler, like every
other handler, should check that the selector value was "catch-all" before
running the catch-all code.  It's not about stuff that happens after the catch
all, it's about stuff that may happen before it after inlining, eg the cleanup
running.  If you modify the IR in this way, does it then work?

Quuxplusone commented 12 years ago

I am not 100% sure what the correct behaviour should be.  It's a catchall, so
it should never be resuming even after inlining.  Even if it is merged with
other handlers, then the catchall code should run unconditionally.  I tried
modifying it like this:

define i32 @main() nounwind uwtable {
entry:
  %call = invoke i32 @finally()
          to label %return unwind label %lpad

lpad:                                             ; preds = %entry
  %0 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__gnu_objc_personality_v0 to i8*)
          catch i8* null
  %1 = extractvalue { i8*, i32 } %0, 1
  %2 = icmp eq i32 0, %1
  br i1 %2, label %return, label %rethrow

return:                                           ; preds = %entry, %lpad
  ret i32 0

rethrow:
  resume { i8*, i32 } %0
}

This code is now wrong, because the catchall is a barrier beyond which no
exceptions should propagate, however even with this modification, the same
incorrect results occur during optimisation.  It is transformed into the
following, which unconditionally propagates exceptions:

define i32 @main() nounwind uwtable {
entry:
  %0 = load i8** @except, align 8, !tbaa !1
  invoke void @objc_exception_throw(i8* %0) noreturn
          to label %invoke.cont.i unwind label %lpad.i

invoke.cont.i:                                    ; preds = %entry
  unreachable

lpad.i:                                           ; preds = %entry
  %1 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__gnu_objc_personality_v0 to i8*)
          cleanup
  store i32 1, i32* @cleanupRun, align 4, !tbaa !0
  resume { i8*, i32 } %1
}

Quuxplusone commented 12 years ago

I meant something like this:

Original:

define i32 @main() nounwind uwtable {
entry:
  %call = invoke i32 @finally()
          to label %return unwind label %lpad

lpad:                                             ; preds = %entry
  %0 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)*
@__gnu_objc_personality_v0 to i8*)
          catch i8* null
  br label %return

return:                                           ; preds = %entry, %lpad
  ret i32 0
}

Modified:

define i32 @main() nounwind uwtable {
entry:
  %call = invoke i32 @finally()
          to label %return unwind label %lpad

lpad:                                             ; preds = %entry
  %0 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)*
@__gnu_objc_personality_v0 to i8*)
          catch i8* null
  %selector = extractvalue { i8*, i32 } %0, 1
  %typeid = call i32 @llvm.eh.typeid.for(i8* 0)
  %is_catch_all = icmp eq i32 %selector, %typeid
  br %is_catch_all, label %return, label %resume

resume:
  resume { i8*, i32 } %0

return:                                           ; preds = %entry, %lpad
  ret i32 0
}

Quuxplusone commented 12 years ago

Modified like this:

define i32 @main() nounwind uwtable {
entry:
  %call = invoke i32 @finally()
          to label %return unwind label %lpad

lpad:                                             ; preds = %entry
  %0 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__gnu_objc_personality_v0 to i8*)
          catch i8* null
  %selector = extractvalue { i8*, i32 } %0, 1
  %typeid = call i32 @llvm.eh.typeid.for(i8* null)
  %is_catch_all = icmp eq i32 %selector, %typeid
  br i1 %is_catch_all, label %return, label %rethrow

return:                                           ; preds = %entry, %lpad
  ret i32 0

rethrow:
  resume { i8*, i32 } %0
}

THe result is still that the optimiser removes the ret, and makes the catchall-
>stop into a cleanup->resume:

define i32 @main() nounwind uwtable {
entry:
  %0 = load i8** @except, align 8, !tbaa !1
  invoke void @objc_exception_throw(i8* %0) noreturn
          to label %invoke.cont.i unwind label %lpad.i

invoke.cont.i:                                    ; preds = %entry
  unreachable

lpad.i:                                           ; preds = %entry
  %1 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__gnu_objc_personality_v0 to i8*)
          cleanup
  store i32 1, i32* @cleanupRun, align 4, !tbaa !0
  resume { i8*, i32 } %1
}

Quuxplusone commented 12 years ago

Can you please attach the complete (modified) IR.

Quuxplusone commented 12 years ago

Attached e.ll (4559 bytes, application/octet-stream): IR that is incorrectly inlined

Quuxplusone commented 12 years ago

Something seems to be horribly broken in the inliner...

Quuxplusone commented 12 years ago

Attached lp.ll (3644 bytes, text/plain): Comprehensive testcase for what the inliner is supposed to do

Quuxplusone commented 12 years ago

The later inlinings look like bugs, but the first inlining actually looks
totally reasonable to me.

David, I'm worried about the line "This call is only valid if we landed with
the catchall, but because a cleanup will always be matched first we actually
landed with the cleanup."  This sounds like a bug in the personality function.
The personality function is required to not merely decide whether to land, but
to correctly decide what the landing pad's action should be;  landing with a
selector value of 0 is a statement that it's landing there *only* to run
cleanups, which is obviously not the case.

Now, arguably the backend should make a point of not saying that there are
cleanups when there's a catchall handler in effect, but that doesn't really fix
the personality bug, because the personality would still be returning the wrong
answer for a non-total catch.

Quuxplusone commented 12 years ago

Hi John,

(In reply to comment #11)
> David, I'm worried about the line "This call is only valid if we landed with
> the catchall, but because a cleanup will always be matched first we actually
> landed with the cleanup."  This sounds like a bug in the personality function.
> The personality function is required to not merely decide whether to land, but
> to correctly decide what the landing pad's action should be;  landing with a
> selector value of 0 is a statement that it's landing there *only* to run
> cleanups, which is obviously not the case.

It's a question of ordering.  Personality functions will jump to the first
matching selector.  It is up to the compiler to ensure that the selectors are
ordered from most specific to most general (unless they were ordered in a
different way in the source language).  This means that a catchall and a
cleanup should always be ordered with the catchall first in the case where both
are present.  Doing otherwise would result in a significant performance hit (I
speak here as someone who has implemented the personality functions for three
languages, including C++) and be unacceptable.  Indeed - the ABI spec clearly
says that the first match should be taken, because a more general match may
follow, e.g. an NSString* followed by an NSObject* followed by an id catch in
Objective-C.

This inlining places the cleanup before the catchall, and so results in the
block being entered via the cleanup.  The personality function then enters the
first matching rule: the cleanup.

Now, I could implement work-arounds for the LLVM behaviour in the personality
functions that I maintain, but that wouldn't make us work with personality
functions that don't implement this hack, and would make all LLVM-compatible
personality functions slower than their LLVM-incompatible counterparts.

Quuxplusone commented 12 years ago

(In reply to comment #12)
> It's a question of ordering.  Personality functions will jump to the first
> matching selector.  It is up to the compiler to ensure that the selectors are
> ordered from most specific to most general (unless they were ordered in a
> different way in the source language).  This means that a catchall and a
> cleanup should always be ordered with the catchall first in the case where
both
> are present.  Doing otherwise would result in a significant performance hit (I
> speak here as someone who has implemented the personality functions for three
> languages, including C++) and be unacceptable.

You are basically claiming that GCC's LSDA layout requires cleanup actions to
always be the outermost action for a landing pad.  I am quite skeptical; among
other things, this would make it impossible to share action-table entries
between the landing pads in code like this:
  try {
    foo();
    std::string s; // needs a cleanup
    bar();
  } catch (E1 *e) {}
At the very least, I'd like to see some evidence from the GCC documentation
that this is intended.

I am also skeptical that it is a significant performance hit when interpreting
the actions table to simply remember that you've seen a cleanup action and
keep scanning.  Among other things, interpreting another entry in the
actions table is quite fast compared to everything else the personality is
doing — e.g. finding the appropriate lpad in the first place.

> Indeed - the ABI spec clearly says that the first match should be taken,
> because a more general match may follow, e.g. an NSString* followed
> by an NSObject* followed by an id catch in Objective-C.

If you're talking about the Itanium ABI's suggested implementation,
(1) it's clearly talking about handlers and (2) it doesn't talk about GCC's
LSDA layout.

> This inlining places the cleanup before the catchall, and so results in the
> block being entered via the cleanup.  The personality function then enters the
> first matching rule: the cleanup.

The IR has no concept of 'ordering' a cleanup w.r.t. handler clauses;
it's just a flag on the landingpad instruction.  That's completely fine.

> Now, I could implement work-arounds for the LLVM behaviour in the personality
> functions that I maintain, but that wouldn't make us work with personality
> functions that don't implement this hack, and would make all LLVM-compatible
> personality functions slower than their LLVM-incompatible counterparts.

For what it's worth, what I said above about just flagging whether
you saw a cleanup is exactly what GCC's C++ personality function does.
I really don't think you're relying on specified behavior here.

Quuxplusone / LLVMBugzillaTest

Inliner incorrectly combines cleanup and catch landing pads #14147