Clang generates 'call builtin' call to 'nobuiltin' function for global replacement of operator delete; gets optimized to trap().

Quuxplusone commented 3 years ago


Bugzilla Link	PR49022
Status	NEW
Importance	P normal
Reported by	Douglas Yung (douglas_yung@playstation.sony.com)
Reported on	2021-02-03 10:59:38 -0800
Last modified on	2021-11-15 12:48:43 -0800
Version	trunk
Hardware	PC All
CC	bhramar.vatsa@synopsys.com, dblaikie@gmail.com, efriedma@quicinc.com, florian_hahn@apple.com, hstong@ca.ibm.com, htmldeveloper@gmail.com, jdoerfert@anl.gov, jeroen.dobbelaere@synopsys.com, johannes@jdoerfert.de, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, mikael.holmen@ericsson.com, nok.raven@gmail.com, paul_robinson@playstation.sony.com, richard-llvm@metafoo.co.uk, warren_ristow@playstation.sony.com, yuanfang.chen@sony.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also	PR41039

We have an internal test that prior to commit
292077072ec1279d89d21873fe900061e55ef936 produced code for the following
function:

void operator delete(void *p) noexcept {}

int test() {
  int *p = new int;
  delete p;
  return 1;
}

When compiled with optimizations enabled (-O2), it would generate the following
llvm IR:

; Function Attrs: norecurse nounwind readnone uwtable willreturn mustprogress
define dso_local i32 @_Z4testv() local_unnamed_addr #1 {
entry:
  ret i32 1
}

But now after 292077072e the compiler seems to think the function is now
undefined behavior and emits the following for it:

; Function Attrs: noreturn nounwind uwtable mustprogress
define dso_local i32 @_Z4testv() local_unnamed_addr #1 {
entry:
  tail call void @llvm.trap()
  unreachable
}

I don't think this is correct, but I'm not sure whether it really is undefined
or not.

Quuxplusone commented 3 years ago

Ok, I think I know what the problem is here. Clang (roughly) the following IR

define dso_local void @_ZdlPv(i8* %p) nobuiltin {
  ret void
}

define void @foo(i8 *%ptr) {
  call void @_ZdlPv(i8* %ptr) builtin
  ret void
}

Clang (correctly IIUC) adds nobuiltin to the definition of _ZdlPv, but the call to _ZdlPv is marked as builtin.

So DeadArgElim use the definition to determine that the argument to _ZdlPv is dead and updates the uses of the function to pass undef instead.

Then comes InstCombine, which sees the builtin call to _ZdlPv and assumes this is UB and creates a store to undef.

The current behavior of CallBase::isNoBuiltin() is that the builtin attribute on the call takes precedence over nobuiltin on the function; so at the call it appears as we are calling a builtin function, not a nobuiltin one.

I am not sure if the definition of the operator delete is valid, because it does not de-allocate memory allocated by new. IIUC there's no way for clang to tell if a global replacement is called, because they could be in different translation unit.s

We could teach DeadArgElim that should not update 'builtin' calls of 'nobuiltin' functions, but I am not sure that's the right thing to do. (and there might be plenty of other IPO that have the same problem)

Quuxplusone commented 3 years ago

I am not sure if the definition of the operator delete is valid, because it does not de-allocate memory allocated by new.

The standard specifies that the memory is deallocated; it does not specify how this occurs, and I think it is out-of-scope for the compiler to make arbitrary decisions about what "counts" as a deallocation.

Why couldn't I have a single-use-and-throw-it-away policy for my memory allocator? In which case, an empty operator delete function is exactly the correct implementation (and amazingly efficient, besides).

Quuxplusone commented 3 years ago

(In reply to Paul Robinson from comment #2)

I am not sure if the definition of the operator delete is valid, because it does not de-allocate memory allocated by new.

The standard specifies that the memory is deallocated; it does not specify how this occurs, and I think it is out-of-scope for the compiler to make arbitrary decisions about what "counts" as a deallocation.

Why couldn't I have a single-use-and-throw-it-away policy for my memory allocator? In which case, an empty operator delete function is exactly the correct implementation (and amazingly efficient, besides).

As I am said, I am not sure if it standard compliant or not.

Assuming it is, the next question is whether Clang is correct in marking the call as builtin, when the function is marked as nobuiltin?

Transformations looking at the call to _ZdlPv assume the builtin delete operator is called, not a user provided one, hence they assume it behaves like the builtin operator.

(I don't think this is a bug in 292077072ec1279d89d21873fe900061e55ef936 itself, it just exposes an existing issue)

Quuxplusone commented 3 years ago

I would also think the builtin on the call is a problem given that clang emits a call to a nobuiltin operator.

Quuxplusone commented 3 years ago

(In reply to Johannes Doerfert from comment #4)

I would also think the builtin on the call is a problem given that clang emits a call to a nobuiltin operator.

What makes things more difficult is that I do not think there's a way for Clang to tell whether a user-provided operator new/delete replaces the builtin ones, as the replacement could be defined anywhere in the program.

From https://en.cppreference.com/w/cpp/memory/new/operator_delete :

` Global replacements

The replaceable deallocation functions (1-12) are implicitly declared in each translation unit even if the header is not included. These functions are replaceable: a user-provided non-member function with the same signature defined anywhere in the program, in any source file, replaces the corresponding implicit version for the entire program. Its declaration does not need to be visible. `

Quuxplusone commented 3 years ago

I think the given sample code is valid, and the combination of optimizations performed here is incorrect.

There are two views the optimizer could take: (1) it can treat these 'builtin' calls as having builtin semantics and can optimize them, or (2) it can see through these calls to the underlying functions and propagate out information. Either in isolation would be correct, but we can't apply at any call site, because they might contradict each other.

I think view (1) is more valuable -- it's very rare (ignoring LTO situations for a moment) that we'll see the definition of operator new / operator delete, and LTO of the definition of operator new / operator delete is, IIRC, unsafe even considering only nobuiltin calls, because we still treat such calls as returning a noalias pointer (which they actually don't, if you can look through the implementation).

Moreover, I think this is a derefinement bug: when the optimizer assumes builtin semantics, it applies the broad unrefined semantics associated with that builtin, whereas the actual function definition for that builtin will provide a refinement of those semantics. So any time we perform IPO on a builtin function definition and then optimize on the basis of the builtin semantics, we have a derefinement issue.

So perhaps GlobalValue::mayBeDerefined() should return true for all builtin functions.

Quuxplusone commented 3 years ago

LangRef:

builtin This indicates that the callee function at a call site should be recognized as a built-in function, even though the function’s declaration uses the nobuiltin attribute. This is only valid at call sites for direct calls to functions that are declared with the nobuiltin attribute.

nobuiltin This indicates that the callee function at a call site is not recognized as a built-in function. LLVM will retain the original call and not replace it with equivalent code based on the semantics of the built-in function, unless the call site uses the builtin attribute. This is valid at call sites and on function declarations and definitions.

This reads to me as if the optimizer is supposed to assume builtin semantic at the call site here. Certainly, if we do IPO and thereby utilize the nobuilin implementation that can't hold anymore. So either forbid IPO across those call edges, e.g., Richard's mayBeDerefined() proposal for nobuiltin functions, or drop the builtin attribute (which is harder to do but might be an option down the line if IPO passes are explicitly aware of this situation).

Quuxplusone commented 3 years ago

Just wondering: is it valid/expected that 'clang' produces the call to @_Zd1Pv with the 'builtin' attribute here ?

Quuxplusone commented 3 years ago

(In reply to Jeroen Dobbelaere from comment #8)
> Just wondering: is it valid/expected that 'clang' produces the call to
> @_Zd1Pv with the 'builtin' attribute here ?

Forget my question, I clearly did not grasp the complete thread yesterday.

Is there already a consensus for a solution ?

Quuxplusone commented 3 years ago

I created a patch that blocks this at 'deadargelim': https://reviews.llvm.org/D97672

Quuxplusone commented 3 years ago

(In reply to Richard Smith from comment #6)

I think the given sample code is valid, and the combination of optimizations performed here is incorrect.

There are two views the optimizer could take: (1) it can treat these 'builtin' calls as having builtin semantics and can optimize them, or (2) it can see through these calls to the underlying functions and propagate out information. Either in isolation would be correct, but we can't apply at any call site, because they might contradict each other.

I think view (1) is more valuable -- it's very rare (ignoring LTO situations for a moment) that we'll see the definition of operator new / operator delete, and LTO of the definition of operator new / operator delete is, IIRC, unsafe even considering only nobuiltin calls, because we still treat such calls as returning a noalias pointer (which they actually don't, if you can look through the implementation).

Moreover, I think this is a derefinement bug: when the optimizer assumes builtin semantics, it applies the broad unrefined semantics associated with that builtin, whereas the actual function definition for that builtin will provide a refinement of those semantics. So any time we perform IPO on a builtin function definition and then optimize on the basis of the builtin semantics, we have a derefinement issue.

I think that sounds reasonable. IIUC we need to ensure a consistent view on the IR.

So perhaps GlobalValue::mayBeDerefined() should return true for all builtin functions.

Do you mean true for all nobuiltin functions? I think this should allow us to block problematic IPOs on functions marked as nobuiltin, but still allows callsites to use builtin. I put up https://reviews.llvm.org/D97735 which does this.

I think if there are no builtin calls, we could treat nobuiltin as we do now, but it is probably not worth the extra effort to check all call sites.

Quuxplusone commented 3 years ago

(In reply to Florian Hahn from comment #11)

Do you mean true for all nobuiltin functions?

I don't think nobuiltin has any bearing here. We'd have exactly the same issue if Clang didn't insert any builtin / nobuiltin attributes here. In the testcase, both calls are builtin calls.

Quuxplusone commented 2 years ago

_Bug 41039 has been marked as a duplicate of this bug._

Quuxplusone / LLVMBugzillaTest

Clang generates 'call builtin' call to 'nobuiltin' function for global replacement of operator delete; gets optimized to trap(). #47991