Open Quuxplusone opened 3 years ago
Bugzilla Link | PR49022 |
Status | NEW |
Importance | P normal |
Reported by | Douglas Yung (douglas_yung@playstation.sony.com) |
Reported on | 2021-02-03 10:59:38 -0800 |
Last modified on | 2021-11-15 12:48:43 -0800 |
Version | trunk |
Hardware | PC All |
CC | bhramar.vatsa@synopsys.com, dblaikie@gmail.com, efriedma@quicinc.com, florian_hahn@apple.com, hstong@ca.ibm.com, htmldeveloper@gmail.com, jdoerfert@anl.gov, jeroen.dobbelaere@synopsys.com, johannes@jdoerfert.de, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, mikael.holmen@ericsson.com, nok.raven@gmail.com, paul_robinson@playstation.sony.com, richard-llvm@metafoo.co.uk, warren_ristow@playstation.sony.com, yuanfang.chen@sony.com |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also | PR41039 |
Ok, I think I know what the problem is here. Clang (roughly) the following IR
define dso_local void @_ZdlPv(i8* %p) nobuiltin {
ret void
}
define void @foo(i8 *%ptr) {
call void @_ZdlPv(i8* %ptr) builtin
ret void
}
Clang (correctly IIUC) adds nobuiltin
to the definition of _ZdlPv
, but the call to _ZdlPv
is marked as builtin.
So DeadArgElim use the definition to determine that the argument to _ZdlPv
is dead and updates the uses of the function to pass undef
instead.
Then comes InstCombine, which sees the builtin call to _ZdlPv and assumes this is UB and creates a store to undef.
The current behavior of CallBase::isNoBuiltin() is that the builtin
attribute on the call takes precedence over nobuiltin
on the function; so at the call it appears as we are calling a builtin function, not a nobuiltin one.
I am not sure if the definition of the operator delete
is valid, because it does not de-allocate memory allocated by new
. IIUC there's no way for clang to tell if a global replacement is called, because they could be in different translation unit.s
We could teach DeadArgElim that should not update 'builtin' calls of 'nobuiltin' functions, but I am not sure that's the right thing to do. (and there might be plenty of other IPO that have the same problem)
I am not sure if the definition of the
operator delete
is valid, because it does not de-allocate memory allocated bynew
.
The standard specifies that the memory is deallocated; it does not specify how this occurs, and I think it is out-of-scope for the compiler to make arbitrary decisions about what "counts" as a deallocation.
Why couldn't I have a single-use-and-throw-it-away policy for my memory allocator? In which case, an empty operator delete function is exactly the correct implementation (and amazingly efficient, besides).
(In reply to Paul Robinson from comment #2)
I am not sure if the definition of the
operator delete
is valid, because it does not de-allocate memory allocated bynew
.The standard specifies that the memory is deallocated; it does not specify how this occurs, and I think it is out-of-scope for the compiler to make arbitrary decisions about what "counts" as a deallocation.
Why couldn't I have a single-use-and-throw-it-away policy for my memory allocator? In which case, an empty operator delete function is exactly the correct implementation (and amazingly efficient, besides).
As I am said, I am not sure if it standard compliant or not.
Assuming it is, the next question is whether Clang is correct in marking the call as builtin
, when the function is marked as nobuiltin?
Transformations looking at the call to _ZdlPv
assume the builtin delete operator is called, not a user provided one, hence they assume it behaves like the builtin operator.
(I don't think this is a bug in 292077072ec1279d89d21873fe900061e55ef936 itself, it just exposes an existing issue)
I would also think the builtin
on the call is a problem given that clang emits a call to a nobuiltin
operator.
(In reply to Johannes Doerfert from comment #4)
I would also think the
builtin
on the call is a problem given that clang emits a call to anobuiltin
operator.
What makes things more difficult is that I do not think there's a way for Clang to tell whether a user-provided operator new/delete replaces the builtin ones, as the replacement could be defined anywhere in the program.
From https://en.cppreference.com/w/cpp/memory/new/operator_delete :
` Global replacements
The replaceable deallocation functions (1-12) are implicitly declared in each translation unit even if the
I think the given sample code is valid, and the combination of optimizations performed here is incorrect.
There are two views the optimizer could take: (1) it can treat these 'builtin' calls as having builtin semantics and can optimize them, or (2) it can see through these calls to the underlying functions and propagate out information. Either in isolation would be correct, but we can't apply at any call site, because they might contradict each other.
I think view (1) is more valuable -- it's very rare (ignoring LTO situations for a moment) that we'll see the definition of operator new / operator delete, and LTO of the definition of operator new / operator delete is, IIRC, unsafe even considering only nobuiltin calls, because we still treat such calls as returning a noalias pointer (which they actually don't, if you can look through the implementation).
Moreover, I think this is a derefinement bug: when the optimizer assumes builtin semantics, it applies the broad unrefined semantics associated with that builtin, whereas the actual function definition for that builtin will provide a refinement of those semantics. So any time we perform IPO on a builtin function definition and then optimize on the basis of the builtin semantics, we have a derefinement issue.
So perhaps GlobalValue::mayBeDerefined() should return true for all builtin functions.
LangRef:
builtin This indicates that the callee function at a call site should be recognized as a built-in function, even though the function’s declaration uses the nobuiltin attribute. This is only valid at call sites for direct calls to functions that are declared with the nobuiltin attribute.
nobuiltin This indicates that the callee function at a call site is not recognized as a built-in function. LLVM will retain the original call and not replace it with equivalent code based on the semantics of the built-in function, unless the call site uses the builtin attribute. This is valid at call sites and on function declarations and definitions.
This reads to me as if the optimizer is supposed to assume builtin semantic at the call site here. Certainly, if we do IPO and thereby utilize the nobuilin
implementation that can't hold anymore. So either forbid IPO across those call edges, e.g., Richard's mayBeDerefined() proposal for nobuiltin
functions, or drop the builtin
attribute (which is harder to do but might be an option down the line if IPO passes are explicitly aware of this situation).
Just wondering: is it valid/expected that 'clang' produces the call to @_Zd1Pv with the 'builtin' attribute here ?
(In reply to Jeroen Dobbelaere from comment #8)
> Just wondering: is it valid/expected that 'clang' produces the call to
> @_Zd1Pv with the 'builtin' attribute here ?
Forget my question, I clearly did not grasp the complete thread yesterday.
Is there already a consensus for a solution ?
I created a patch that blocks this at 'deadargelim': https://reviews.llvm.org/D97672
(In reply to Richard Smith from comment #6)
I think the given sample code is valid, and the combination of optimizations performed here is incorrect.
There are two views the optimizer could take: (1) it can treat these 'builtin' calls as having builtin semantics and can optimize them, or (2) it can see through these calls to the underlying functions and propagate out information. Either in isolation would be correct, but we can't apply at any call site, because they might contradict each other.
I think view (1) is more valuable -- it's very rare (ignoring LTO situations for a moment) that we'll see the definition of operator new / operator delete, and LTO of the definition of operator new / operator delete is, IIRC, unsafe even considering only nobuiltin calls, because we still treat such calls as returning a noalias pointer (which they actually don't, if you can look through the implementation).
Moreover, I think this is a derefinement bug: when the optimizer assumes builtin semantics, it applies the broad unrefined semantics associated with that builtin, whereas the actual function definition for that builtin will provide a refinement of those semantics. So any time we perform IPO on a builtin function definition and then optimize on the basis of the builtin semantics, we have a derefinement issue.
I think that sounds reasonable. IIUC we need to ensure a consistent view on the IR.
So perhaps GlobalValue::mayBeDerefined() should return true for all builtin functions.
Do you mean true for all nobuiltin
functions? I think this should allow us to block problematic IPOs on functions marked as nobuiltin
, but still allows callsites to use builtin
. I put up https://reviews.llvm.org/D97735 which does this.
I think if there are no builtin
calls, we could treat nobuiltin
as we do now, but it is probably not worth the extra effort to check all call sites.
(In reply to Florian Hahn from comment #11)
Do you mean true for all
nobuiltin
functions?
I don't think nobuiltin
has any bearing here. We'd have exactly the same issue if Clang didn't insert any builtin
/ nobuiltin
attributes here. In the testcase, both calls are builtin
calls.
_Bug 41039 has been marked as a duplicate of this bug._