Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

ExprMutationAnalyzer mishandles templated callback lambdas #37927

Open Quuxplusone opened 6 years ago

Quuxplusone commented 6 years ago
Bugzilla Link PR38954
Status NEW
Importance P enhancement
Reported by Roman Lebedev (lebedev.ri@gmail.com)
Reported on 2018-09-14 23:28:06 -0700
Last modified on 2018-09-17 03:41:32 -0700
Version trunk
Hardware PC Linux
CC development@jonas-toth.eu, llvm-bugs@lists.llvm.org, shuaiwang@google.com
Fixed by commit(s)
Attachments
Blocks PR38890
Blocked by
See also PR38955
template<typename Lambda> void withCallback(Lambda l) {
  for(int i = 0; i < 4; i++) {
    l(i);
  }
}
void test() {
  auto l = [](int i){};
  withCallback(l);
}

warning: variable is mutated in the loop body and loop iteration expression
[bugprone-loop-variable-mutations]
  for(int i = 0; i < 4; i++) {
          ^
note: loop in question
  for(int i = 0; i < 4; i++) {
  ^
note: mutation in loop body occurs here
    l(i);
      ^
note: mutation in loop iteration expression occurs here
  for(int i = 0; i < 4; i++) {
                        ^
Quuxplusone commented 6 years ago

I think the clangtidy check should skip uninstantiated template (and macros fwiw), because:

Take this case as a concrete example: it just so happens that the instantiated version in test has a non-mutating Lambda. withCallback can be also called with auto l2 = [](int& i) { i = 123; }; withCallback(l2); So we can't reliably tell whether the expression l(i) mutates i or not by just looking at the template definition.

Thinking deeper about this, in order to be extra smart about uninstantiated template, we can look at not only the template definition itself but also all of its instantiated version. So that immediately rules out templates in header files because we can't see all of its instantiations within a single TU. How about template defined in .cpp then? At least we can iterate through all the instantiations, but the difficulty is that there doesn't seem to be a way to create a mapping between different instantiated versions of the same expression. For example: template void g(F f) { int x; f(x); } void h() { g( {}); g( {}); } We'll have three FunctionDecl of g, first one being the template definition itself, and the rest two being the two instantiated versions in h. Each FunctionDecl has its own CompoundStmt as function body and its own DeclRefExpr to x within the body. And no trace has left to map between these DeclRefExpr. Which means even if we figured out all instantiated version of x are not mutated, we can't map that information back to the uninstantiated version of x.

Quuxplusone commented 6 years ago

(In reply to Shuai Wang from comment #1)

I think the clangtidy check should skip uninstantiated template (and macros fwiw), because:

  • In template (and macro), something can be replaced and completely change the behavior, in other words there's no single stable behavior to analyze.
  • ExprMutationAnalyzer has to lean on safer side and pretty much anything {type,value}-dependent results in a mutation. I'm not following, this bugreport isn't about uninstantiated templates, it is about very much instantiated templates. So far i haven't seen any false-positives due to the uninstantiated templates.

Take this case as a concrete example: it just so happens that the instantiated version in test has a non-mutating Lambda. withCallback can be also called with auto l2 = [](int& i) { i = 123; }; withCallback(l2);

So we can't reliably tell whether the expression l(i) mutates i or not by just looking at the template definition. BTW, would it be possible for the EMA to be more descriptive? It would be great if it would not simply say "i give up, i guess this mutates", but instead say "i give up, this may or may not be mutating.".

Thinking deeper about this, in order to be extra smart about uninstantiated template, we can look at not only the template definition itself but also all of its instantiated version. So that immediately rules out templates in header files because we can't see all of its instantiations within a single TU. How about template defined in .cpp then? At least we can iterate through all the instantiations, but the difficulty is that there doesn't seem to be a way to create a mapping between different instantiated versions of the same expression. For example: template void g(F f) { int x; f(x); } void h() { g( {}); g( {}); } We'll have three FunctionDecl of g, first one being the template definition itself, and the rest two being the two instantiated versions in h. Each FunctionDecl has its own CompoundStmt as function body and its own DeclRefExpr to x within the body. And no trace has left to map between these DeclRefExpr. Which means even if we figured out all instantiated version of x are not mutated, we can't map that information back to the uninstantiated version of x.

I'm probably lacking a lot of context, but i really do think we want the per-instantiation knowledge about mutations, not for the whole templated function.

Quuxplusone commented 6 years ago

(In reply to Roman Lebedev from comment #2)

(In reply to Shuai Wang from comment #1)

I think the clangtidy check should skip uninstantiated template (and macros fwiw), because:

  • In template (and macro), something can be replaced and completely change the behavior, in other words there's no single stable behavior to analyze.
  • ExprMutationAnalyzer has to lean on safer side and pretty much anything {type,value}-dependent results in a mutation. I'm not following, this bugreport isn't about uninstantiated templates, it is about very much instantiated templates. So far i haven't seen any false-positives due to the uninstantiated templates. Well you can't really issue any warning and/or fixit to any instantiations right? Instantiations are implicitly generated by the compiler and not written in code, what's written in code is always uninstantiated. The fact that there's one instantiation of withCallback caused by test doesn't make the raw definition of withCallback instantiated and any warnings issues on that piece of code, which is uninstantiated, based on any particular instantiation would be wrong.

Take this case as a concrete example: it just so happens that the instantiated version in test has a non-mutating Lambda. withCallback can be also called with auto l2 = [](int& i) { i = 123; }; withCallback(l2);

So we can't reliably tell whether the expression l(i) mutates i or not by just looking at the template definition. BTW, would it be possible for the EMA to be more descriptive? It would be great if it would not simply say "i give up, i guess this mutates", but instead say "i give up, this may or may not be mutating.". I'll think about it, "maybe" would probably equal to "findMutation() returns a {type,value}-dependent expression", if that's indeed the case then the caller of EMA could handle that trivially.

Thinking deeper about this, in order to be extra smart about uninstantiated template, we can look at not only the template definition itself but also all of its instantiated version. So that immediately rules out templates in header files because we can't see all of its instantiations within a single TU. How about template defined in .cpp then? At least we can iterate through all the instantiations, but the difficulty is that there doesn't seem to be a way to create a mapping between different instantiated versions of the same expression. For example: template void g(F f) { int x; f(x); } void h() { g( {}); g( {}); } We'll have three FunctionDecl of g, first one being the template definition itself, and the rest two being the two instantiated versions in h. Each FunctionDecl has its own CompoundStmt as function body and its own DeclRefExpr to x within the body. And no trace has left to map between these DeclRefExpr. Which means even if we figured out all instantiated version of x are not mutated, we can't map that information back to the uninstantiated version of x.

I'm probably lacking a lot of context, but i really do think we want the per-instantiation knowledge about mutations, not for the whole templated function. You can get per-instantiation results already, simply pass the CompoundStmt of the instantiated version when constructing EMA. But how would you issue warnings/suggestions? Diags are issues to a particular source code location and that piece of source code is the uninstantiated version.

FYI, for the above example involving g and h, here's what the AST looks like: FunctionTemplateDecl | |-TemplateTypeParmDecl | |-FunctionDecl <------ This is the uninstantiated version | | |-ParmVarDecl | | -CompoundStmt <---- If passing this as "scope" when constructing EMA, EMA will assume CallExpr mutates | | |-DeclStmt | | |-VarDecl | | -CallExpr | | |-DeclRefExpr | |-DeclRefExpr | |-FunctionDecl <-------- This is the first instantiation | | |-TemplateArgument | | |-ParmVarDecl | | -CompoundStmt <------ Passing this to EMA will get "not mutated" result, because first instantiation has a non-mutating lambda | | |-DeclStmt | | |-VarDecl | | -CXXOperatorCallExpr | | |-ImplicitCastExpr | | |-DeclRefExpr | | |-ImplicitCastExpr | | | -DeclRefExpr | |-ImplicitCastExpr | | -DeclRefExpr |-FunctionDecl <-------- This is the second instantiation | |-TemplateArgument | |-ParmVarDecl | -CompoundStmt <------ Passing this the EMA will get "mutated" result. | |-DeclStmt | |-VarDecl | -CXXOperatorCallExpr | |-ImplicitCastExpr | |-DeclRefExpr | |-ImplicitCastExpr | | -DeclRefExpr |-DeclRefExpr

Quuxplusone commented 6 years ago

(In reply to Shuai Wang from comment #3)

(In reply to Roman Lebedev from comment #2)

(In reply to Shuai Wang from comment #1)

I think the clangtidy check should skip uninstantiated template (and macros fwiw), because:

  • In template (and macro), something can be replaced and completely change the behavior, in other words there's no single stable behavior to analyze.
  • ExprMutationAnalyzer has to lean on safer side and pretty much anything {type,value}-dependent results in a mutation. I'm not following, this bugreport isn't about uninstantiated templates, it is about very much instantiated templates. So far i haven't seen any false-positives due to the uninstantiated templates. Well you can't really issue any warning and/or fixit to any instantiations right? Instantiations are implicitly generated by the compiler and not written in code, what's written in code is always uninstantiated. The fact that there's one instantiation of withCallback caused by test doesn't make the raw definition of withCallback instantiated and any warnings issues on that piece of code, which is uninstantiated, based on any particular instantiation would be wrong.

Oh wait, i think i'm looking at my own testcase wrong. For the uninstantiated withCallback() itself, i think it should return "i don't know".

The deeper problem here, is that EMA does not provide any context/explanation of results. It would be great if it could record the logic that lead to the decision that mutation happens, and be able to dump it.

Quuxplusone commented 6 years ago

(In reply to Roman Lebedev from comment #4)

(In reply to Shuai Wang from comment #3)

(In reply to Roman Lebedev from comment #2)

(In reply to Shuai Wang from comment #1)

I think the clangtidy check should skip uninstantiated template (and macros fwiw), because:

  • In template (and macro), something can be replaced and completely change the behavior, in other words there's no single stable behavior to analyze.
  • ExprMutationAnalyzer has to lean on safer side and pretty much anything {type,value}-dependent results in a mutation. I'm not following, this bugreport isn't about uninstantiated templates, it is about very much instantiated templates. So far i haven't seen any false-positives due to the uninstantiated templates. Well you can't really issue any warning and/or fixit to any instantiations right? Instantiations are implicitly generated by the compiler and not written in code, what's written in code is always uninstantiated. The fact that there's one instantiation of withCallback caused by test doesn't make the raw definition of withCallback instantiated and any warnings issues on that piece of code, which is uninstantiated, based on any particular instantiation would be wrong.

Oh wait, i think i'm looking at my own testcase wrong. For the uninstantiated withCallback() itself, i think it should return "i don't know".

The deeper problem here, is that EMA does not provide any context/explanation of results. It would be great if it could record the logic that lead to the decision that mutation happens, and be able to dump it.

How much & what kind of information do you need? (Right now EMA gives out a Stmt which mutates the given Expr, and in the case of indirect mutation via a series of reference binding (e.g. int x; int& r = x; r = 10;) findMutation(x) returns DeclRefExpr(r) and findMutation(DeclRefExpr(r)) returns "r = 10" so that you can follow the chain of reference binding.) I'm hesitating on "I'm not sure" result because I think the only place where "I'm not sure" is correct is {type,value}-dependent (aka template), all other "I'm not sure" should be fixed.

Quuxplusone commented 6 years ago

(In reply to Shuai Wang from comment #5)

(In reply to Roman Lebedev from comment #4)

(In reply to Shuai Wang from comment #3)

(In reply to Roman Lebedev from comment #2)

(In reply to Shuai Wang from comment #1)

I think the clangtidy check should skip uninstantiated template (and macros fwiw), because:

  • In template (and macro), something can be replaced and completely change the behavior, in other words there's no single stable behavior to analyze.
  • ExprMutationAnalyzer has to lean on safer side and pretty much anything {type,value}-dependent results in a mutation. I'm not following, this bugreport isn't about uninstantiated templates, it is about very much instantiated templates. So far i haven't seen any false-positives due to the uninstantiated templates. Well you can't really issue any warning and/or fixit to any instantiations right? Instantiations are implicitly generated by the compiler and not written in code, what's written in code is always uninstantiated. The fact that there's one instantiation of withCallback caused by test doesn't make the raw definition of withCallback instantiated and any warnings issues on that piece of code, which is uninstantiated, based on any particular instantiation would be wrong.

Oh wait, i think i'm looking at my own testcase wrong. For the uninstantiated withCallback() itself, i think it should return "i don't know".

The deeper problem here, is that EMA does not provide any context/explanation of results. It would be great if it could record the logic that lead to the decision that mutation happens, and be able to dump it.

How much & what kind of information do you need? (Right now EMA gives out a Stmt which mutates the given Expr, and in the case of indirect mutation via a series of reference binding (e.g. int x; int& r = x; r = 10;) findMutation(x) returns DeclRefExpr(r) and findMutation(DeclRefExpr(r)) returns "r = 10" so that you can follow the chain of reference binding.) I was just thinking out loud. I don't have any concrete ideas/needs.

I'm hesitating on "I'm not sure" result because I think the only place where "I'm not sure" is correct is {type,value}-dependent (aka template), all other "I'm not sure" should be fixed.

I agree. But how soon will that happen, how soon all of them will be fixed? Explanation:

Right now, when EMA can not analyze something, it pessimistically assumes that the mutation happens. It may be the correct decision/answer, given the right question. E.g. for the Jonas's const-correctness check, where the question is, "do we know this doesn't mutate?" But if the question is the opposite one, i.e. "do we know this does mutate?", well, now you are mixing together both the actual mutations, and the EMA analysis limitations.

Quuxplusone commented 6 years ago
(In reply to Roman Lebedev from comment #6)
> (In reply to Shuai Wang from comment #5)
> > I'm hesitating on "I'm not sure" result because I think the only place where
> > "I'm not sure" is correct is {type,value}-dependent (aka template), all
> > other "I'm not sure" should be fixed.
>
> I agree. But how soon will that happen, how soon *all* of them will be fixed?
> Explanation:
>
> Right now, when EMA can not analyze something, it pessimistically assumes
> that the mutation happens.
> It *may* be the correct decision/answer, given the right *question*.
> E.g. for the Jonas's const-correctness check, where the question is, "do we
> *know* this doesn't mutate?"
> But if the question is the opposite one, i.e. "do we *know* this *does*
> mutate?",

> well, now you are mixing together both the actual mutations, and the EMA
> analysis limitations.

Err, what i meant is, the pessimistic answer "i don't know,
i guess this mutates" always mixes together both the actual mutations,
and the EMA analysis limitations.

If we are looking for "no mutation happens", then we will *only* get
it when there was no analysis limitations, and the analysis concluded
that it does not mutate.

While if we are looking for "mutation does happen", we will get it both
when the analysis concluded that the mutation actually happens, and when
the analysis limitation was hit.
Quuxplusone commented 6 years ago
(In reply to Roman Lebedev from comment #7)
> (In reply to Roman Lebedev from comment #6)
> > (In reply to Shuai Wang from comment #5)
> > > I'm hesitating on "I'm not sure" result because I think the only place
where
> > > "I'm not sure" is correct is {type,value}-dependent (aka template), all
> > > other "I'm not sure" should be fixed.
> >
> > I agree. But how soon will that happen, how soon *all* of them will be
fixed?
> > Explanation:
> >
> > Right now, when EMA can not analyze something, it pessimistically assumes
> > that the mutation happens.
> > It *may* be the correct decision/answer, given the right *question*.
> > E.g. for the Jonas's const-correctness check, where the question is, "do we
> > *know* this doesn't mutate?"
> > But if the question is the opposite one, i.e. "do we *know* this *does*
> > mutate?",
>
> > well, now you are mixing together both the actual mutations, and the EMA
> > analysis limitations.
>
> Err, what i meant is, the pessimistic answer "i don't know,
> i guess this mutates" always mixes together both the actual mutations,
> and the EMA analysis limitations.
>
> If we are looking for "no mutation happens", then we will *only* get
> it when there was no analysis limitations, and the analysis concluded
> that it does not mutate.
>
> While if we are looking for "mutation does happen", we will get it both
> when the analysis concluded that the mutation actually happens, and when
> the analysis limitation was hit.

I get your point, in fact I have the exact same thought when looking at your
use case :)
Indeed all previous use case required leaning towards "assume mutation when
unsure" and your check wants the other way around.
On the bright side I think we're actually not far away, the only major area
this "assume mutation when unsure" behavior happens is around pointers, I'm
reasonable sure once we get that fixed there would be no "maybe" results
(except template.)
Of course I could be wrong, so I'm very eager to see how EMA performs once
findPointeeMutation is implemented in order to verify. Basically I'm not ruling
out the option of adding a "maybe" result but I'd like to see how far we can
push forward without it.
Quuxplusone commented 6 years ago
> I get your point, in fact I have the exact same thought when looking at your
> use case :)
> Indeed all previous use case required leaning towards "assume mutation when
> unsure" and your check wants the other way around.

Yup, I guess that was our starting point and on that side we are pretty far
already!

> On the bright side I think we're actually not far away, the only major area
> this "assume mutation when unsure" behavior happens is around pointers, I'm
> reasonable sure once we get that fixed there would be no "maybe" results
> (except template.)

I would like to add to the template problems: Once there are concepts in place
it is in principle possible to analyze type and value-dependent expression
under their concept constraints.

For value-dependent expressions: Given the values are int-like the analysis
should be possible. But in my opinion its ok to ignore these for now, because
in the template instantiations the value-dependendness is gone.

> Of course I could be wrong, so I'm very eager to see how EMA performs once
> findPointeeMutation is implemented in order to verify. Basically I'm not
> ruling out the option of adding a "maybe" result but I'd like to see how far
> we can push forward without it.

Totally agree. Given the static analyzer guys were interested in using it as
well, we will probably find all false positives/negatives and fix them. The
question 'is this thing mutated' is binary, because the compiler knows when you
are mutating a 'const' variable. So the analyzer should figure this out as well
(- type-dependent without concepts as nothing is known there)