dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License

More aggressive "unused variable" analysis would be a great time-saver #28292

Open Hixie opened 7 years ago

Hixie commented 7 years ago

It would be great if the analyzer could detect that in the following program, a, b, and c are doing nothing useful:

import 'dart:math';

void main() {
  int a = 0;
  a = a + 1;

  int b = 0;
  b = max(b, 1);

  int c = 0;
  if (c > 1)
    c = 1;
}

The combination of these three things was the only thing standing in the way of me discovering that a whole bunch of code I'd written was actually just using CPU for no reason.

lrhn commented 7 years ago

The call to max is the odd one out here - it's not trivial to know which functions have no side-effects, and if it had been print(b) then it wouldn't be doing nothing.

Hixie commented 7 years ago

Yeah. Maybe we could have an annotation, which we could put on math.max, that the analyzer can use to know that a function is pure (without side effects). Or maybe it can figure that out itself, though that might be tough for dart:math.
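A minimal sketch of what such an annotation could look like, assuming a hypothetical `Pure` class and `pure` constant defined here for illustration. No existing tool is assumed to recognize them, and `clampLow` is an invented wrapper, not an SDK function:

```dart
import 'dart:math' as math;

/// Hypothetical marker annotation. It only records the author's claim
/// that a function has no side effects; nothing checks or enforces it.
class Pure {
  const Pure();
}

/// Shorthand so the annotation can be written as `@pure`.
const pure = Pure();

/// Invented example function that the author asserts is pure.
@pure
int clampLow(int value, int floor) => math.max(value, floor);
```

With a marker like this, an analysis could treat `b = clampLow(b, 1)` the same way as `a = a + 1`, provided the annotation is trusted or checked.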

eernstg commented 7 years ago

I think the concept is interesting, but it might need to be generalized to several variables to work really well (in the sense that the single-variable variant skips over a number of problems of a similar nature that might be at least as common and important to catch). Consider this example:

void main() {
  int a = 3;
  int b = 4;
  b += a;
  ... // a, b unused here
}

Let's assume that we can rule out side-effects as needed (say, because there are no invocations of user-defined methods, because there is a pure annotation (hypothetical, but let's assume it's checked and sound) on every invoked method, or something like that).

Now, the relevant concept would be that the data flow is statically known to stay within a given set of variables (here: {a, b}). Those variables can be used to update each other, possibly via complex and time-consuming computations. But in the end all the computations are useless, because those computations could have produced different values, and program executions would behave identically (except possibly for timing), because none of those variables are used for anything else. To me, "using a to update a" is a special case of this situation which may not be very useful to single out.

We might very well want to detect this kind of situation. However, instead of trying to make the compiler make good guesses about which sets of variables to scrutinize for this type of "collective unusedness", I think we can attack the problem at a lower level: We might be able to unravel the problem from the "end of the dependency graph".

We could have a much simpler notion of unused value, similar to the notion of def/use properties in traditional compilation analysis, and we might receive a lot of help from the notion of programs in SSA form, because a program in that form essentially makes each computed value tangible as an SSA-variable.

In the above example we would simply detect that the value of b produced in b += a is unused. After removing that statement we would detect that b is completely unused and then we'd also get rid of a. This simple concept could allow developers to get rid of the more complex problems (like "collectively unused") step by step. Not so powerful, but simple and sufficient.
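The step-by-step peeling described above can be sketched on a toy straight-line program. This is only an illustration under stated assumptions: `Assign`, `findUnusedValues`, and `_isRead` are hypothetical names, every assignment is assumed to be side-effect free, and there is no control flow, so it is not how the analyzer models code.

```dart
/// One side-effect-free assignment `target = f(operands)` in a toy
/// straight-line program (hypothetical IR, for illustration only).
class Assign {
  final String target;
  final List<String> operands;
  const Assign(this.target, this.operands);
}

/// Returns the indices of assignments whose value is never used, in the
/// order they are discovered. [observed] names the variables that are
/// read after the last statement.
List<int> findUnusedValues(List<Assign> program, Set<String> observed) {
  final removed = <int>[];
  var changed = true;
  // Repeat until a full pass removes nothing: dropping one unused value
  // can leave the values it consumed unused as well.
  while (changed) {
    changed = false;
    for (var i = program.length - 1; i >= 0; i--) {
      if (removed.contains(i)) continue;
      if (!_isRead(program, i, observed, removed)) {
        removed.add(i);
        changed = true;
      }
    }
  }
  return removed;
}

/// Whether the value written by statement [index] is read by a later,
/// still-present statement before being overwritten, or is observed at
/// the end of the program.
bool _isRead(List<Assign> program, int index, Set<String> observed,
    List<int> removed) {
  final name = program[index].target;
  for (var j = index + 1; j < program.length; j++) {
    if (removed.contains(j)) continue;
    if (program[j].operands.contains(name)) return true;
    if (program[j].target == name) return false; // overwritten unread
  }
  return observed.contains(name);
}
```

Encoding the example as `[Assign('a', []), Assign('b', []), Assign('b', ['b', 'a'])]` with an empty `observed` set yields `[2, 1, 0]`: first the value from `b += a`, then the initial `b`, then `a`.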

So I'd recommend that we go for detection of unused value, which would by the way fit rather nicely under the heading 'more aggressive unused variable analysis'. ;-)

MichaelRFairhurst commented 7 years ago

+1 on SSA-like analysis. This is the type of stuff that good optimizers know to remove, and it would definitely create a more robust "unused" workflow.

As an added bonus this type of solution lends itself to detecting cases like:

void main() {
  var a = 1;
  print(a);
  a += 2; // The var 'a' is used, but this assignment is useless
}

srawlins commented 6 years ago

Don't all three examples need the @pure annotation?

void main() {
  int a = 0;
  a = a + 1;  // `operator +` might have side effects.

  int b = 0;
  b = max(b, 1); // `max` might have side effects.

  int c = 0;
  if (c > 1) // `operator >` might have side effects.
    c = 1;
}
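
To illustrate the concern about operators, here is a small sketch of a user-defined `operator +` with an observable side effect. `Counter` and its `additions` field are hypothetical names invented for this example:

```dart
/// Hypothetical class whose `+` does more than compute a result.
class Counter {
  /// Counts every call to `operator +`; this is the side effect.
  static int additions = 0;

  final int value;
  const Counter(this.value);

  Counter operator +(Counter other) {
    additions++; // observable even if the returned value is discarded
    return Counter(value + other.value);
  }
}
```

For a variable of this type, removing a seemingly useless `a = a + Counter(1)` would change `Counter.additions`, so the assignment cannot be flagged without knowing that the operator is pure.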