dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.06k stars 1.55k forks source link

@debug annotation as a tree shaking hint #35303

Open mdebbar opened 5 years ago

mdebbar commented 5 years ago

Following the discussion on https://github.com/flutter/flutter/pull/23964, it seems like having an @debug annotation would be a good solution to guarantee certain methods are tree-shaken out.

The idea is to introduce an @debug annotation (name is up for discussion) that the compiler could use as a hint to tree-shake the method out.

class A {
  @debug
  toString() {
    return '$runtimeType and some other fields that aren't used anywhere else';
  }
}

Because the toString name is used a lot, it's not likely that it'll be automatically tree-shaken out. Which leads to all the fields it's accessing to be retained as well. The @debug annotation here means that the developer doesn't want this method to be included in the release build (both binary AOT and dart2js).

Based on the above-linked Flutter PR, this can lead to significant binary size gains (37KB compressed for some Flutter apps).

In the case of toString it should be safe to remove it, and it'll fallback to its super.toString() (and that should be defined for all objects). In other situations though, it may not be safe, and the compiler should warn or error in such case. For type-checking purposes, any @debug method can call other @debug methods. But non-@debug methods can't see or call @debug methods (unless the super class has a method with the same name).

yjbanov commented 5 years ago

/cc @Hixie, @goderbauer, @dnfield, who participated in the original discussion /cc @leafpetersen, because this might actually need language-level support (similar to assert).

mraleph commented 5 years ago

Instead of adding this as a language feature, I recommend to simply write a Flutter specific Kernel-to-Kernel transformation that would remove those methods - it will at worst take couple days of work for a person who knows how to hold Kernel so you can have it tomorrow. If you go with language-feature route it will probably take months.

Yes, you would not get analyzer support from that, but you can still report an error in build time from the transform if you so choose.

Hixie commented 5 years ago

If we add a kernel transform to the Flutter build system, it will essentially be a language feature in all but name. I'd rather we engage the actual language process rather than short-circuit it and saddle Dart with something that may not be what the language team would have designed.

dnfield commented 5 years ago

What if this was handled as a conditional compilation instruction insted?

e.g. (using C-style syntax):

#if DEBUG
String toString() {
 return '$runtimeType and some other fields that aren't used anywhere else';
}
#endif

I suppose an attribute could still serve that purpose (e.g. applying it to the following scope or line based on a condition). But this seems related to #33249 to me.

yjbanov commented 5 years ago

But non-@debug methods can't see or call @debug methods (unless the super class has a method with the same name).

This is too restrictive. @debug methods should be callable from within assert. Flutter has a ton of debug* methods and fields and they are mostly called from production methods and functions. They do wrap such calls in asserts.

mdebbar commented 5 years ago

@yjbanov for the case of toString() it's not a problem because the inheritance chain will always yield a class that implements it. So by putting @debug, you're just saying, use my super's implementation.

As for other cases, what would the return value be when the whole body is wrapped in an assertion? I don't think this is something the @debug can do automatically.

Do you have specific examples that can help me understand better?

goderbauer commented 5 years ago

Which leads to all the fields it's accessing to be retained as well.

As far as I know the AOT treeshaker currently does not remove unused fields at all. Would be cool if it would, though, especially if this gets implemented and makes even more fields unused in release mode. I've filed https://github.com/dart-lang/sdk/issues/35310 for the treeshaking issue.

yjbanov commented 5 years ago

@mdebbar

As for other cases, what would the return value be when the whole body is wrapped in an assertion?

There would be no return value. The whole function/method would be gone.

Do you have specific examples that can help me understand better?

Let's take _debugReserveFor. It is called only from within an assert. When marked as @debug the compiler removes the method in profile and release modes. The code continues to compile and run correctly because there are no references to the method in those modes.

leafpetersen commented 5 years ago

This is pretty much conditional compilation. Code that is valid when the debug methods are included may not be valid afterwards.

class A {
  void foo(int x) { x.isEven; }
}
class B extends A {
  @debug
  void foo(Object x) {}
}
void test() {
  new B().foo("hello");
}

This code is legal when the override is included, but is broken if the override is not included.

So I don't think doing this just as a kernel transform is a good idea - unless you first typecheck the code with the appropriate debug setting, you can't be sure that you won't be feeding arbitrarily bad code to the back ends. If it's done as an annotation, at the least the CFE needs to understand the annotations and typecheck appropriately.

mraleph commented 5 years ago

@leafpetersen arguably you can just restrict yourself to a very practical subset: e.g. outlaw this sort of override altogether and issue an error if @debug foo overrides a foo with non-equal signature.

My suggestion about doing it as kernel transformation is based on practical considerations (not everything has to be a language feature).

leafpetersen commented 5 years ago

@mraleph I don't mean to imply that it has to be a language feature (though I agree with @Hixie that this is likely to become a de facto language feature, and that that's unfortunate). You can figure out the set of restrictions that you need in order to guarantee coherence between configurations, or you can just check that the program is correct in the current configuration, but either way, there's some kind of static checking implied that needs to understand these annotations: either to enforce the coherence conditions, or to check the post-processed program.

I frankly think that figuring out the right set of restrictions that you need in order to guarantee that the debug program and the non-debug program are type equivalent is probably more work than just implementing the conditional compilation, but the user experience is probably better as well, so maybe worth it.

If the proposal is to release this into the wild without doing the work to make this sane though, then I am pretty opposed. Presumably this is not the proposal?

Some things that come to mind that need to be thought through to try to make this sane:

matanlurey commented 5 years ago

Mild plug of https://github.com/dart-lang/sdk/issues/33920 - if customers knew what could be expected to be tree-shaken, I imagine it wouldn't have taken this long to file bugs that stuff they expected to be tree-shaken is not.

@leafpetersen: Is there a (non-resource) reason that the language team couldn't outline a set of tree-shaking guidelines (not a strict specification). For example, it would be nice for both Dart AOT and Dart2JS for implementation teams to read and try to conform to:

Fields.

Fields that are never read from are removed, and all writes to them are removed as a side-effect.

...

leafpetersen commented 5 years ago

@matanlurey Well, there's no non-resource reason that the language team couldn't outline a set of pony-gifting guidelines as well specifying that I should get a pony, but I don't think I'm going to be building a stable any time soon... :)

More seriously, I don't think that tree-shaking shortcomings (where they exist) in the compilers is because the implementation teams in question don't know or care. I'm sure that if this rises to the top of the priority queue there are improvements that could be made, but ultimately if we want tree-shaking to be dramatically better we probably need to do actual work in the language to make it so.

matanlurey commented 5 years ago

I don't think that tree-shaking shortcomings (where they exist) in the compilers is because the implementation teams in question don't know or care.

Totally agree. But a lot of this seems like it could have been mitigated by customers knowing what tree-shaking is capable of doing, what it isn't capable of doing, and requesting enhancements when needed. Instead it feels like a game of catch up - client teams have spent years developing code they later find out is not tree-shaking :-/

lrhn commented 5 years ago

Having two programs in one source code is always going to be fragile. If I wanted it to be statically safe, I'd probably go for a full two-level language where the non-debug code cannot see debug code, and there is no way for debug code to be callable from non-debug code. (That would mean no overriding non-debug signatures with debug implementations, so the original toString example is out). Probably not very usable in practice.

Another approach is to only require run-time consistency, and transform all debug members into something that throws when used. You'll at least catch it immediately if you do use debug code from non-debug code. Fairly unsafe, though, the error mode is crashing in production.

Tree shaking is hard (whether code is reachable is, fundamentally, a Turing complete problem). It only removes code that definitely will never be called (it's not a program transformation, it's just optimized compilation, compiling the entire thing into zero bytes), and as long as it's not changing program behavior, it's safe. Just removing methods marked as @debug is not a semantics preserving transformation. I don't see a good and simple solution. It is a hard problem.

leafpetersen commented 5 years ago

Just removing methods marked as @debug is not a semantics preserving transformation.

I don't think anyone is looking for a semantics preserving transformation. My concern is just ensuring that it's a sound transformation.

yjbanov commented 5 years ago

Having to live with two programs in one source is something we're willing to live with given the ROI. In practice, a sufficiently complex program (e.g. any Flutter app) has virtually infinite number of programs in it, simply because the behavior depends on a lot of the environmental factors. An app running with assistive technology enabled is very different from one without it. Similarly you have connected vs offline, Android vs iOS vs Web, portrait vs landscape, to name a few.

I think the property we'd want is that the failure modes are well understood and are caught early, preferably during compilation. When I get an error compiling the app in release mode, will I understand quickly enough what's going on? @leafpetersen's intuition to look for non-obvious cases is the right one. However, I'd look for cases where the mere presence of @debug alters runtime behavior (i.e. not the contents of the debug code). Those I'd be worried about. The specific example with the overridden foo method looks OK to me. If new B().foo("hello"); is inside debug code then there's no error. If it's in release code then I get an error from the compiler saying that A.foo doesn't take String. I can figure out what's going on by simply reading the code.

Hixie commented 5 years ago

Flutter programs are already two programs in one source — debug mode runs a ton more code than release mode (we use the assert language feature to do this). This is just about giving us the ability to tree-shake the debug program out of the release program.

leafpetersen commented 5 years ago

The specific example with the overridden foo method looks OK to me. If new B().foo("hello"); is inside debug code then there's no error. If it's in release code then I get an error from the compiler saying that A.foo doesn't take String. I can figure out what's going on by simply reading the code.

Just be sure we're on the same page, my main concern is that the failure mode of doing this as a kernel transform (without static checking) is not an error message saying that A.foo doesn't take String, but instead at best a compiler crash, and at worst memory corruption or a core dump at runtime.

leafpetersen commented 5 years ago

Flutter programs are already two programs in one source

One direction this could lead us in is explicitly adding something like a pre-processor + macros. This has come up in discussions around code generation as well.

dnfield commented 5 years ago

Preprocessing is really what this should be. And ideally it would handle not just the compilation mode but other variables as well, such as the Flutter or Dart SDK version.

yjbanov commented 5 years ago

Preprocessing or a macro system of some sort could also help tree-shaking unused platform-specific code, e.g. Cupertino widgets in a Material Design app and vice-versa. It could also help us plug in Web-specific things in Hummingbird without affecting native functionality.

One important aspect to consider is how this can work with Bazel.

jonahwilliams commented 5 years ago

I think experimenting with this as a kernel transform is a reasonable step to improve flutter code size. @Hixie To prevent this from being a de-facto language feature, we can simply restrict the kernel transformation to the flutter core libraries, where we can verify each member that we want to remove. Though this will cap out at whatever the size of all the debug code flutter uses.

My gut feeling is that most applications don't have anywhere near the amount of debug code that flutter does, and any code that has been accidentally retained is also going to be difficult to prove safe to remove.

Hixie commented 5 years ago

The supermixin experiment became a de facto language feature even with only flutter using it. It's precisely that that I'm trying to avoid.

dnfield commented 4 years ago

I wrote a simple kernel transformer to change all overrides of toString in dart:ui and package:flutter to just be return super.toString():

class _ToStringVisitor extends RecursiveVisitor<void> {
  @override
  void visitProcedure(Procedure node) {
    if (node.name.name == 'toString' && node.enclosingLibrary != null) {
      assert(node.enclosingClass != null);
      final String importUri = node.enclosingLibrary.importUri.toString();
      if (importUri.startsWith('dart:ui') || importUri.startsWith('package:flutter/')) {
        // print('Removing ${node.location}');
        node.function.replaceWith(
          FunctionNode(
            ReturnStatement(
              SuperMethodInvocation(
                node.name,
                Arguments(<Expression>[]),
              ),
            ),
          ),
        );
      }
    }
    super.visitProcedure(node);
  }
}

class _ToStringTransformer extends frontend.ProgramTransformer {
  _ToStringTransformer(this.child);

  final frontend.ProgramTransformer child;

  @override
  void transform(Component component) {
    if (child is! _ToStringTransformer)
      component.visitChildren(_ToStringVisitor());
    child?.transform(component);
  }
}

In local testing, this reduces our AOT snapshot size on the gallery by about 200kb. I think this is worth doing.

dnfield commented 4 years ago

From a manual audit, I can't find a toString that would fail in release mode with this.

Unfortunately, I'm also trying to think about what this could end up doing if someone ever did add a meaningful toString to the framework or dart:ui. I'd like to think we'd catch that in code review, but I don't think we should trust code review for that.

Trying to think if there's some way this transformer could also look at calls to .toString and maybe warn you, or fail to drop them in such a case.

yjbanov commented 4 years ago

@dnfield Would it be possible to try this on a Web build? 200kb on the Web would be huge.

dnfield commented 4 years ago

I tried but it looks like web builds aren't using the front end compiler we provide. Jonah says it should be possible though, I'll try to sync up with him and find out what's up there.

mdebbar commented 4 years ago

@dnfield have you tried removing the toString method altogether (instead of replacing its implementation with return super.toString())? I think you can do node.remove().

dnfield commented 4 years ago

@mdebbar that's more complicated, you then have to find callsites and fix those

dnfield commented 4 years ago

And I suspect that this just gets optimized to nothing anyway

mdebbar commented 4 years ago

@mdebbar that's more complicated, you then have to find callsites and fix those

I'm not sure I understand. When you remove the toString method from a class, it'll just inherit it from the super class. This works fine:

class A {
  // No `toString` method.
}

class B {
  @override
  String toString() {
    return super.toString();
  }
}

void main() {
  print(A().toString()); // prints "Instance of 'A'"
  print(B().toString()); // prints "Instance of 'B'"
}
dnfield commented 4 years ago

Yes, but calling remove in the kernel doesn't quite work that way

dnfield commented 4 years ago

https://github.com/flutter/engine/pull/17068

mkustermann commented 4 years ago

I wrote a simple kernel transformer to change all overrides of toString in dart:ui and package:flutter to just be return super.toString(): .... In local testing, this reduces our AOT snapshot size on the gallery by about 200kb. I think this is worth doing.

I would assume this is also possible without any kernel transformer:

const bool isRelease = const bool.fromEnvironment('dart.vm.product');

String toString() {
  if (isRelease) return super.toString();
  <normal body>
}

Would that work?

(Of course if one actually wants the toString() method definition to go away entirely, then it's a language change)

dnfield commented 4 years ago

@mkustermann it becomes very verbose, and people forget to add it.

mkustermann commented 4 years ago

@dnfield It's one line (as is @debug), only a little bit longer. People may also forget about adding @debug :)

Hixie commented 4 years ago

I don't think we'd want an annotation you have to put on every class. We'd probably want to just mark the base Diagnostics class as debug-only and have all its methods and any subclass overridden version of those methods be marked as needing to be removed.