dart-lang / language

Design of the Dart language
Other
2.67k stars 205 forks source link

Allow objects to be eliminated #306

Open eernstg opened 5 years ago

eernstg commented 5 years ago

[Edit: Adjusted DateTime examples to create a situation which is directly relevant for a 'static class'.]

There are some situations where an object is created in order to hold some data, and some instance methods are invoked on that object in order to compute results or perform actions based on that data. But the object itself is not needed for any other purpose than holding the given data during those method invocations. For instance:

main() {
  var currentMonth = DateTime.now().month;
  var moonLandingDay = DateTime.parse("1969-07-20 20:18:04Z").weekday;
  print("Current month: $currentMonth, Moon landing day: $moonLandingDay");

  var fullString = (StringBuffer()
      ..write("Use a StringBuffer")
      ..writeAll(["for ", "efficient ", "string ", "creation "])
      ..write("if you are ")
      ..write("building lots of strings"))
    .toString();
}

(Examples derived from the DateTime DartDoc and an example in Seth's blog entry on strings in Dart).

If such objects are indeed never needed at all (except as storage locations for the data included in the construction) then we could allow an optimization whereby the allocation of a new object is eliminated, the data is stored in the run-time stack, and the computations are performed in a top-level function.

davidmorgan commented 5 years ago

I think this should apply to the builder pattern as used by built_collection and built_value. It would be great to reduce the overhead.

void foo = Foo((b) => b..x = 3..y = 4);

where the Foo factory creates a FooBuilder, applies the function to it, then converts it into a Foo to return.

eernstg commented 5 years ago

That would be great! Please check out #308 and speak up if you see something that you think wouldn't work. ;-)

leafpetersen commented 5 years ago

Can you elaborate a little bit more on the motivations here? In particular, what is it that you want to enable that is not possible now? A quick check of dart2js output shows that it will eliminate all but one of the object allocations in your example above, and I would be fairly surprised if the VM doesn't do likewise.

Test program:

import 'dart:core';

void main() {
  var now = DateTime.now();
  var berlinWallFell = DateTime.utc(1989, 11, 9);
  var moonLanding = DateTime.parse("1969-07-20 20:18:04Z");
  print(now.isAfter(berlinWallFell));
  print(berlinWallFell.isAfter(moonLanding));
  var fullString = (StringBuffer()
                    ..write("Use a StringBuffer")
                    ..writeAll(["for ", "efficient ", "string ", "creation "])
                    ..write("if you are ")
                    ..write("building lots of strings"))
      .toString();
  print(fullString);
}

Output when compiled with dart2js:

  A = {
    main: function() {
      var moonLanding,
        t1 = Date.now(),
        t2 = H.Primitives_valueFromDecomposedDate(1989, 11, 9, 0, 0, 0, 0, true);
      if (typeof t2 !== "number" || Math.floor(t2) !== t2)
        H.throwExpression(H.argumentErrorValue(t2));
      moonLanding = P.DateTime_parse("1969-07-20 20:18:04Z");
      P.print(t1 > t2);
      P.print(t2 > moonLanding._value);
      t1 = P.StringBuffer__writeAll("Use a StringBuffer", ["for ", "efficient ", "string ", "creation "], "") + "if you are building lots of strings";
      P.print(t1.charCodeAt(0) == 0 ? t1 : t1);
    }
  };
matanlurey commented 5 years ago

As a language feature "the compiler should generate better code if able" seems really vague.

I'd love to allow users to introduce more confidence they are writing explicitly writing optimizable code (whether support constant expressions/functions, something like Kotlin's inline classes, etc).

eernstg commented 5 years ago

Can you elaborate a little bit more on the motivations

I can see how these optimization opportunities can sometimes be detected today, but the point is that the developer who marks a class as static commits to a certain discipline in the implementation of that class, and any violations will cause a compile-time error.

So this marks the class as "intended to be able to evaporate", and hence certain constructor invocations can be eliminated: The object was never created, and its state is stored in local variables (including function parameters).

We're essentially storing the object on the stack, and we're free to create as many copies as we want, because the special discipline provides the guarantee that this does not create any difference in the observable behavior.

If that is really, seriously no problem to achieve without help then we can just eliminate the notion of static classes from the proposal about how to obtain a generalized version of static extension methods and static extension types in #309. That would just make it simpler. (Note that this amounts to the same thing as saying that we can create wrapper objects to handle extension method invocations at zero cost, compared to a compilation strategy which always just translates the extension methods to static functions.)

Otherwise, I believe that the notion of static classes does embody a robust and rather expressive static check on a class which ensures that "non-leaking" (cf. #308) expressions are guaranteed to be transformable into purely static function calls. The core idea is that this never leaks, and there is no mutable state, so nobody will need a reference to the object and hence nobody can see whether there are multiple copies.

The only kind of access to the object is member lookup, and that's always a two-step traversal where we have a reference to the object in the middle (to find o.m we evaluate o and then m relative to o), but that's opaque so for a variable access we can replace it by a lookup of a local variable/parameter in a static function, and for an instance member invocation we can replace it by a call to a static function.

eernstg commented 5 years ago

I looked a little bit more closely at the translated code, and several things came to mind.

So I basically remain unconvinced that we already handle the situations where I think object allocations can be avoided, at least not at anything like the same level of generality and predictability that we could have if we were to allow classes to have the modifier static and treat them as in #309.

I adjusted the examples using DateTime to match the main scenario for a static class directly:

main() {
  var currentMonth = DateTime.now().month;
  var moonLandingDay = DateTime.parse("1969-07-20 20:18:04Z").weekday;
  print("Current month: $currentMonth, Moon landing day: $moonLandingDay");
}

This translates as follows:

...
  main: function() {
    var t1 = Date.now(),
      t2 = P.DateTime_parse("1969-07-20 20:18:04Z");
      H.printString(
        "Current month: " +
        H.Primitives_getMonth(new P.DateTime(t1, false)) +
        ", Moon landing day: " + H.Primitives_getWeekday(t2));
  }
...

So we're still allocating two DateTime instances, and this is one of those situations where we might be able to make DateTime a static class, in which case we would be able to eliminate the two allocations.

PS: Interestingly, P.DateTime_parse will create one DateTime and new P.DateTime will create another one, and the generated code has swapped the order. So the compiler must know that there are no side effects in the constructor. ;-)

eernstg commented 5 years ago

@davidmorgan wrote:

void foo = Foo((b) => b..x = 3..y = 4); .. the Foo factory creates a FooBuilder, applies the function to it, then converts it into a Foo to return.

Wouldn't that be var foo = ... or Foo foo = ...? I'll assume that.

Interestingly, we could actually use an extension type (that is, a wrapper class, in #309 lingo) for this:

wrapper class Foo ... {
  implicit factory Foo(void Function(FooBuilder) f) {
    // We'd do something like this:
    FooBuilder fb = ...;
    f(fb); // Let the user code set up `fb`.
    return fb.build(); // Create and return the corresponding `Foo`.
  }
}

main() {
  Foo foo = (b) => b..x = 3..y = 4;
}

It might seem counter-intuitive to let Foo be a wrapper class for FooBuilder, but this just means that any situation where a void Function(FooBuilder) is provided and a Foo is expected, the constructor will be implicitly inserted such that we obtain the desired Foo. Here's the desugared code:

// Desugared code (`class Foo` is unchanged).
main() {
  Foo foo = Foo((b) => b..x = 3..y = 4);
}

By the way, if we use the abbreviated function syntax of #265 we can write it like this:

main() {
  Foo foo = { x = 3; y = 4; };
}

This is the same thing, and it desugars the same way, it's just relying on the ability to have a parameter named this and the ability to use the abbreviated form { <statements>+ } to mean (this) { <statements>+ }. Then the access to x becomes this.x (provided that there's no other x in the lexical scope), etc.

We rely on type inference to tie the pieces together, because it is only a type correct constructor invocation when inference has decided that the parameter type of the function literal is FooBuilder.

But given that we've said that we want a Foo, there is only one implicit constructor, so there is not that much of an ambiguity. Of course, if anyone else will convert a void Function(T) to a Foo, for some T != FooBuilder, it would be ambiguous; but it could then be written as (FooBuilder b) => b..x = 3..y = 4, respectively (FooBuilder this) { x = 3; y = 4; }.

However, all this doesn't help much along the way to eliminate any allocations. You wouldn't be in a good position to eliminate the FooBuilder (if it's used in a way which is similar to the one that I hinted at above), because fb is passed to the user-provided function object f, and this means that we must pass an instance of FooBuilder, so we must allocate it. Also, a user-provided function object is not likely to be inlined (you'd need global flow analysis and a lot of method splitting or something weird like that), so it isn't very likely that we can maneuver ourselves into a better position in that way.

So this is sad, but true—unless the whole thing works differently than I think, it won't be such an obvious use case for static classes after all.

But Foo foo = { x = 3; y = 4; }; would be kind of cool. ;-)

yjbanov commented 5 years ago

I think modifying the class places the decision about whether an instance should be stack-allocated in the wrong place. Wouldn't it be up to the call site to determine that? Example:

var foo = &Foo();  // dunno if & is Darty enough
foo.blah();
return bar(foo);

Knowing that the value of foo lives on the stack allows the compiler to do more checking. For example, it must ensure that method blah and function bar do not create references from the heap to foo. In Rust, they use borrowing for that. In Dart it could look something like this:

static Foo _globalFoo;

// The `&` declares that `bar` can take heap- and stack-allocated
// instances of `Foo`. It promises that it will not leak the instance to the
// heap. A static compiler check upholds the promise.
int bar(Foo& foo) {
  // _globalFoo = foo;  // STATIC COMPILER ERROR
  return foo.someField;  // OK
}

Similar syntax would need to be added for methods, but I can't think of anything Darty. You'd need some way to declare that this is borrowed.

yjbanov commented 5 years ago

Actually, static class pretty much is the syntax I'm looking for that declares that this is borrowed. Let me read the proposal before I say more :)

eernstg commented 5 years ago

I think we wouldn't want exactly the same amount of complexity in Dart as in Rust, in order to control memory management.

The idea behind a static class is that it never leaks this, which means that creation of a fresh object in an expression that is non-leaking is also known to not leak that object. If moreover it is immutable then we can eliminate the object and just carry the values of its fields with us wherever we wish to access it. This is not as powerful as Rust, but it should be relatively easy to manage as a developer: Mark the class static, make the compiler stop complaining about the class, and then all "non-leaking creations" of (exactly!) that type of object can be eliminated (and those creations could be explicit or implicit).

yjbanov commented 5 years ago

I think we wouldn't want exactly the same amount of complexity in Dart as in Rust, in order to control memory management.

I think the two sources of complexity in Rust are:

  1. Stack allocation is the default. You have to opt out of stack allocation.
    • This does not apply to Dart because by default the vast majority of code can continue living on the heap, just like it does today. You'd have to opt in to the complexity when it is warranted.
  2. The alternative to stack allocation in Rust is dozens of different allocation strategies all intertwined with Rust's borrowing system.
    • In Dart's case the fallback is good old fully managed heap with no added complexity whatsoever.

Note that in my suggestion the Foo& is fully compatible with heap-allocated objects. This means that only the author of the function bar opts into the constraints of stack-allocation. The users of this function can safely continue using bar in complete ignorance.

Where would something like this be useful to me? In UI frameworks you frequently create temporary data structures to compute the minimal delta you want to apply to the UI after the developer told you that something's changed. For example, you may receive two lists (e.g. of widgets) and you want to figure out the minimum amount of edits (add, remove, swap), or you receive a new tree of transform matrices and clips and you want to compute that effective change in the clip. Today we create temporary lists, maps, strings, matrices, rectangles, and other geometric objects on the heap in cases where stack would suffice. Another example is immediately-invoked callbacks, such as list.forEach. There's no reason to allocate the callback on the heap because it is invoked and dropped right there and then. It would be great if Iterable.forEach declared this at the interface level, e.g.:

void forEach(borrow void f(E element));
//           ^
//           guarantees that all implementations do not leak `f`
eernstg commented 5 years ago

@yjbanov wrote:

Note that in my suggestion the Foo& is fully compatible with heap-allocated objects

It's certainly an interesting idea, and more powerful than static classes in many ways! But if we allow any references to a stack allocated Foo to exist at all (and we're talking Dart and not Rust ;-) then it is going to be a delicate matter to provide the necessary compile-time guarantee that there will not be any dangling pointers to that object after the return that removes the stack frame where it lives. For instance, think about passing such a pointer as an argument to a function of type Function in a method of a mixin which happens to override the method implementation that declares a parameter to have type Foo&. We'd certainly have to enforce that parameter type Foo& cannot be overridden by parameter type T for any T (subtype of Foo, with covariant, or supertype of Foo) unless T is also of the form T&, because there's nothing other than the method signature and the override rules which will allow us to enforce additional constraints on that Foo& argument. So it's certainly going to be a feature that developers will need to deal with in order to avoid compile-time errors, and it must be a reified part of the type of a function. Moreover, developers will need to understand the implications of having/not-having that & when they declare a type, anywhere.

I don't think it's acceptable for Dart programs to have an even remote risk of seg-faults (that must be a bug in the runtime or in code generation, it can't be caused by a buggy program .. ok, ffi is an exception, but otherwise not), so we will need to maintain that "no dangling pointers" guarantee rigorously, and "the rest of Dart" may not be optimized for establishing such guarantees.

What I'm aiming at with static classes is a bit less ambitious. It relies on a stronger discipline whereby it is guaranteed that 100% of the references to a given object can be eliminated, and then we just need to preserve information about which values the would-be object would have in its (final) fields, and which method implementations it would run on a given instance method invocation. My idea is that the methods would be top-level functions, and the object fields would be passed around as parameters of those functions. With that, there's no need to maintain the same memory layout (or even ordering-in-memory) of the variables/parameters which are used to hold the object state, because all accesses to them will be compiled in a context where it is already statically known that they are parameters. Since parameters can be stored in registers, we could then end up having an object that "lives in a few registers" throughout its entire lifetime, so it really never has an address at all.

yjbanov commented 5 years ago

@eernstg

But if we allow any references to a stack allocated Foo to exist at all (and we're talking Dart and not Rust ;-) then it is going to be a delicate matter to provide the necessary compile-time guarantee that there will not be any dangling pointers to that object after the return that removes the stack frame where it lives.

That pretty much is the feature request :) The only references to stack allocated objects that would be allowed are ones marked as "borrowed". The language/compiler would have to learn how to make sure that those references are not leaked onto the heap.

The interesting question is: what's the minimum sufficient feature set? For example, would we need to support putting borrowed references into borrowed objects, such as class fields and list elements? It is a perfectly safe thing to do (segfault-wise) if the contents are guaranteed to outlive the containing object, but I'm not sure what the simplest way to express it is. Would we need something like Rust's lifetime parameters?

I don't think it's acceptable for Dart programs to have an even remote risk of seg-faults

+1

eernstg commented 5 years ago

@yjbanov wrote:

The only references to stack allocated objects that would be allowed are ones marked as "borrowed". The language/compiler would have to learn how to make sure that those references are not leaked onto the heap.

We might be able to handle such a concept. However, I'm pretty sure that Rust has been optimized for that kind of static proof maintenance almost from day 1, whereas Dart embodies a thousand design decisions that are not optimized in that particular direction, and we might very well have to transform Dart into a kind of a half-Rust-ish language in order to do it. This could mean that every Dart developer would then have to spend a significant amount of brain cycles on getting those "borrowed" bits right.

One example is that Dart would presumably need to maintain the stack-allocation related invariants dynamically. We can't prevent dynamic invocations like d.x from evaluating to a stack-allocated object (if an instance variable can have a 'borrowed' type), and we'd then need to ensure (1) that this evaluation always fails, or (2) that we only do things with that object which will maintain the required discipline. Borrowed types would presumably be supertypes of normal (heap-allocated) types (because we can do more things with entities of a normal type), and this could mean that we will need to downcast a lot of expressions from a borrowed to a normal type, or we would need to make the default bound of a type variable a normal (non-borrowed) type, and then require explicit bounds in the cases where a type variable should be allowed to have a value which is a borrowed type. Etc. ;-)

So I can certainly see the temptation to get the added performance, but I do think that the notion of borrowed objects is so deep that it will have very broad implications for a language to support it, and for developers using it.

What I've proposed in terms of 'static classes' (#308) is a much simpler feature: It allows for object allocations to be eliminated in a situation where we have an exact type of an immutable object, and a complete absence of leaking of said object.

yjbanov commented 5 years ago

This could mean that every Dart developer would then have to spend a significant amount of brain cycles on getting those "borrowed" bits right.

This is a very important point. We should make it a design constraint that this does not happen. I think this can be designed in such a way that does not leak the complexity of borrowing onto developers. For example, this is why I suggested that borrowing functions also allow passing normal heap-allocated objects. What you can do with a borrowed reference should be a strict subset of what you can do with a heap reference. This way the author (and only the author) of the borrowing function opts into this complexity. The author does not opt the users into it as well.

We can't prevent dynamic invocations like d.x from evaluating to a stack-allocated object

Why not? It seems pretty easy to forbid assigning a borrowed Foo to dynamic. However, I do not insist that arbitrary Dart objects should be allowed to participate in this. It is totally OK to only support a subset as long as the feature is still applicable enough to solve a lot of performance problems.

What I've proposed in terms of 'static classes' (#308) is a much simpler feature

I actually do not see a conflict here. If #308 is useful enough, then by all means, it should be added to the language too.

ds84182 commented 5 years ago

Gonna be honest, this sounds a lot like structs... You have "objects" that are just wrappers over plain ol data, and these objects are passed by "copy" by destructuring the inner fields.

eernstg commented 5 years ago

We can't prevent dynamic invocations like d.x from evaluating to a stack-allocated object

@yjbanov wrote:

Why not? It seems pretty easy to forbid assigning a borrowed Foo to dynamic

That's not where it happens: Consider the situation where we assign an expression of type T to dynamic d, and the dynamic type of the assigned value is some T1 <: T, and T1 is a class that declares a getter x whose return type is borrowed Foo.

There may not be a getter T.x, so there is no way we can make d = e an error when e has static type T, because we can't even predict that there will be a d.x at run time, and we have no idea whether it would have a return type which is borrowed, and we don't actually know whether we'll ever use the x of d in the first place.

I do not insist that arbitrary Dart objects should be allowed

The problem is that T above can be any type that can have a subtype T1 that declares any member where a borrowed type occurs covariantly (e.g., as the return type of a getter or method, or the type of an instance variable). Basically, T can be anything. And we can't make it an error to do dynamic d = e whenever the static type of e is "anything". So we're not just excluding a small subset of the Dart objects, we're excluding basically all expressions (except types like int, double, and String, where we do know all subtypes).

eernstg commented 5 years ago

@ds84182 wrote:

this sounds a lot like structs

True, but structs (meaning record data types with direct allocation, as opposed to heap allocated entities that are accessed via a pointer) require support from the runtime. So we can't have structs if we translate to JavaScript. But we can compile non-leaking usages of instances of static classes down to a form where that instance is eliminated, also when we translate to JavaScript (or anything whatsoever, because the elimination of that instance is a Dart-level desugaring operation).

yjbanov commented 5 years ago

@eernstg

Consider the situation where we assign an expression of type T to dynamic d, and the dynamic type of the assigned value is some T1 <: T, and T1 is a class that declares a getter x whose return type is borrowed Foo.

I'm not sure we need borrowed return types. I think these could be disallowed. Alternatively, a borrowed return type may only be assigned to a borrowed variable iff the type of the wrapper object is itself borrowed (not sure if we'd also need lifetimes). Not sure how useful this feature is though. For Rust, this is critical, because there's no GC to fall back on. In Dart there's always GC as the last resort.

What would be useful, however, is an ability to construct and return an object along with ownership to the caller (a la C++ move semantics). Again, regular code can be completely oblivious to the fact that ownership is given to them. They can do whatever they want with it (probably leak it onto the heap, and the runtime would happily oblige). However, advanced users should be able to put it on stack and then pass it (lend it?) to functions that borrow the object, avoiding heap allocation.

eernstg commented 5 years ago

@yjbanov wrote:

I'm not sure we need borrowed return types

That's an interesting restriction!

We could consider the following rule: A formal parameter p of a function declaration can be given a borrowed type, in which case the body of the function can only contain p in non-leaking positions. This would ensure that we don't need borrowed return types because no expression with a borrowed type can be returned.

But unless we restrict this discussion to a setting where we can manage memory ourselves (in particular, not when generating JavaScript), we would need to generate a specialized version of every such function which "accepts the state" of p rather than a regular object (that is, it accepts several formal parameters rather than p), and this would have implications for tear-offs and dynamic invocations etc. And we'd still need to have function types whose parameter types can be borrowed (as well as all other contravariantly positioned types, e.g., the parameter types of a returned function type). So it would still incur a non-trivial language complexity cost.

eernstg commented 5 years ago

Considering borrowed types, here is a description of the kind of red tape that Rust programmers get into if they want to assign one reference to another one: Medium: 'Mutable Reference in Rust'. I think we'd want to be very cautious about forcing Dart programmers into a situation where they need to solve that kind of puzzles.

yjbanov commented 5 years ago

@eernstg

forcing Dart programmers

I'm not sure how this can be described this way, given that you can continue using the existing language as you do today. The idea is to offer this as an advanced language feature that never "leaks" across API boundaries. That is, non-advanced users should never be exposed to this. The issue with Rust (and other non-GC non-RC languages, like C, C++) is that the language allows this complexity leak out to public API.

Consider, for example, the amount of bit-twiddling complexity that goes on in the Dart VM to make strings and ints fast. Does that complexity leak onto the Dart programmer? No. The API looks very natural and "Darty" to use. That should also be a constraint on the feature I'm proposing.

braj065 commented 1 year ago

Is it possible in dart to have non-paused GC like in golang?

munificent commented 1 year ago

The GC in Dart is concurrent and mostly non-pausing as far as I know.

braj065 commented 1 year ago

The GC in Dart is concurrent and mostly non-pausing as far as I know.

Thanks @munificent. Will there be a time in coming years when Dart will be modified to have same performance level(in terms of memory footprint, GC optimization, Lighter isolates etc) as Golang or more performant server side languages. I think that will be a crowd puller to develop the open source server side ecosystem libraries like load balancer, microservice registry, rate limiter etc. Please ignore if my question seems silly.

munificent commented 1 year ago

Will there be a time in coming years when Dart will be modified to have same performance level(in terms of memory footprint, GC optimization, Lighter isolates etc) as Golang or more performant server side languages.

This question assumes a premise that it's not already at the same performance as those languages, but that premise requires some sort of agreed-on benchmarks or performance data, and I don't think that consensus exists. There's no context-free way to say language X is faster than Y. Instead it's language X is faster than Y for assumed-equivalent programs Zx and Zy.

I think for many programs, Dart is plenty fast enough to be used as a back-end language. (For example, all of our own tools are written in Dart, including the compilers, package manager (both client and server), etc.)

braj065 commented 1 year ago

Will there be a time in coming years when Dart will be modified to have same performance level(in terms of memory footprint, GC optimization, Lighter isolates etc) as Golang or more performant server side languages.

This question assumes a premise that it's not already at the same performance as those languages, but that premise requires some sort of agreed-on benchmarks or performance data, and I don't think that consensus exists. There's no context-free way to say language X is faster than Y. Instead it's language X is faster than Y for assumed-equivalent programs Zx and Zy.

I think for many programs, Dart is plenty fast enough to be used as a back-end language. (For example, all of our own tools are written in Dart, including the compilers, package manager (both client and server), etc.)

Are the benchmarks data in following observations qualify for the consensus - https://programming-language-benchmarks.vercel.app/dart-vs-go and https://www.techempower.com/benchmarks/ (I remember seeing Shelf/aqueduct in this but not able to find it anymore. They mention in the git - https://github.com/TechEmpower/FrameworkBenchmarks that they have benchmarked Dart as well - https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/Dart/dart2/server.dart). Also once Serverpod will release their throughput benchmarks(https://github.com/serverpod/serverpod/discussions/598) probably that will give a better insight and a yardstick to get behind. Thanks for taking out time to reply.