Protected extension types or proper "Value types"

dart-lang / language

Design of the Dart language

Protected extension types or proper "Value types" #1467

Open lrhn opened 3 years ago

lrhn commented 3 years ago

The current specification of extension types suggests protected extension type declarations to introduce a type which:

is a zero overhead abstraction,
not related to the on type (only to Object? and Never),
can only be entered using a constructor, and
can be exited by casting (can always be cast to Object?).

It's not a real wrapper object, although there is an extra part of the proposal to introduce box/unbox getters to do explicit boxing and unboxing.

To me, this all sounds very much like an attempt to define a value type, meaning a type with no (guaranteed) identity for values, allowing automatic boxing and unboxing as necessary, but otherwise looking like a class. The show/hide clauses allow us to forward some methods to the underlying value, which is a feature we could use in other situations too.

So, if we instead allowed a declaration like:

struct Int implements int as _value { // `struct` declares a "value class"
  final int _value;
  Int(int value) : _value = value;
  // Yields the integers from this value up to, but not including, `end`.
  Iterable<int> to(int end) sync* {
    for (var i = _value; i < end; i++) yield i;
  }
}

A struct works the same as a class, but introduces a value without identity. This allows the compiler to inline/unbox and re-box the object as it pleases. I'd probably want identical to either always answer false, or always do identical comparisons on the field values. Structs do not introduce an interface and cannot be subclassed, or something like that (lots of possible details).

An `implements Type as field` clause automatically adds forwarders for all non-overridden members of Type, each forwarding to field.member. (We can perhaps also allow show/hide after the `as field`, but it's a problem that show/hide uses a comma-separated list, just like implements.) There are also other options for forwarding, but this can handle bulk forwarding.
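As a rough illustration of what such bulk forwarding would save, here is how the forwarders have to be written by hand in today's Dart. The names `Greeter`, `Counter`, `Session` and their members are hypothetical and only serve the example; the proposed `implements ... as ...` syntax appears only in a comment, since it does not exist.

```dart
// Hypothetical interfaces, used only for illustration.
abstract class Greeter {
  String greet(String name);
}

abstract class Counter {
  int get count;
  void increment();
}

class SimpleGreeter implements Greeter {
  @override
  String greet(String name) => 'Hello, $name';
}

class SimpleCounter implements Counter {
  int _count = 0;

  @override
  int get count => _count;

  @override
  void increment() {
    _count++;
  }
}

// Under the proposal this might be declared as
//   class Session implements Greeter as _greeter, Counter as _counter { ... }
// and the forwarders below would be generated automatically.
class Session implements Greeter, Counter {
  final Greeter _greeter;
  final Counter _counter;

  Session(this._greeter, this._counter);

  // Hand-written forwarders, one per interface member.
  @override
  String greet(String name) => _greeter.greet(name);

  @override
  int get count => _counter.count;

  @override
  void increment() => _counter.increment();
}

void main() {
  final session = Session(SimpleGreeter(), SimpleCounter());
  session.increment();
  print(session.greet('Dart')); // Hello, Dart
  print(session.count); // 1
}
```

Every member added to either interface needs another hand-written forwarder here, which is the boilerplate the forwarding feature is meant to remove.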

Together, these two features would give us all the affordances of protected extension types, except guaranteed zero overhead. It's a compiler optimization to inline structs where possible, but something that should be fairly predictable (whenever the static type is the value type, we can safely unbox it).

In return, we get a general value type which can contain more than one value (extension types are only on one value), which works safely with generic containers (by auto-boxing), and where boxing/unboxing is built into the system, not something you have to do manually using box/unbox. This is safer than the skin-deep abstraction of an extension type can ever be.

The forwarding feature can also be used by classes, and for more than one type at a time. It's a request we've had before (like https://github.com/dart-lang/language/issues/3748).

lrhn commented 3 years ago

If we ignore the syntax differences, the significant difference is that you declare the variable in the class part (outside the body), I declare it in the body and reference it in the class part:

// No need to write `wrapper`, the forwarding should work for all classes.
class Bar on Foo _foo, Baz _baz implements Qux {  
  ...
}

vs.

class Bar implements Foo as _foo, Baz as _baz, Qux { 
  Foo _foo;
  Baz _baz;
  ...
}

Yours is definitely shorter in some cases, but (IMO) harder to read. I can't just look at the top of the object declaration and see the state. Also, it doesn't allow two interfaces to be implemented by the same object:

class Foo implements Bar as _b, Baz as _b, Qux {
  BarBaz _b;
  ...
}

My approach doesn't allow exposing members which are not part of an interface. There is no show/hide. Maybe that could be handled at the variable declaration:

   Bar bar show bar1, bar2;

would introduce forwarders for bar.bar1 and bar.bar2 in the surrounding class. Then writing (my syntax) implements Bar as bar would be equivalent to adding (effectively, even if we don't have that syntax) show Bar.* on the bar declaration.

As for auto-unboxing, that's a separate feature. I wouldn't call it wrapper class because it's not about wrapping. It's about not having (or at least not needing to have) a consistent identity. I used struct because that's what C# uses for non-Object types.

lrhn commented 3 years ago

The idea is that, as an alternative to "protected extension types", we should do "value types" (which can be unboxed and boxed as needed, and do not promise identity) and "easy forwarding" (for the show/hide part of protected extension types). Two features, which can both be used independently in other situations too, and which together make protected extension types unnecessary. (So, basically, solve a more general problem, and not the too-specific and not-entirely-possible idea that zero-overhead abstraction can actually buy you any kind of safety.)

lrhn commented 3 years ago

One problem with this idea is that value types are traditionally unmodifiable and not subtypable. If something does not have identity, mutating it makes little sense, and being able to replace it with something which has more state prevents unboxing.

For the non-subclassability, that probably means that you cannot extend (or implement) a struct. A struct can extend only classes (you can assign the struct to the supertype, it'll just get boxed). If you need to extend a struct, instead contain the struct and forward its members (which is another use for easy forwarding).

For unmodifiability, either we can just go with that (struct classes must have only final variables and extend only classes with const constructors, which luckily includes Object), or introduce some notion of "modification" which acts consistently with assignment, but isn't really. Basically, where mutating is known to be the same as replacing with a different struct. In that case, we don't want to be able to cast the struct to a mutable supertype, so again it can only extend classes with const constructors, or maybe just Object (favor composition over inheritance, and all).

Example: given `struct Foo {int i;}` and `Foo x;`, the assignment `x.i = 42;` is only allowed where x is itself a mutable reference to the struct, and it's equivalent to creating a new Foo with the updated value and assigning that to x. Basically, `lhs.foo = 42` is really a composite operation like `lhs = lhs[foo ↦ 42]` (where lhs is only evaluated once, and must be an assignable expression). We can even do `lhs..foo = 1..bar = 2;` and incrementally update the lhs value. That should be optimizable in a lot of cases, but it means that we must prevent mutation of boxed structs, which means no dynamic assignments.

Maybe just define structValue..id = value as evaluating to the updated struct. Then you have to explicitly write x = x..foo = 42; (read: x is now x where foo was changed to 42). Using x.foo=42 as shorthand is an option, but less explicit.

If we introduce a supertype of all values, Value, which is a proper supertype of Object and dynamic, and where struct types are subtypes of Value (like in C#), then we also avoid such things as nullable structs and dynamic structs; those operations are only for (reference-)objects. Maybe we can make void be at the level of Value. It shouldn't be assignable to any other type anyway (and if it's still assignable to dynamic, it's as an implicit downcast which can fail).

lrhn commented 3 years ago

Stack allocation is a reason for value types, but it's only part of it. (Not just stack allocation, also inline-allocation in other objects, so let's call it non-heap allocation, where the "heap" here is the garbage collected object heap, not a C-like heap).

The reason for non-heap allocated objects is memory use reduction, memory churn reduction and therefore lower garbage collection overhead. It's space, and thereby time, optimization, not a semantic change in how the object works—at least not unless we need to add further restrictions.

So why do we talk about it at all? Because we actually do care about the overhead of allocating new objects. Maybe we shouldn't, and we should just optimize the normal objects better. It's just that if we remove a "small" thing like identity, we know the compiler will have many more opportunities to optimize. That's why we're considering providing that feature. Just as when we consider allowing you to seal a class, it's because it allows the compiler to do more optimizations, and more predictable optimizations, than without it.

The real goal here is good and predictable performance, because without that, Dart won't be a good platform for performance sensitive programs.

The cost of removing identity and allowing inline allocation is that you can't safely store a reference to a non-heap allocated object, because either it might stop existing at any time (if it's on the stack) or it doesn't exist as an independent entity (if inlined in another object). Not having a reference means that most Object-based operations won't work. You can't pass a reference to the value as an argument, which is why value types will often have copy semantics. You don't pass a reference, you create a new copy at the receiving side. Then we get into design discussions about whether that copy itself has identity enough to allow mutation (so a value stored in a variable stays the same, but assigning it to another variable makes a copy), or whether it's all immutable and the compiler itself can completely forget about which value comes from where.
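To make the difference between sharing a reference and copying concrete, here is a small sketch in today's Dart. `Point`, `copy` and `moveRight` are hypothetical names; the explicit `copy()` call stands in for what a struct with copy semantics would presumably do implicitly on parameter passing.

```dart
// Hypothetical mutable class standing in for a struct.
class Point {
  int x;
  int y;
  Point(this.x, this.y);

  // Explicit copy; a struct with copy semantics would do this implicitly
  // on assignment and parameter passing.
  Point copy() => Point(x, y);
}

// Mutates whatever object it receives.
void moveRight(Point p) {
  p.x += 1;
}

void main() {
  final a = Point(0, 0);

  // Reference semantics (ordinary Dart objects): the callee mutates
  // the caller's object.
  moveRight(a);
  print(a.x); // 1

  // Simulated copy semantics: the callee only sees its own copy,
  // so the caller's value is unchanged.
  moveRight(a.copy());
  print(a.x); // 1
}
```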

Cat-sushi commented 3 years ago

I like value types with syntax of

struct Bar from Foo as _foo, Baz as _baz, Qux { 
  Foo _foo;
  Baz _baz;
  ...
}

without auto-forwarding by default, because an ID made from int doesn't naturally have the interface of int.

lrhn commented 3 years ago

I agree that we'll need something, hopefully not pointers.

It's not a given that a "value type" is mutable. It's such an overloaded term, used to mean both "data type" (immutable values) and "structs" (potentially mutable like objects, but without a permanent identity), or things in-between. You're talking about the struct.

Reference variables would work: something like `ref el1 = list[1];` would be an alias for list[1], usable as a left-hand side as well, and then we could also have ref parameters. Structs are identified by the assignable expression (left-hand side) that they are accessed through; they only change identity on assignment (including parameter passing). It's a powerful feature, but not fundamentally different from capturing a variable with closures like `() => x` and `(v) { x = v; }`. A possible desugaring of ref variables is a pair of such functions, or only the getter for a final ref variable, basically abstracting over getters and setters, and then the "getter" also allows struct field assignment through it (so it's a triple of functions, get/set/updateStruct).
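A minimal sketch of that closure-pair desugaring, written in today's Dart. The `Ref` class and its `value` accessors are hypothetical names chosen for illustration, and the third `updateStruct` function is left out because it depends on struct semantics that do not exist yet.

```dart
// Hypothetical helper: a "reference variable" modelled as a getter/setter
// pair of closures over some assignable expression.
class Ref<T> {
  final T Function() _get;
  final void Function(T) _set;

  Ref(this._get, this._set);

  T get value => _get();
  set value(T newValue) => _set(newValue);
}

void main() {
  final list = [1, 2, 3];

  // Roughly what `ref el1 = list[1];` might desugar to.
  final el1 = Ref<int>(() => list[1], (v) {
    list[1] = v;
  });

  el1.value = 42; // Writes through to list[1].
  print(list); // [1, 42, 3]

  list[1] = 7; // Changes made through the list are visible through the alias.
  print(el1.value); // 7
}
```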

There is one other approach we can take, one which doesn't require pointers: Functional Updates!

Structs are immutable.
Assignment to a struct field creates a new struct and assigns it back to the variable holding the old struct.

Basically, syntax like `structVariable.field = 42` really means `structVariable = structVariable._replace(field: 42)` (and that expression still evaluates to 42). It works for any valid assignable LHS, so `list[index].field = 42` means `list[index] = list[index]._replace(field: 42)` (with the usual caveat that list and index are evaluated only once). It's a composite assignment, like `+= 42`, only the operator-assignment is `.field=`. A final struct-typed variable would be immutable (so would a const variable, obviously).

You can have nested structs and do structVar.nest.field = 42 meaning structVar = structVar._replace(nest: structVar.nest._replace(field: 42)).
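The desugared form can be sketched with immutable classes in today's Dart. `Inner`, `Outer` and the public `replace` methods are hypothetical stand-ins for structs and the `_replace` operation; the assignments are written out in the expanded form that the shorthand would presumably translate to.

```dart
// Hypothetical immutable "structs" modelled as classes with final fields.
class Inner {
  final int field;
  const Inner(this.field);

  // Returns a copy with the given field replaced.
  Inner replace({int? field}) => Inner(field ?? this.field);
}

class Outer {
  final Inner nest;
  final String name;
  const Outer(this.nest, this.name);

  // Returns a copy with the given fields replaced.
  Outer replace({Inner? nest, String? name}) =>
      Outer(nest ?? this.nest, name ?? this.name);
}

void main() {
  var structVar = const Outer(Inner(1), 'a');
  final before = structVar;

  // Expanded form of the proposed `structVar.nest.field = 42`.
  structVar = structVar.replace(nest: structVar.nest.replace(field: 42));
  print(structVar.nest.field); // 42
  print(structVar.name); // a
  print(before.nest.field); // 1, the old value is untouched

  // Expanded form of the proposed `list[1].field = 42`.
  final list = [const Inner(0), const Inner(1)];
  list[1] = list[1].replace(field: 42);
  print(list[1].field); // 42
}
```

Note that the update only works because `structVar` and `list[1]` are assignable; with a final variable there would be nothing to assign the new value back to.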

It would be up to the compiler to optimize such assignments into in-place updates, or combine things like `structVar..field1 = 42..field2 = 37` into a single replace.

It's not quite mutable structs, but it simulates them adequately. There are things about evaluation order to work out too, for cases like `foo.field = expressionChangingFoo`, which should not throw away the change. And it's not clear how well it works with dynamic types. I'm sure there are other details to figure out.

That's the one non-reference/non-pointer approach I've seen to updating structs, no pointer exposed, no reference variables needed, but it does rely on the struct variable being assignable.

lrhn commented 3 years ago

All object references are reference/pointer types behind the scenes. We're hiding them fairly successfully now (I hope :sweat:).

Struct references are no different in that regard; the question is whether you can share a reference, which we always do with objects. With structs, you don't share: any assignment is a copy, and here you would probably copy/inline the struct onto the stack. (That's what C# does: they specialize the code for each value type, and only reuse code for reference types, because references all have the same size and shape.) Copying is the proper behavior for a struct. If the compiler can sometimes deduce that you don't need to make a copy (or in this case, only copy the name part), then good for it, but that's an optimization.

I admit I can't think of any language with mutable non-object structures and no pointers/reference variables. Even C# has ref parameters and returns. Their structs are also not objects, so you can't assign them to Object. Dart only has one non-Object value for now (null).

dmvvilela commented 4 months ago

Any updates on this? Forgive me if I'm talking nonsense, but I'm wondering what would have happened if we'd had structs from day one, before Flutter even started. Considering that SwiftUI (a Flutter-style way of building UIs) uses structs, maybe Flutter could have worked with them instead of relying on stateful vs. stateless widget classes. Just a thought. Does anyone more knowledgeable than me have some good insights on the subject?