dart-lang / language

Design of the Dart language
Other
2.67k stars 205 forks source link

[inline classes] Support simple union types? #2603

Open eernstg opened 2 years ago

eernstg commented 2 years ago

In response to https://github.com/dart-lang/language/issues?q=is%3Aissue+is%3Aopen+union+types+label%3Aunion-types, we could have something like the following in a widely available location (e.g., 'dart:union' or even 'dart:core'):

inline class Union2<X1, X2> {
  final Object? it;
  implicit Union2.inject1(X1 this.it);
  implicit Union2.inject2(X2 this.it);

  // We might want various methods, e.g.:
  bool get valid => it is X1 || it is X2;
  X1? get as1 => it is X1 ? it : null;
  X2? get as2 => it is X2 ? it : null;
}

inline class Union3<X1, X2, X3> {
  final Object? it;
  implicit Union3.inject1(X1 this.it);
  implicit Union3.inject2(X2 this.it);
  // ... and so on.
}

...

inline class Union9<...> {...} // Or some other max number of arguments.

The modifier implicit on constructors was proposed in https://github.com/dart-lang/language/issues/309 (for a more specialized purpose, but the idea immediately generalizes to a constructor of any class). The basic idea is that if an expression e has static type S and context type T, and S is not assignable to T, and T is a type (parameterized like C<U1, .. Uk>, or simply C) denoting a class that declares an implicit constructor (say, C.name(S1 s)) taking exactly one positional parameter and no required named parameters, then e is transformed into an invocation of that constructor (C.name(e) or C<...>.name(e)).

It could be used as follows:

import 'dart:union';

void f(Union2<int, String> u) {
  // We'd probably need special support in order to recognize that the set of cases is
  // exhaustive, and each case is possible (according to the type of `u`). We might
  // throw in some generated code in case someone did terrible things like
  // `var u = true as Union<int, String>;`, such that we don't have to write that every time.
  switch (u) {
    case int(): print(u.isEven);
    case String(): print(u.length);
    // A `default` case is generated to throw if the Union was misused.
  }
}

void main() {
  f(Union2.inject1(42)); // The constructors can of course be used directly.
  f(42); // If we get support for `implicit` then this would suffice, too.
}

The fact that inline classes do not involve allocation of an actual wrapper object ensures that (1) the union type has a zero cost representation at run time (an expression of type Union2<int, String> evaluates to an int or a String, no extras), and (2) it allows a plain type test (myUnion is int).

The mechanism is not robust (we can easily break it, e.g., true as Union2<int, String> won't throw), but it is only intended to help developers who want to ensure that a given expression has one out of a specified set of types, and we do get the static checking at a reasonable level. For instance, even with an implicitly invoked constructor, f(true) in main would be a compile-time error, because true doesn't have a type which is assignable to Union2<int, String>, and none of the implicit constructors (or any other available code transformation) will make it type correct.

The mechanism does not have the algebraic properties that we would expect from a full-fledged union type in the language: We do not have support for A <: A | B <: B | A <: A | B | C (no normalization, no reordering), it only has the subtype relationships that we can get from standard covariant type argument based subtyping: Union2<int, String> is a subtype of Union2<num, dynamic> because int <: num and String <: dynamic. This is one of the reasons why this mechanism would be a minimalistic version of union types.

The support for switch statements where the scrutinee has a union type is perhaps the most tricky part. We could simply omit this and rely on developers writing the switch statements in full. However, it seems quite useful if the compiler/analyzer (or perhaps just the linter) "knows about" union types, such that we can ensure that a sequence of cases testing the type of the scrutinee will actually test exactly the operand types in the union type, and also that it will throw in case there is no match (again, considering true as Union<int, String>).

[Edit Nov 2: Added reference to the old proposal about implicit constructors, and some more info about the mechanism.] [Edit, Dec 8: Changed the syntax to the most recent syntax: Views are now inline classes.]

Wdestroier commented 2 years ago

Besides f(42); being desugared to f(Union2.inject1(42));, I believe int | String could be syntax sugar for Union2<int, String> and dart:union could be always automatically imported.

lrhn commented 2 years ago

This seems like an attempt to make an untagged union act like a tagged union, plus extra work to detect invalid values.

rrousselGit commented 2 years ago

This is interesting, but the idea of having Union2.inject2 vs Union2.inject1 sounds a bit unexpected to me.

It's something I played with before in a package I made a long time ago: union And I wasn't satisfied with how it's used.

I think being able to do the following is quite important:

Union2<int, String> value = 42;

Also, would we have some way to normalize the type?

Again, with my union package, one issue is that we can end up with Union2<A, Union2<B, C>>. Or even worse: Union2<A, Union2<A, B>> (where A is present twice).

There's also the question of "Is Union2<A, B> assignable to Union2<B, A>?"

eernstg commented 2 years ago

@lrhn wrote:

This seems like an attempt to make an untagged union act like a tagged union

I'd say that it is an untagged union. For instance, we could have this:

class A {}
class B {}
class C implements A, B {}

void main() {
  var u = Union2<A, B>.inject1(C()); // No way to tell whether the `C` is seen as an `A` or a `B`.
}

A tagged union would be more like this:

class Union2<X extends Object, Y extends Object> {
  final X? x;
  final Y? y;
  Union2.inject1(this.x): y = null;
  Union2.inject2(this.y): x = null;
  bool get is1 => x != null;
  bool get is2 => y != null;
}

void main() {
  var u = Union2<A, B>.inject1(C());
  print(u.is2); // 'false', definitely.
}

The point is that the union which is based on a view type is a cheap approach, and presumably robust enough to be worthwhile. It's just helping us to detect cases where, say, an actual argument in a method call should be an int or a String, and we're providing something else. In particular, it is surely better than a parameter type like Object which would indeed allow both, but it would also allow a lot of other (wrong) things.

eernstg commented 2 years ago

@rrousselGit wrote:

int | String could be syntax sugar for Union2<int, String>

Indeed. It just depends on the implications of doing that (possible parsing ambiguities, eating up syntactic space such that other things may then not be possible in the future, etc.etc).

and 'dart:union' could be always automatically imported.

In that case the UnionN view classes could simply be added to 'dart:core' instead. Again, it depends on the breakage and the willingness (of all of us as a community) to fix the breakage, if any.

eernstg commented 2 years ago

@rrousselGit wrote:

It's something I played with before in a package I made a long time ago: union

Thanks, that's interesting, I hadn't seen that!

And I wasn't satisfied with how it's used.

I don't think we can expect Dart to twist and bend a lot in order to make this mechanism optimally convenient. But it might be worthwhile even with those warts.

I think being able to do the following is quite important:

Union2<int, String> value = 42;

That would work right away with implicit constructors. However, that's one case where it isn't that crucial, you could simply do this:

var value = Union2<int, String>(42);

It's more important in cases like f(42) vs. f(Union2(42)) vs. f(Union2<int, String>(42).

Also, would we have some way to normalize the type?

I don't think so. There's no natural ordering of types. Even an alphabetical ordering would act up if one expression of a type uses import prefixes and the other one doesn't, or it uses different prefixes ((core.int, myPrefix.C) vs. (int, otherPrefix.C)). So there's no reasonable way that I can think of to enforce that Union2<String, int> isn't expressed as Union2<int, String> by someone.

Of course, we could introduce a (small!) set of coercions:

view class Union2<X1, X2> {
  final Object? it;
  implicit Union2.inject1(X1 this.it);
  implicit Union2.inject2(X2 this.it);
  implicit Union2.reorder(Union2<X2, X1> u): it = u.it;
}

Again, with my union package, one issue is that we can end up with Union2<A, Union2<B, C>>. Or even worse: Union2<A, Union2<A, B>> (where A is present twice).

Those are really good questions, and generally my response is that (1) we can achieve a minimal level of support for union types using view classes as illustrated in this issue, but (2) they won't have those nice algebraic properties that you mention.

There's also the question of "Is Union2<A, B> assignable to Union2<B, A>?"

With an implicit Union2.reorder, yes. But I don't think it's tractable (e.g., what's going to happen to compilation time?) to aim for completeness in this area.

I'd simply recommend that developers use this kind of union types in a simple and straightforward way, and then we may have to admit that normalization of complex union types isn't supported, you just need to avoid getting into that mess. ;-)

lrhn commented 2 years ago

It's tagged-union-like in the sense that it introduces a new type, and doesn't have subtype relations between the union type and the individual values. It's untagged-union-like in that it doesn't actually have a tag.

It's also unsafe in that views will not prevent you from introducing any value into the "union", because the representation type is not a real union type.

We can definitely define a type like this, but I'd prefer not to make backends have to recognize it and generate code specially for it. (Like, pretend the two official union types exhaust the possible values, even if it doesn't.) It's special casing which is hidden in the syntax.

I'd rather introduce a proper "semi-tagged union" construct, like int | String, which is a new type, not subtype related to either int or string. We can make both assignable to the union type, for injecton, without making them subtypes (so you can add an int to a List<int|String>, but you can't assing a List<int> to it). Extraction is done using type checks, like we do for FutureOr. So, basically the traditional union type, without making it a supertype of the element types, so we don't have to worry about it in LUB computations. Then it becomes both type safe and zero-overhead, without hiding that it's special by looking as a view.

eernstg commented 2 years ago

@lrhn wrote:

I'd prefer not to make backends have to recognize it and generate code specially for it

That's definitely a safe bet, in the sense that we can do something without changing the language (other than possibly adding support for implicit on constructors), and avoid committing to any details of union types in the language itself.

I was aiming for a trade-off where we would offer a rather tiny amount of support in the language, e.g., simple well-formedness checks on switch statements/expressions where the scrutinee type is a union type.

I think this might be useful, but I admit that it is a risky strategy, because we could end up deciding that this particular kind of union type is far too weak, and we need a real language-level union type with all the nice algebraic properties (normalization, subtyping like B <: A | B <: A | B | C, A Function(B) | C Function(D) <: A | C Function(B & D), etc).

Conversely, considering how much work we've done in order to maintain support for nullable types and FutureOr (those are the proper union types that we have today), and considering that the request for union types has been present at least since 2014 (https://github.com/dart-lang/sdk/issues/18564 is from 2014 and it has a comment saying

Closed unions are generated as dynamic until Dart supports either union or function overloading.

), it seems somewhat likely that we could make progress on a minimalistic approach, but aiming for a nice, shiny, complete approach would push the topic way into the future.

I'd rather introduce a proper "semi-tagged union" construct, like int | String, which is a new type, not subtype related to either int or string.

The funny thing is that the view class approach does exactly this already.

I'd actually expect to have int <: int | String in a proper language-level union type. I see it more in the opposite direction, "a view class union is not a real union type" (exactly because we do not have things like Union2<int, double> <: Union3<int, bool, double>).

But the view class based approach is much simpler to handle in the language, so we could accept the lack of convenience and expressive power, because it's something that we can actually have soon.

We can make both assignable to the union type, for injecton, without making them subtypes (so you can add an int to a List<int|String>, but you can't assing a List<int> to it).

Right, adding an int to a List<Union2<int, String> would actually be possible, because myUnionList.add(42) would be transformed into myUnionList.add(Union2<int, String>(42)) using the implicitness of the constructor.

Assigning a List<int> to a variable of type List<Union2<int, String>> will fail at compile time. But if the static type is dynamic, or we use a cast, it will succeed at run time: The run-time type of the List<int> could be SomeList<int>, and the run-time representation of the type List<Union2<int, String>> is List<Object?>.

However, we would then have a dynamic error (because of the dynamically checked covariance) if we try to add a String to the list, just like we'd have a dynamic error here:

void main() {
  List<Object?> xs = <int>[];
  xs.add('Hello!'); // Throws.
}

So there's nothing union-type specific about this. In general, view class based union types as type arguments will work as if the type is Object? at run time (actually, that is precisely what it is).

Extraction is done using type checks, like we do for FutureOr.

That is also how the view class based approach would behave.

So, basically the traditional union type, without making it a supertype of the element types, so we don't have to worry about it in LUB computations.

Exactly! That's the kind of complications that I wanted to skip by using a plain view class. Similarly, a view class based union type would not have any other members than (1) the ones in Object, and (2) the ones declared in the relevant UnionN declaration, so we don't have to discuss whether draw() can be invoked on an expression of type Cowboy | Pencil.

rrousselGit commented 2 years ago

For reference, using the union package I was able to allow making Union2<A, B> assignable to Union3<A, B, C>

The trick was to use functions:

typedef Union2<A, B> =  void Function(
  void Function(A value),
  void Function(B value),
  Object? _c,
);
typedef Union3<A, B, C> = void Function(
  void Function(A value),
  void Function(B value),
  void Function(C value),
);

Union3<A, B, C> union = Union2(A)

And I relied on extensions for support assigning int | double to num:

extension<T> on Union2<T, T> {
  // using extensions, the compiler automatically assign "value" as the closest common type between all generics
  T get value {...} 
}

Union2<int, double> union;
num value = union.value; // works without a cast

So we can get a lot of things done without language support.

Although from my experience, it's still painful to use when combining unions.

One problem I faced was:
Say two packages use unions, where package A exports Union2<String, int> functionA();; and package B exports functionB(Union3<int, String, double> value).
A common use case would be to do functionB(functionA()).
But although with union I was able to make Union2 assignable to Union3, the issue that arises here is that the order in which generic types are defined is different.

As a language feature, maybe we'd be able to use that "reorder"? And do functionB(Union3(functionA())).
That'd assume it'd support assigning Union2 to Union3, but I'm sure that's reasonable.

eernstg commented 2 years ago

So we can get a lot of things done without language support.

Cool!

I can't quite see how Union3<A, B, C> union = Union2(A); would work, given that Union2 is a function type, not a function, but I'm convinced that we would do many things.

maybe we'd be able to use that "reorder"? And do functionB(Union3(functionA())).

The problem that constantly pops up is that there are so many combinations of concrete situations, and it would blow up (in terms of the amount of code as well as the time/space needed to compile code using it) if we enumerate all the situations one by one.

For example, there would be k! variants of "reorder" for a union type with k type arguments. So we may not even have a single reorder constructor, because we won't have all of them, with any meaningful interpretation of 'all' (say, we do it up to 4 type arguments, or something).

Another example of those combinatorial explosions is subtyping from k type arguments to k+1 type arguments: We would need to handle each subset of size k from a set of size k+1, that is, k+1 variants, just for that single subtyping relationship (more precisely: to support that kind of assignability, based on implicit constructors), and then we'd have to handle reordering as a separate step, or as part of the same step if implicit is restricted to taking one step.

So even though we can do a lot, I suspect that it would be more useful to aim at a simple usage for anything like view class based union types. For the sophisticated and highly generic usages we'd want real union types, which all the algebraic laws that we can come up with. But the latter would take longer time to specify and implement than the former..