Open eernstg opened 2 years ago
Besides f(42);
being desugared to f(Union2.inject1(42));
, I believe int | String
could be syntax sugar for Union2<int, String>
and dart:union
could be always automatically imported.
This seems like an attempt to make an untagged union act like a tagged union, plus extra work to detect invalid values.
This is interesting, but the idea of having Union2.inject2
vs Union2.inject1
sounds a bit unexpected to me.
It's something I played with before in a package I made a long time ago: union And I wasn't satisfied with how it's used.
I think being able to do the following is quite important:
Union2<int, String> value = 42;
Also, would we have some way to normalize the type?
Again, with my union package, one issue is that we can end up with Union2<A, Union2<B, C>>
. Or even worse: Union2<A, Union2<A, B>>
(where A
is present twice).
There's also the question of "Is Union2<A, B>
assignable to Union2<B, A>
?"
@lrhn wrote:
This seems like an attempt to make an untagged union act like a tagged union
I'd say that it is an untagged union. For instance, we could have this:
class A {}
class B {}
class C implements A, B {}
void main() {
var u = Union2<A, B>.inject1(C()); // No way to tell whether the `C` is seen as an `A` or a `B`.
}
A tagged union would be more like this:
class Union2<X extends Object, Y extends Object> {
final X? x;
final Y? y;
Union2.inject1(this.x): y = null;
Union2.inject2(this.y): x = null;
bool get is1 => x != null;
bool get is2 => y != null;
}
void main() {
var u = Union2<A, B>.inject1(C());
print(u.is2); // 'false', definitely.
}
The point is that the union which is based on a view type is a cheap approach, and presumably robust enough to be worthwhile. It's just helping us to detect cases where, say, an actual argument in a method call should be an int
or a String
, and we're providing something else. In particular, it is surely better than a parameter type like Object
which would indeed allow both, but it would also allow a lot of other (wrong) things.
@rrousselGit wrote:
int | String
could be syntax sugar forUnion2<int, String>
Indeed. It just depends on the implications of doing that (possible parsing ambiguities, eating up syntactic space such that other things may then not be possible in the future, etc.etc).
and 'dart:union' could be always automatically imported.
In that case the UnionN
view classes could simply be added to 'dart:core' instead. Again, it depends on the breakage and the willingness (of all of us as a community) to fix the breakage, if any.
@rrousselGit wrote:
It's something I played with before in a package I made a long time ago: union
Thanks, that's interesting, I hadn't seen that!
And I wasn't satisfied with how it's used.
I don't think we can expect Dart to twist and bend a lot in order to make this mechanism optimally convenient. But it might be worthwhile even with those warts.
I think being able to do the following is quite important:
Union2<int, String> value = 42;
That would work right away with implicit
constructors. However, that's one case where it isn't that crucial, you could simply do this:
var value = Union2<int, String>(42);
It's more important in cases like f(42)
vs. f(Union2(42))
vs. f(Union2<int, String>(42)
.
Also, would we have some way to normalize the type?
I don't think so. There's no natural ordering of types. Even an alphabetical ordering would act up if one expression of a type uses import prefixes and the other one doesn't, or it uses different prefixes ((core.int, myPrefix.C)
vs. (int, otherPrefix.C)
). So there's no reasonable way that I can think of to enforce that Union2<String, int>
isn't expressed as Union2<int, String>
by someone.
Of course, we could introduce a (small!) set of coercions:
view class Union2<X1, X2> {
final Object? it;
implicit Union2.inject1(X1 this.it);
implicit Union2.inject2(X2 this.it);
implicit Union2.reorder(Union2<X2, X1> u): it = u.it;
}
Again, with my union package, one issue is that we can end up with
Union2<A, Union2<B, C>>
. Or even worse:Union2<A, Union2<A, B>>
(whereA
is present twice).
Those are really good questions, and generally my response is that (1) we can achieve a minimal level of support for union types using view classes as illustrated in this issue, but (2) they won't have those nice algebraic properties that you mention.
There's also the question of "Is
Union2<A, B>
assignable toUnion2<B, A>
?"
With an implicit Union2.reorder
, yes. But I don't think it's tractable (e.g., what's going to happen to compilation time?) to aim for completeness in this area.
I'd simply recommend that developers use this kind of union types in a simple and straightforward way, and then we may have to admit that normalization of complex union types isn't supported, you just need to avoid getting into that mess. ;-)
It's tagged-union-like in the sense that it introduces a new type, and doesn't have subtype relations between the union type and the individual values. It's untagged-union-like in that it doesn't actually have a tag.
It's also unsafe in that views will not prevent you from introducing any value into the "union", because the representation type is not a real union type.
We can definitely define a type like this, but I'd prefer not to make backends have to recognize it and generate code specially for it. (Like, pretend the two official union types exhaust the possible values, even if it doesn't.) It's special casing which is hidden in the syntax.
I'd rather introduce a proper "semi-tagged union" construct, like int | String
, which is a new type, not subtype related to either int
or string
. We can make both assignable to the union type, for injecton, without making them subtypes (so you can add
an int
to a List<int|String>
, but you can't assing a List<int>
to it). Extraction is done using type checks, like we do for FutureOr
. So, basically the traditional union type, without making it a supertype of the element types, so we don't have to worry about it in LUB computations.
Then it becomes both type safe and zero-overhead, without hiding that it's special by looking as a view
.
@lrhn wrote:
I'd prefer not to make backends have to recognize it and generate code specially for it
That's definitely a safe bet, in the sense that we can do something without changing the language (other than possibly adding support for implicit
on constructors), and avoid committing to any details of union types in the language itself.
I was aiming for a trade-off where we would offer a rather tiny amount of support in the language, e.g., simple well-formedness checks on switch
statements/expressions where the scrutinee type is a union type.
I think this might be useful, but I admit that it is a risky strategy, because we could end up deciding that this particular kind of union type is far too weak, and we need a real language-level union type with all the nice algebraic properties (normalization, subtyping like B <: A | B <: A | B | C
, A Function(B) | C Function(D) <: A | C Function(B & D)
, etc).
Conversely, considering how much work we've done in order to maintain support for nullable types and FutureOr
(those are the proper union types that we have today), and considering that the request for union types has been present at least since 2014 (https://github.com/dart-lang/sdk/issues/18564 is from 2014 and it has a comment saying
Closed unions are generated as dynamic until Dart supports either union or function overloading.
), it seems somewhat likely that we could make progress on a minimalistic approach, but aiming for a nice, shiny, complete approach would push the topic way into the future.
I'd rather introduce a proper "semi-tagged union" construct, like
int | String
, which is a new type, not subtype related to eitherint
orstring
.
The funny thing is that the view class approach does exactly this already.
I'd actually expect to have int <: int | String
in a proper language-level union type. I see it more in the opposite direction, "a view class union is not a real union type" (exactly because we do not have things like Union2<int, double> <: Union3<int, bool, double>
).
But the view class based approach is much simpler to handle in the language, so we could accept the lack of convenience and expressive power, because it's something that we can actually have soon.
We can make both assignable to the union type, for injecton, without making them subtypes (so you can add an
int
to aList<int|String>
, but you can't assing aList<int>
to it).
Right, adding an int
to a List<Union2<int, String>
would actually be possible, because myUnionList.add(42)
would be transformed into myUnionList.add(Union2<int, String>(42))
using the implicit
ness of the constructor.
Assigning a List<int>
to a variable of type List<Union2<int, String>>
will fail at compile time. But if the static type is dynamic
, or we use a cast, it will succeed at run time: The run-time type of the List<int>
could be SomeList<int>
, and the run-time representation of the type List<Union2<int, String>>
is List<Object?>
.
However, we would then have a dynamic error (because of the dynamically checked covariance) if we try to add a String
to the list, just like we'd have a dynamic error here:
void main() {
List<Object?> xs = <int>[];
xs.add('Hello!'); // Throws.
}
So there's nothing union-type specific about this. In general, view class based union types as type arguments will work as if the type is Object?
at run time (actually, that is precisely what it is).
Extraction is done using type checks, like we do for
FutureOr
.
That is also how the view class based approach would behave.
So, basically the traditional union type, without making it a supertype of the element types, so we don't have to worry about it in LUB computations.
Exactly! That's the kind of complications that I wanted to skip by using a plain view class. Similarly, a view class based union type would not have any other members than (1) the ones in Object
, and (2) the ones declared in the relevant UnionN
declaration, so we don't have to discuss whether draw()
can be invoked on an expression of type Cowboy | Pencil
.
For reference, using the union package I was able to allow making Union2<A, B>
assignable to Union3<A, B, C>
The trick was to use functions:
typedef Union2<A, B> = void Function(
void Function(A value),
void Function(B value),
Object? _c,
);
typedef Union3<A, B, C> = void Function(
void Function(A value),
void Function(B value),
void Function(C value),
);
Union3<A, B, C> union = Union2(A)
And I relied on extensions for support assigning int | double
to num
:
extension<T> on Union2<T, T> {
// using extensions, the compiler automatically assign "value" as the closest common type between all generics
T get value {...}
}
Union2<int, double> union;
num value = union.value; // works without a cast
So we can get a lot of things done without language support.
Although from my experience, it's still painful to use when combining unions.
One problem I faced was:
Say two packages use unions, where package A
exports Union2<String, int> functionA();
; and package B
exports functionB(Union3<int, String, double> value)
.
A common use case would be to do functionB(functionA())
.
But although with union I was able to make Union2 assignable to Union3, the issue that arises here is that the order in which generic types are defined is different.
As a language feature, maybe we'd be able to use that "reorder"? And do functionB(Union3(functionA()))
.
That'd assume it'd support assigning Union2 to Union3, but I'm sure that's reasonable.
So we can get a lot of things done without language support.
Cool!
I can't quite see how Union3<A, B, C> union = Union2(A);
would work, given that Union2
is a function type, not a function, but I'm convinced that we would do many things.
maybe we'd be able to use that "reorder"? And do
functionB(Union3(functionA()))
.
The problem that constantly pops up is that there are so many combinations of concrete situations, and it would blow up (in terms of the amount of code as well as the time/space needed to compile code using it) if we enumerate all the situations one by one.
For example, there would be k! variants of "reorder" for a union type with k type arguments. So we may not even have a single reorder
constructor, because we won't have all of them, with any meaningful interpretation of 'all' (say, we do it up to 4 type arguments, or something).
Another example of those combinatorial explosions is subtyping from k type arguments to k+1 type arguments: We would need to handle each subset of size k from a set of size k+1, that is, k+1 variants, just for that single subtyping relationship (more precisely: to support that kind of assignability, based on implicit
constructors), and then we'd have to handle reordering as a separate step, or as part of the same step if implicit
is restricted to taking one step.
So even though we can do a lot, I suspect that it would be more useful to aim at a simple usage for anything like view class based union types. For the sophisticated and highly generic usages we'd want real union types, which all the algebraic laws that we can come up with. But the latter would take longer time to specify and implement than the former..
In response to https://github.com/dart-lang/language/issues?q=is%3Aissue+is%3Aopen+union+types+label%3Aunion-types, we could have something like the following in a widely available location (e.g., 'dart:union' or even 'dart:core'):
The modifier
implicit
on constructors was proposed in https://github.com/dart-lang/language/issues/309 (for a more specialized purpose, but the idea immediately generalizes to a constructor of any class). The basic idea is that if an expressione
has static typeS
and context typeT
, andS
is not assignable toT
, andT
is a type (parameterized likeC<U1, .. Uk>
, or simplyC
) denoting a class that declares an implicit constructor (say,C.name(S1 s)
) taking exactly one positional parameter and no required named parameters, thene
is transformed into an invocation of that constructor (C.name(e)
orC<...>.name(e)
).It could be used as follows:
The fact that inline classes do not involve allocation of an actual wrapper object ensures that (1) the union type has a zero cost representation at run time (an expression of type
Union2<int, String>
evaluates to anint
or aString
, no extras), and (2) it allows a plain type test (myUnion is int
).The mechanism is not robust (we can easily break it, e.g.,
true as Union2<int, String>
won't throw), but it is only intended to help developers who want to ensure that a given expression has one out of a specified set of types, and we do get the static checking at a reasonable level. For instance, even with an implicitly invoked constructor,f(true)
inmain
would be a compile-time error, becausetrue
doesn't have a type which is assignable toUnion2<int, String>
, and none of the implicit constructors (or any other available code transformation) will make it type correct.The mechanism does not have the algebraic properties that we would expect from a full-fledged union type in the language: We do not have support for
A <: A | B <: B | A <: A | B | C
(no normalization, no reordering), it only has the subtype relationships that we can get from standard covariant type argument based subtyping:Union2<int, String>
is a subtype ofUnion2<num, dynamic>
becauseint <: num
andString <: dynamic
. This is one of the reasons why this mechanism would be a minimalistic version of union types.The support for switch statements where the scrutinee has a union type is perhaps the most tricky part. We could simply omit this and rely on developers writing the switch statements in full. However, it seems quite useful if the compiler/analyzer (or perhaps just the linter) "knows about" union types, such that we can ensure that a sequence of cases testing the type of the scrutinee will actually test exactly the operand types in the union type, and also that it will throw in case there is no match (again, considering
true as Union<int, String>
).[Edit Nov 2: Added reference to the old proposal about implicit constructors, and some more info about the mechanism.] [Edit, Dec 8: Changed the syntax to the most recent syntax: Views are now inline classes.]