dart-lang / language

Design of the Dart language
Other
2.61k stars 200 forks source link

Sum/union types and type matching #83

Open GregorySech opened 5 years ago

GregorySech commented 5 years ago

I'd like to import some features from functional languages like OCaML and F#. The main feature I'm proposing is the possibility of using typedef to define sum types. They can be thought of as type enumerations. This will, in some cases, reduce the need for dynamic making for sounder typing. For example json.decode would no longer return dynamic but JSON a sum type, something like:

typedef JSON is Map<String, JSON> | List<JSON> | Literal;

Sum types could power a new behaviour of switch case that resolves the type based on the typed enumerated. Kind of like assert and if does with contemporary is syntax.

typedef T is A | B | C;
...
void foo(T arg) {
  switch(arg) {
    case A: // here arg is casted to A
      break;
    case B: // here arg is casted to B
      break;
    case C: // here arg is casted to C
      break;
  }
}

A better syntax for this type of switch case might be of order, maybe something like #27.

typedef T is A | B | C;
...
void foo(T arg) {
  switch(arg) {
    case A -> ... // here arg is A
    case B -> ... // here arg is B
    case C -> ... // here arg is C
  }
}

This would be a powered down version of OCaML and F# match <arg> with as I've not included a proposition for type deconstruction, which would probably require tuples (or more in general product types) as discussed in #68.

eernstg commented 5 years ago

First note that it should be possible to categorize this issue as a request or a feature. The request would articulate a non-trivial and practically relevant shortcoming in terms of software engineering that would justify considerations about adding new features to Dart. A feature description is a response to a request, that is, every feature under consideration should be grounded in a need for the feature in terms of an underlying software engineering problem, it should never just be "I want feature X just because that's a cool concept to have in a language". It may well be a non-trivial exercise to come up with such a request, and it may not necessarily have just one feature as its "corresponding solution", but that's a good thing, not a bug. ;-)

This issue is clearly a feature issue, specifically promoting a language construct (or a family of closely related ones). So we need a request issue which describes a software engineering problem more broadly, such that some notion of sum types would be an appropriate response to that request.

That said, requests for sum types are of course not a new thing. A couple of pointers: https://github.com/dart-lang/sdk/issues/4938 has been the main location for discussing union types, and the feature discussed in this issue may or may not be covered under that umbrella.

In particular, this proposal may or may not assume tags. If tags are assumed, typedef JSON would be similar to a class hierarchy like this:

class Json {}
class JsonMap implements Json {
  Map<String, Json> value;
}
class JsonList implements Json {
  List<Json> value;
}
class JsonLiteral implements Json {
  ... // Whatever is needed to be such a literal.
}

This would provide run-time support for discriminating all cases, because anything of type "Map<String, JSON>" would actually be wrapped in an object of type JsonMap, and a simple is JsonMap test would reveal whether we're looking at such an object. In a functional language you'd declare all cases in a single construct, and it would thus be easy to detect at compile time whether any given deconstruction is exhaustive. With an OO class hierarchy you'd need a concept like a 'sealed' class hierarchy in order to get a similar statically known exhaustiveness guarantee. (OCaml supports extensible variant types, and they require an explicit 'default' case in every pattern match, so we got the same trade-off there, albeit with the sealed design as the default.) In any case, tagged unions is the easy way to do this, and it may be somewhat costly at run time, because we have to create and navigate a lot of "wrapper" objects.

However, if we assume untagged unions then we cannot expect run-time support for O(1) discrimination. For instance, a recursive untagged union type could be defined like this (assuming that Dart is extended to support recursive union types):

typedef NestedIntList = int | List<NestedIntList>;

If we're simply considering the type NestedIntList to be the (infinite) union of int and List<int> and List<List<int>> and so on, then it would be possible for a variable of type NestedIntList to refer to an instance of int, and it could also refer to an instance of List<int>, etc. There would not be any wrapper objects, and we would have no support to tell in time O(1) that any given object (say, a List<int>) is in fact of type NestedIntList. This means that the expression e is NestedIntList would need to perform an unbounded number of type checks, and it would in principle be unable to stop at any finite level (if it's looking at a List<List<.....<String>.....>> it wouldn't be able to say "no" before it hits the String, which could be nested any number of levels.)

So we need to clarify whether sum types are intended to be similar to SML algebraic data types (that is, tagged unions), or similar to TypeScript union types (untagged unions).

We should note that TypeScript does not support dynamic type tests (and I do not believe the user-defined workaround is a relevant replacement in Dart). Dart will remain sound in the sense that the obvious heap invariant ("a variable of type T refers to an entity of type T") is maintained at all times, based on a combination of static and dynamic checks. So we don't have the option to ignore that certain type tests / type checks are expensive.

GregorySech commented 5 years ago

Sorry for the missing tag, it seems I'm unable to edit them, I'll propose them in the title while I'm looking for documentation on how to do it.

Thank you for the extensive explanation, I didn't dive deep enough into the sdk issues and I apologize.

To answer the specification question: I was not aware of union types in TypeScript as I've not used it yet so I intended them to be as similar as possible to SML data types however I was ignoring the implementation details of such feature. I'll start studying the mentioned concept in the next days.

eernstg commented 5 years ago

@GregorySech, first: Thanks for your input! I should have said that. Second: I guess there's some permission management in place which restricts the labeling; I'm sure those labels will be adjusted as needed, so don't worry. So the main point is simply that it would be really useful in the discussion about a topic like this to have a characterization of the underlying software engineering problem (without any reference to any particular language construct that might solve the problem), and it would be helpful if you could capture that underlying software engineering challenge for which such a feature as 'sum types' would be a relevant solution.

yjbanov commented 5 years ago

@eernstg

if we assume untagged unions ... There would not be any wrapper objects

Would this still be true if Dart adds unboxed types in the future? For example, imagine in the future non-nullable int is directly represented by 64-bit integer and its value is inlined into variables and fields. Now, what happens when you define a variable of type int | String?

mdebbar commented 5 years ago

FutureOr<T> is a real use case that is hardcoded in Dart today and would be solved neatly when Dart introduces union types. I agree they don't have to match all the specifics of other languages.

eernstg commented 5 years ago

.. [no] wrapper objects Would this still be true if Dart adds unboxed types[?]

If Dart is extended to support a notion of unboxed types I would expect such instances to be compatible enough with the rest of Dart to allow a reference of type Object to refer to them, and that would require support for some notion of boxing.

You may or may not want to say that a boxed entity of type T is a wrapper object containing an unboxed entity of type T, but we're still talking about the same type (with different representations at run-time).

With a wrapper which is used as the representation of a tagged union type there is a transition from one type to another type when the wrapper is added. An example in SML would be the following:

datatype IntString = INT of int | STRING of string;
fun f (INT i) = "An integer"
  | f (STRING s) = s;
val v1 = f (INT 42);
val v2 = f (STRING "24");

Here, INT 42 is of type IntString, but a plain 42 is of type int, so there's a difference between "just being an int" and "being an int which is one of the cases in the union of int and string", and that difference amounts to an entire wrapper object. It's a compile-time error to call f with an argument of type int or string, they must be wrapped first, hence f (INT 42). You need to create that wrapper object explicitly (OK, we could try to infer that, but that's another tarpit ;-), and in return you can test for it at run time (so f knows which case to run when you call it). An extension of Dart with tagged unions would presumably have a lot of the same properties.

An extension of Dart with untagged unions would be different in many ways. I just wanted to point out the fact that this distinction matters, so we shouldn't discuss union types (or sum types) without knowing which kind we're talking about. ;-)

eernstg commented 5 years ago

FutureOr<T> is a real use case that is hardcoded in Dart today

Right. We knew from day one that FutureOr could be used as a playground for getting to know exactly how that special case of union types would affect the whole language.

yjbanov commented 5 years ago

@eernstg Thanks for the explanation! I misread your "wrapper object" as boxing. Yes, something like tagged unions does not require boxing. For example, Rust's and Swift's enum types are essentially tagged unions, and they do not box them.

eernstg commented 5 years ago

@yjbanov wrote:

For example, Rust's and Swift's enum types are essentially tagged unions, and they do not box them.

Right, but I think the situation in Rust and Swift is slightly different from the situation in Dart.

In general, a range of memory cells does not have to be a full-fledged heap entity in order to be able to contain a primitive value (like an integer or a pointer to an immutable string (Rust: &str)) as well as a reference to an associated value (Swift) or a tuple or struct with no subtype relations (Rust) such that the size of the whole thing is known statically as soon as we know which "case" of the enum type we are working on, and we can compute how much space is needed in order to be able to store any of the declared cases.

But at this point we do not have general support for non-heap entities in Dart of any other size than that of a primitive, so if we want to support that kind of composite entity in Dart I think we will have to keep them in the heap (that is, we need to box them). That could still be useful, however, because an enum with an associated value is similar to a tagged union where the enum value itself is the tag.

The ability to work on unboxed entities of arbitrary size would be an orthogonal feature in Dart, and we would be able to introduce support for this separately (and sort out how they would work in dynamic code), and then we could use it for entities of any kind, including enum-with-associated-value.

stt106 commented 5 years ago

If this is implemented + non-nullable types, then Dart will be the best language ever! Sum types (and records) are incredibly powerful constructs to model domains.

GregorySech commented 5 years ago

Let's say that Dart had a non-nullable type ancestor called Thing, Object might be Thing | Null. I'm still unaware of implementation implications for this features to work together but the message I'm trying to convey is that this might save a lot of refactoring if non-null is implemented.

burakemir commented 5 years ago

Hey Erik, since you asked for "software engineering arguments" further up, I will bite and state why algebraic data types and pattern matching is useful from a software engineering perspective.

The simple truth is: 1 - representing structured data so that it can be inspected from the outside isn't what objects were made for, and 2 - yet most data in real world applications has to deal with structured data (it started with tuples, records, relations and the "OO-SQL impedance mismatch").

Today, NULL/optional, enums etc can all provide only half-hearted solutions to case distinction. The paper "Matching Objects with Patterns" (Emir, Odersky, Williams) and also my thesis "Object-oriented pattern matching" discusses the benefits and drawbacks of the object-oriented and other encodings of algebraic data types. The "type-case" feature is also discussed, but not a notion of untagged sum type.

I'd be an advocate for the tried and tested algebraic data types which is a way to represent tuples, sum types, "value classes" and a whole bunch of other programming situations that all help with modeling and dealing with structured data. Algebraic data types would reflect the symmetry between product types and sum (=co-product) types, as known from category theory and its manifestations in functional programming in Haskell, Scala, ocaml aka Reason etc. What better software engineering argument than "it works" can there be? : ) Let me know if this is the appropriate issue or I should file a new one.

(updated, typo)

eernstg commented 5 years ago

(Hello Burak, long time no c, hope everything is well!)

data [..] inspected from the outside isn't what objects were made for

Agreed, I frequently mention that argument as well. And I'd prefer to widen the gap, in the sense that I would like to have one kind of entity which is optimized from access-from-outside ("data") and another one which is optimized for encapsulation and abstraction ("objects").

I'd be an advocate for the tried and tested algebraic data types

I think they would fit rather nicely into Dart if we build them on the ideas around enumerated type with "associated values", similarly to Swift enumerations (but we'd box them first, to get started, and then some unboxed versions could be supported later on where possible).

Considering the mechanisms out there in languages that are somewhat similar to Dart, the notion of an enum class with associated "values" comes to mind here. I mentioned enums and their relationship with inline allocation up here, but I did not mention how similar they are to algebraic data types.

// Emulate the SML type `datatype 'a List = Nil | Cons of ('a * 'a List)`
// using names that are Darty, based on an extension of the Dart `enum`
// mechanism with associated data.

enum EnumList<X> {
  Nil,
  Cons(X head, EnumList<X> tail),
}

int length<X>(EnumList<X> l) {
  switch (l) {
    case Nil: return 0;
    case Cons(var head, var tail): return 1 + length(tail);
  }
}

main() {
  MyList<int> l = MyList.Cons(42, MyList.Nil);
  print(length(l));
}

The enum declaration would then be syntactic sugar for a set of class declarations, one declaration similar to the one which is the specified meaning of the enum class today, and one subclass for each value declared in the enum, carrying the declared associated data in final fields.

We'd want generated code to do all the usual stuff for value-like entities (starting with operator == and hashCode, probably including support for some notion of copy-with-new-values, etc).

There will be lots of proposals for better syntax (e.g., for introduction of patterns introducing fresh names to bind as well as fixed values to check for), but the main point is that the notion of an enum mechanism with associated values would be a quite natural way in Dart to express a lot of the properties that algebraic data types are known to have in other languages.

lukepighetti commented 5 years ago

There are a lot of people in this thread with much more formal computer science education than I possess. My argument will be thus be quite simple. Union types in TypeScript make my life easy as a developer. Not having them in Dart makes my life considerably worse as a developer. That's all I have to share as a consumer of Dart, and quite frankly, that's all the experience I need to have an opinion on whether we should have them or not. I won't be able to provide anything of great intelligence to this conversation, just my opinion based on experience of building things with Dart. How that gets implemented is well outside of my wheelhouse, I'll defer to the software engineers to use (or not use) my feedback. 😄

ds84182 commented 5 years ago

@lifranc Please don't come to language repository just to leave comments that do not add to the conversation. I'm watching the repository so I can keep track of all the conversations happening here... constructive conversations. Your comment is redundant, and unconstructive.

truongsinh commented 4 years ago

https://github.com/spebbe/dartz from @spebbe can be another use case that can make use of union types

dartz was originally written for Dart 1, whose weak type system could be abused to simulate higher-kinded types. Dart 2 has a much stricter type system, making it impossible to pull off the same trick without regressing to fully dynamic typing.

Also, https://developer.mozilla.org/en-US/docs/Web/API/WebSocket/send API, when implemented in Dart, is quite tricky, either we have to have send(dynamic data), which loses type-safety, or sendString(String data) + sendBytes(Uint8List data) + sendBlob(Blob data) etc

wkornewald commented 4 years ago

I'd be really disappointed if Dart added ADTs / enum types instead of real union types. The latter are much more ergonomic, flexible and capture exactly the essence of what the developer wants to express: Different types are possible. This essence is captured with minimal syntax: A | B | C | D. No tags. It's also flexible in that you can pass a subset like A | C or B | D and it will still work correctly. This is what we really want: No limitations. Minimal syntax. Just the essence.

It's always annoying if you can't restrict to a subset and always have to deal with the whole enum and match against all of cases and raise an exception for the disallowed cases at runtime instead of having them disallowed at compile-time. It also makes working with types from different packages annoying because you can't easily mix them (esp. individual cases of the enum). This happens often enough to be an annoyance in Swift, Kotlin and other popular languages.

After all, we're doing this because we want compile-time checks. That's the whole point of union types. If you've ever worked with TypeScript's unions, as an end-developer (vs. the PL designer), you really don't want to go back to something as inflexible and verbose and unnecessarily complicated as enums. Yeah, enums might be easier to optimize, but I believe we can still find a "good enough" optimization for union types. The use-cases for unions in Dart would be different than those in e.g. Haskell because we mostly use classes and I believe the performance is less of an issue here. For example, we don't represent our lists as Cons | Nil. We already have a List class for that.

For more context, please take a look at my quick proposal from a year ago: https://github.com/dart-lang/sdk/issues/4938#issuecomment-396005754

werediver commented 4 years ago

@wkornewald The untagged unions concept sounds nice, but it doesn't sound like a replacement to the algebraic data types. They can totally coexist and serve different needs.

wkornewald commented 4 years ago

@werediver Sure they could coexist, but what is missing from union types that is absolutely necessary from a practical point of view? Do you have a code example?

werediver commented 4 years ago

what is missing from union types that is absolutely necessary from a practical point of view?

That is quite easy to answer: extra semantics.

data Message
  = Info    String
  | Warning String
  | Error   String

Which would be possible with sealed classes and untagged unions together, but that would be noticeably less succinct.

rrousselGit commented 4 years ago

I definitely agree that union types expressed as A | B | C instead of something like sealed classes are a better IMO.

Technically speaking, sealed classes don't bring any new feature, they just reduce the boilerplate.

For example, sealed classes cannot be used to represent JSON since they are a new type. But unions can as they reuse existing types:

typedef Json = String | num | List<Json> | Map<String, Json>

Instead of such sealed class:

sealed class LoginResponse {
    data class Success(val authToken) : LoginResponse()
    object InvalidCredentials : LoginResponse()
    object NoNetwork : LoginResponse()
    data class UnexpectedException(val exception: Exception) : LoginResponse()
}

we could use a typedef & unions (+ potentially data classes to describe Token/...)

typedef LoginResponse =
  | Token
  | NoNetwork
  | InvalidCredentials
  | Exception
wkornewald commented 4 years ago

You can do that with union types, too:

typedef Message
  = Info(String)
  | Warning(String)
  | Error(String)

We don't really have to use an explicit "data" keyword in typedefs to introduce new data classes. That's just a question of consistency vs succinctness. At the same time you can define a function like this which wouldn't be possible with algebraic data types:

void handleProblematicMessage(Warning | Error message) { ... }

With algebraic data types you have to write

void handleProblematicMessage(Message message) {
  switch(message) {
    Info -> throw Exception("Not a problematic message!");
    ...
  }
}

Now you've introduced a run-time error instead of a compile-time type error.

werediver commented 4 years ago

We're clearly carelessly mixing different programming languages and terminology, so let's be more precise.

@wkornewald:

We don't really have to use an explicit "data" keyword ...

I used the data keyword, because I was "speaking" Haskell.

You can do that with union types, too: ...

These are tagged, which is the same as sum-types in the algebraic data types and not the same as the untagged unions that you were talking about initially.

That is exactly my point: tagged unions aka sum-types are useful, untagged unions are useful, they serve different needs, can coexist and be even more useful together :)

@rrousselGit:

For example, sealed classes cannot be used to represent JSON since they are a new type.

I don't see how this is true. Surely you can model JSON with sealed classes. Every concrete sealed class [in Kotlin] is a subclass of a particular base class. From the docs:

Sealed classes are used for representing restricted class hierarchies, when a value can have one of the types from a limited set, but cannot have any other type. ...

sealed class Expr
data class Const(val number: Double) : Expr()
data class Sum(val e1: Expr, val e2: Expr) : Expr()
object NotANumber : Expr()
rrousselGit commented 4 years ago

@werediver

Every concrete sealed class [in Kotlin] is a subclass of a particular base class. From the docs:

That's the problem. There's no common subclass with the different values of a JSON output. num is completely unrelated to String, which is completely unrelated to Map, ...

While it is possible to make a sealed class that represents JSON, it'd be a relatively big breaking change to update the serializer: The serialized object is of a different type.

For example, what used to be:

dynamic serialize(Foo foo) {
  return {
    'count': foo.count,
  };
}

would have to change to:

Json serialize(Foo foo) {
  return Json.map({
    'count': Json.number(foo.count),
  });
}

whereas with unions we'd have:

typedef Json = String | num | List<Json> | Map<String, Json>

Json serialize(Foo foo) {
  return {
    'count': foo.count,
  };
}

We can see that with unions, the implementation did not change. It's just suddenly type-safe.

wkornewald commented 4 years ago

@wkornewald:

We don't really have to use an explicit "data" keyword ...

I used the data keyword, because I was "speaking" Haskell.

I was just referring to my original proposal where typedef was mixed with data Info(String) instead of just Info(String) to be more explicit, but we could make that optional.

You can do that with union types, too: ...

These are tagged, which is the same as sum-types in the algebraic data types and not the same as the untagged unions that you were talking about initially.

That is exactly my point: tagged unions aka sum-types are useful, untagged unions are useful, they serve different needs, can coexist and be even more useful together :)

Ah I see where the difference in our wording is then. :) When takling about "tagged unions" or "algebraic data types" the "tag" is usually not an independent type on its own, but only usable via the whole union like I showed in my code example (which is the usual limitation of tagged unions). That's why I'm talking about union types + data classes as separate concepts that are usable independently of each other, but can be combined to get something more flexible than algebraic data types.

I've also updated my comment to use "union types" instead of "untagged unions" to make this clearer.

werediver commented 4 years ago

@wkornewald I had to read the proposal to make a good sense out of your arguments, but in the end I like how you combine features there. Indeed, the union types in combination with data classes make a good mix. With a deconstructing pattern-matching it can totally replace the "classical" (Haskell, Swift) sum-types, looks nice 👍

wkornewald commented 4 years ago

@werediver Great, then let's convince the majority of the Dart team that this is where we'd like to go. ;)

werediver commented 4 years ago

@wkornewald But did you pay attention to the reply of @eernstg to you original proposal?

He is basically saying that the union types don't help a compiler [that much] to reason about types in presence of subtyping, whereas the sealed class hierarchies do. And, well, my answer to that would be "who needs subtyping anyway, it screws everything!", but it's already in Dart.

We do want a compiler that can reason about the code well, make deep conclusions and good optimizations, right? And having limited resources for development it makes quite a bit of sense to adopt a feature that enables as many good things in the compiler as possible.

In other words, an opinion of the masses is good to take into account, but the through technical analysis by the core team is essential. Let's hope they are going to pull it off in a way that satisfies us 🤞

wkornewald commented 4 years ago

@werediver Oh yeah, I didn't have the time to reply back then, but basically, the compiler can't easily optimize arbitrary use of unions, but I believe we can find an acceptable trade-off that works in 80-90% of the cases. For example, for the in-memory representation we could fall back to the smallest & most frequently used typedef that fully contains a given union. Also, monomophization could improve performance in critical code paths with small union types (maybe only for pairs). Finally, we rarely need union types to be as performant as e.g. in Haskell because their use-case is different. We don't have Cons | Nil, but a simple List class.

So union types will only be used in few select cases and in non-trivial cases we'll have a single typedef for the whole union and the compiler can internally treat that as a single unit/ADT. And if that's not enough I'm sure we can come up with even more optimizations. I'd rather deal with the optimization problem than the prevalent inflexibility of verbose ADT-based code.

eernstg commented 4 years ago

@werediver makes the same distinction that I did:

These are tagged, which is the same as sum-types in the algebraic data types and not the same as the untagged unions that you were talking about initially.

Algebraic data types are useful for programming in a somewhat functional style (it's still Dart, of course) where the set of variants is known precisely when the algebraic data type itself is known (because we wouldn't go into anything like extensible variants, when we already have classes): This allows a pattern matching construct to be checked for exhaustiveness, and there is no subtyping so we know exactly how each variant is structured.

The untagged unions are basically a safer version of the common supertype: If a method accepts int | String | Map<int, int> then we can't do anything meaningful on an expression of that type, but we can discriminate in the body of the method and do something for the int and something else for the String, etc. If we had used the common supertype Object then it would have allowed a lot of other types than the union type, in which case we'd need error handling and we'd get much less static checking at call sites. But there's nothing stopping us from using num | int or other unions that don't amount to a set of disjoint cases, and we might still check for is int first and then is num, and get some meaningful case based behavior anyway. So untagged unions basically provide a lot fewer constraints and guarantees, but since we're talking about object-oriented programs we will have type tags on every entity in the heap anyway, and this means that we can discriminate between different types of objects even with an untagged union.

I just keep coming back to the conclusion that they are different things, and may be useful for different purposes, so I guess I should stop ranting. ;-)

wkornewald commented 4 years ago

@eernstg I'm not sure if you agree or disagree or just wanted to add some neutral thoughts. :) Some of the misunderstanding indeed comes from my initial use of the word untagged unions. I wanted to distinguish my union types + data classes proposal from classical tagged unions because union types + data classes are a more flexible superset of tagged unions:

Most of the time it's actually less code than tagged unions and the code is simpler (i.e. more straightforward, easier to understand).

When combined with (data) classes as tags you get algebraic data types as a special case:

typedef Tree<T>
  | Leaf(T value)
  | TreeNode(T value, Tree<T> left, Tree<T> right);

But you don't have to wrap every single value in a data class. You can also just write String | List<String> and have no wrapping at all.

Also, you can write functions that match on only one of the variants:

void walkNode<T>(TreeNode<T> node) { }

You can even write functions that mix different variants:

void walkTreeOrGraphNode<T>(GraphNode<T> | TreeNode<T> node) { }

This allows you to reuse variants and it gives them a semantically useful meaning. With algebraic data types the semantically same case (e.g. Error) must be redefined if you have three result cases, but with union types you can reuse the existing Error class as a variant and introduce a third case:

typedef Result<T> = Ok(T) | Error(String);

typedef ComplexResult<T> = Ok<T> | Error | ThirdCase(String);
// Or simpler: Result<T> | ThirdCase(String);

Now you can pass e.g. Error directly to any existing function that knows how to deal with Error. No need to re-wrap ComplexResult::Error in a Result::Error variant because they're the same types.

Maybe you're trying to say that there's no way to dispatch on the individual cases unless we introduce overloading:

void walk<T>(TreeNode<T> node) {
   // ...
}

void walk<T>(Leaf<T> leaf) {
   // ...
}

This is the typical usage of tagged unions (and this specific example can be expressed with classes). Dart could support that, too, but we might not want to allow (ambiguous?) inheritance in the argument types of overloaded functions because then we'd need multiple dispatch. Even with this limitation union types would be more flexible than tagged unions.

Is there any consensus within the Dart team? Does the team have a preference for union types + data classes or tagged unions? Maybe I can just stop trying to convince you guys because you already agree with my proposal?

munificent commented 4 years ago

Is there any consensus within the Dart team?

I don't think consensus is super well-defined in the absence of detailed concrete proposals. So we're likely not in active disagreement, but that doesn't necessarily mean there is enough agreement that we can move forward on any particular concrete path.

Does the team have a preference for union types + data classes or tagged unions?

I think there is some general desire to have union types because it's been a long-standing feature request. Nullable types and FutureOr are both essentially union types under the hood, so there's a reasonable argument for generalizing and making arbitrary union types available to users.

At the same time, it's a surprisingly big, complex feature. In terms of complexity cost, changing the type system is higher than almost any other feature. So we don't have any immediate plans to add them.

"Data classes" means different things to different people. If you interpret it to mean "a lighter syntax for defining a class with value-ish semantics" then, yes, I think there's a general desire to do that. This is "easier" in that it's just syntactic sugar for concepts the language can already express. At the same time, designing worthwhile syntactic sugar can be difficult because the sugar has to be useful enough to outweigh the costs of giving users two ways to express the same thing.

As far as "tagged unions" goes, Dart already has subclassing which means the much of the semantics for algebraic data types are there. You can view an ADT as simply a superclass with a closed family of subclasses. Having a nicer syntax for expressing that and pattern matching would be good, but I think we would model it in terms of subclassing (like Scala and I think Kotlin do).

My hunch is that there is a nice syntactic that unifies both "data classes" and "tagged unions" but we won't know for sure until we sit down and design it.

I haven't spent much time thinking about it, but my general sense is that union types + typedefs aren't sufficient to cover the things users want ADTs (and subclasses for that matter) for. I think users want to be clear about which types are structural and which are nominal. Structural union types are handy for after-the-fact composition of types, but you lose some of the safety of nominal typing. Having real ADTs gives you a well-defined, named, useful top type. If you have a variable whose type is some union type, there isn't much you can do with it aside from type test it. With an ADT, the superclass can have useful functionality, implement interfaces, etc.

In other words, if you have the luxury of defining all of the different cases yourself, you're better off defining a real supertype for them instead of just a union type. Union types are handy for when you don't have that luxury.

wkornewald commented 4 years ago

What kind of unsafety do you mean? Union types that can be extended after the fact can indeed introduce unsafety, but I never proposed that. Structural types like in e.g. TypeScript or Go are unsafe because two semantically different structs/interfaces can clash when they define the same attrs/functions. However, the union types I'm proposing can't have name clashes because the variants themselves are nominal.

The only structural aspect of union types is that they're just (unordered) sets of types and every set with the same types is equal. If you want to ensure that two sets are distinct you have to define named variants (via. e.g. data classes). So, you can be semantically very precise without introducing unsafety. When talking about an int you can use int directly. When talking about some new semantic concept you can give it a name (using e.g. a data class). If you want to reuse an existing semantic concept (an existing class or a variant in some other union type) you can do that directly. You don't get this level of semantic precision with an ADT. Even better, union types often need less code than ADTs.

If you want to define a common superclass/interface you can easily have every variant extend a common base class (e.g. like sealed classes). Union types can and often should be combined with inheritance (but due to their flexibility they don't have to).

Maybe if the Dart team thinks that interface requirements should be exhaustively enforceable (like with sealed classes): Similar to Rust, we could also express something like "this union type (any given set of types) must implement interface X" which statically enforces that all variants implement that interface. If you add such a requirement to a third-party class, though, Dart would have to support extensions similar to Swift or Rust.

Regarding your "there isn't much you can do with it" argument: Similar to sealed classes or TypeScript, union types should be compatible with all interfaces implemented by all variants. To give you a very simple (though maybe absurd) example of the sealed classes use-case: an int | double has a .round() method and you can call it directly without testing the type. You can also pass an int | double to any function taking a num.

For attrs and methods we could even dispatch on everything that is common to all variants (even if there's no interface implemented by all of them), but this might be going too far towards unsafe structural typing. Extensions could express the same use-case more explicitly and safely (and verbosely ;).

I'm still sure we can express the same use-cases with the same level of safety as ADTs through a clean set of orthogonal and individually useful set of features. ADTs are just an inflexible special case of the more powerful solution based on union types. Having both in the same language makes no sense. ADTs as a stopgap towards union types might work if we can define a common syntax that later just needs to be opened up (by removing restrictions/limitations) to express union types.

g5becks commented 4 years ago

@wkornewald I agree with your sentiments on this. When using F#, discriminated unions, it sucks to have to define things like this,

/// represents what type of content posts each platform supports
    type ContentType =
        | Video
        | Audio
        | Blog
        | Blurb
        | BookMark
        | Image

    type IMediaPost<'Content> =
        { contentType: ContentType
          content: 'Content }

    type VideoPost = IMediaPost<FileBasedContent>

    type AudioPost = IMediaPost<FileBasedContent>

    type ImagePost = IMediaPost<FileBasedContent>

    type BookMarkPost = IMediaPost<BookMarkContent>

    type BlogPost = IMediaPost<BlogPostContent>

    type BlurbPost = IMediaPost<BlurbContent>

    type postContent<'T> = IMediaPost<'T> -> Unit

The issue for me is that (in my personal experience) about 80% of the time, what you want is each different variant to be its own type and not a value of a type that you HAVE to pattern match on and destructure. It has its uses in a more functional language, but if dart added just the features that you are proposing, with some type of exhaustivity checking, it would be good enough, for now, to fulfill most of my needs (even without pattern matching) because of the already present smart casting feature of dart.

In a functional language, where the behavior and the data are meant to be separate, the destructuring of data based on its value ( and or ) shape becomes more important because you need to pass that data to another function in a pipeline, pass only a part of in order to curry a function, etc.

Dart, being object-oriented, most times behavior and data live together, so once I know the type of what I am dealing with, I don't have as much of a need to destructure the data as I do to call methods on the object. This is why Kotlin gets away with having only when expressions instead of full-blown pattern matching.

I can't say that I have a strong preference on which form is better per se, but what @wkornewald has proposed is what would work best for dart in my opinion, and I would imagine (I may be wrong) would be easier to implement as well, seems like mostly just syntactic sugar.

As far as syntax is concerned, I think using the existing switch keyword and extending it the way they did with C# is better than adding a match keyword.

eernstg commented 4 years ago

@wkornewald wrote:

I'm not sure if you agree or disagree .. union types + data classes are a more flexible superset of tagged unions ..

I just maintain that an algebraic datatype and union types aren't the same thing.

(We should find a different term than 'untagged union types' because every OO object has a tag, but I'm referring to a notion of union types where the operands exist independently of the construction of the union type, which I believe is the case for all parts of your proposal. This is in contrast to algebraic datatypes as in SML and other functional languages, where the type A has a fixed set of cases and each case is known to be a case of A. So I'll just say 'algebraic datatype' for the "tagged" kind and 'union type' for the "untagged" kind.)

I agree that if we have an expression whose static type is A|B|C then the compiler will (or certainly can) maintain a representation of the type that allows for checking that any given match construct is guaranteed to match (that is, it contains an exhaustive set of cases). However, there is no particular structure to this. For instance, we could have B <: A and C <: A, and then we could declare victory just with a match on A:

class A {}
class B implements A {}
class C implements A {}

main() {
  A|B|C x = ...;
  match (x) {
    when A => print("Got an A!");
  }
}

The use of A|B|C in this situation could be a mistake in the first place (or it might be the result of a modification in some remote library L that made B <: A and C <: A true, which might not have been the case previously: then it's a mistake that is imposed, inadvertently, on my code by the person who changed L).

Another interpretation is that A|B|C is indeed the union type that the developer wanted to specify, but it is intended to be tested as follows:

match (x) {
  when B => ..;
  when C => ..;
  when A => print("Not a B, not a C, using default algorithm");
}

So do we want to insist that union types only contain operands that are unrelated types, or are they allowed to overlap? If so: how much, and in which ways?

Similarly, if you have a test like if (x is A|B) and you decide that it should have been A|B|C, how do you search your 2 MLOC project in order to find those occurrences of A|B that should be updated? Maybe some of them is somebody else's would-be algebraic datatype, and maybe they definitely shouldn't be changed! There's also the issue of accidental subtyping:

A|B|C x; // Updated!
A|B y; // Oops, we forgot to change this one to A|B|C.

main() {
  x = y; // Fine!
}

So you might have entire sub-regions of the software where a given union type is used, and it could accidentally become a subtype because we overlooked the need to update it by adding a new case. As long as this sub-region only provides values we may not discover that it is a proper subtype, and maybe we forget to inspect the code where these values are created (even though they might need to take some new situations into account and produce a C now and then).

As usual, we have less structure and fewer guarantees than we would have with an algebraic datatype. For instance, we would have completely different types, rather than ad-hoc usages of types like A|B (that may or may not be intended to mean the same thing). And if we extend an algebraic datatype with a new case then every match on that type can be checked for exhaustiveness.

I acknowledge that these issues can be made much more manageable by giving each union type (used as an algebraic datatype) a name (using a type alias like typedef ABC = A|B|C;) and consistently refer to them only by that name, but there is no language level support for maintaining that discipline.

Another way in which we have less structure with union types is that we cannot reconstruct the intended "whole" from a given instance:

main() {
  Object o = ...;
  if (x is A|B) {
    // Will succeed if it's an `A`, even is intended as an `A|B|C`.
  } else if (x is A|B|C) {
    // Only for a `C` can we definitely say that it wasn't an `A|B`.
  }
}

All in all, I don't agree that union types provide a superset of all those services (including guarantees) that we can get from algebraic datatypes, but (of course) union types can also do a lot of other things that an algebraic datatype can't.

I think the notion of being a single type is the core: All those elements that allow us to write these things more concisely would be applicable in many different models (say, Success<T>(String authToken, T data) could be a concise declaration of a completely normal class, or a data/value class with some extras, or a case in an algebraic datatype, and Success<t>(token, data) could be a pattern used in a pattern match in all those cases as well).

wkornewald commented 4 years ago

So do we want to insist that union types only contain operands that are unrelated types, or are they allowed to overlap? If so: how much, and in which ways?

First of all, you couldn't even express your example with an ADT, so I don't see how that is counterpoint to my claim (if it was one). Union types are a superset of ADTs in that everything expressible as an ADT can be expressed in the same way (i.e. tagged via data classes) with the same safety guarantees using union types. Since union types are more powerful you can also express things outside of ADTs.

Regarding your example, I'd say that exhaustiveness should always apply over all variants because that's more conservative/safe and the developer most likely had a reason for explicitly mentioning a subtype as a separate variant. You can still use an if (x is A) check, but then you obvisouly and explicitly don't want exhaustiveness and you explicitly want to check in some given order. Otherwise you'd be using a switch/match. This also applies to your other if/else example. If this was such a big problem in practice then TypeScript would be in big trouble, but it actually works quite well even in combination with its "fuzzy" (less safe) structural type system.

Similarly, if you have a test like if (x is A|B) and you decide that it should have been A|B|C, how do you search your 2 MLOC project in order to find those occurrences of A|B that should be updated? Maybe some of them is somebody else's would-be algebraic datatype, and maybe they definitely shouldn't be changed! There's also the issue of accidental subtyping:

That's almost like saying "if people use copy-paste instead of reusable functions we're in trouble". If you look at TypeScript you'll see that union types work very well and people do know when to give them a name. Anonymous union types are an exception to simplify very trivial cases (mostly two variants like String | List<String>). In more complex and repetitive cases it's too annoying (and unreadable) to type out all the variants every time, anyway. So I think most of the time human lazyness automatically leads to correct usage (see real-world TypeScript code). Also, once you add associated values / data classes every sensible developer will use a typedef, anyway (you wouldn't define a huge union type within a function argument).

I think the notion of being a single type is the core: All those elements that allow us to write these things more concisely would be applicable in many different models (say, Success<T>(String authToken, T data) could be a concise declaration of a completely normal class, or a data/value class with some extras, or a case in an algebraic datatype, and Success<t>(token, data) could be a pattern used in a pattern match in all those cases as well).

It's not just the notion of being a single type, but also the notion of being an arbitrary set of types that is useful. You don't want the set of types to be bigger than necessary because that forces you to deal with runtime cases that should be compile-time impossible, anyway. It also happens very often that you want to have a custom set of types, no matter which packages those types come from.

GregorySech commented 4 years ago

mmmh didn't ADTs also contain type products (tuples)?

GregorySech commented 4 years ago

For one I agree that we are having some problems with definitions.

We are looking at a different name for untagged union types. Would recursive/inductive data type work?

At this point I'm unsure that we all have the same definition of Algebraic Data Type (ADT).

Do we agree on Wikipedia's definition of ADTs or are we talking about something else?

If we agree I'd like to point out that Algebraic Data Types by definition are composed of sum types (which are tagged unions) and product types (which are records, tuples).

When I first wrote this issue I did it out of frustration about having to rely (if I was "lucky") on my own super-classes though the types that the API needed to express neither had common behaviour nor "shape".

This makes my APIs less expressive, by consequence they become more documentation-dependent. It also seems like a flawed approach to OOP inheritance as a concept but in this context it doesn't matter.

I want to thank @munificent whom wrote this amazing issue https://github.com/dart-lang/language/issues/546 that with this whole thread is giving me much to think about on my way in and out of work. My only problem with that issue is that he is calling ADTs tagged unions, which seems to be happening here a lot and is, by definition, wrong.

Tagged unions are by definition a sub-set of ADTs.

I hope that this post does not read as condescending, it is not my intent and I do not have the skill set to be. I want to establish some definitions so the thread can speak a common language going forward.

munificent commented 4 years ago

My only problem with that issue is that he is calling ADTs tagged unions, which seems to be happening here a lot and is, by definition, wrong.

Oh, I'm sorry for the confusion. I thought most people used "algebraic datatypes" to refer to just sum types and not the combination of both sum types and product types. (In retrospect, it is useful name for that pair of features and "algebraic datatype" is an exactly logical name.)

I should have just said "sum type", but I think that's a less familiar term for many people. Names are hard.

burakemir commented 4 years ago

The terminology isn't the problem here, as it can easily be clarified. It just needs some insight that all these things are referring to more or less the same:

1 - Among "programming languages people", algebraic datatypes were (for decades now!) referring to the Hope / ML / Miranda datatype declarations. These can well be called "sums of products", and this is explained as follows:

In type theory, we can form product types A * B and sum types A + B

Hope / ML / Miranda restricted the use of such types to "sums or products", i.e. expressions of the form S_1 + ... + S_n where each S_i would be of the form T1 * ... T(m_i); in order to inspect such a type, one uses pattern matching and it consists in a first step of telling which part of the sum (which i) it is that we are dealing with and in a second step, recursively match sub-patterns against the values e1 ... e(m_i). The part of the runtime representation of such a datatype which tells which index i it is is also called tag, and in programming we use names for such tags.

So a type expression like "(char * int) + int" would be written as, say, "data Qux = Foo(x:char, y:int) | Bar(z:int)". In the same type-theoretic representation, an enum with n value would correspond to 1 + ... + 1 (n times).

2 - In Scala, which enables class-based OO programming like dart, these algebraic datatypes were encoded in the form of "case classes", such as:

sealed abstract class Qux; case class Foo(x: char, y: int) extends Qux; case class Bar(z: int) extends Qux;

The idea here is that since that at runtime, any Qux instance carries its more specific class like 'Foo' or 'Bar' around anyway (to implemented OO features like late binding), so pattern matching can be translated to simple "is instance of" checks.

An tuples were encoded as case classes like Tuple3(x,y,z). The "tag" does not serve any purpose, but lets one conveniently reuse the machinery of case classes for runtime representation.

3 - Now, the wikipedia article tries to explain the word "algebraic" in a more "first principles" way, as types that are built from + and . And rightly so, in fact: if we could ignore the path dependence of decades of PL history, then it would indeed be a much more consistent use of the English language to call "algebraic" data type any data type that is built up from + and ...

It is also possible to define a programming language with binary * and binary +, see simply typed lambda calculus with sums and pairs. In such a pure language, all matches have two branches, and an enum Red | Green | Blue would be represented as (Red + Green) + Blue and you'd have to use nested matches if you want do deal with "Red"...

@GregorySech The pragmatic way is to accept that everybody says "compiler" when in fact we should be saying "translator". In fact, I remember when ADT used to abbreviate "abstract data type" and nobody would ever use this to mean algebraic data type. People are aware that "algebraic data type" should be understood as something slightly more general, which is why one sometimes says "SML style algebraic data types" or "tagged unions" or "discriminated unions" to be really precise.

4 - Where does "tagged union" come from?

If we wanted a mathematical explanation that uses the language of set theory, then a set A + B refers to the disjoint union of ordered pairs { (0, a) | a \in A } u { (1, b) | b \ in B }

The 0 and 1 serve as "tags" that make sure that the union is in fact disjoint. This is pretty close to how "tagged unions" aka "SML style algebraic data types" are implemented, in the sense that there is something in the runtime bits that tells which variant the object is.

In a class-based OO language, values carry something like a "class descriptor", to implement OO features. It is not true for primitive types like int, char, float, since for performance reasons, one would not want to "box" them.

So the terminology boils down to whether you pretend to be working in a framework of set theory, where + is explained in terms of union and tags ( is primitive), or whether you prefer to refer to a type theory setting where + and are both primitives.

I do consider + and as really fundamental for data types; they correspond do "or" and "and" in logic and we do try to divide and conquer the problem of modeling reality using logical predicates, all the time. In a sense, in an OO language classes (more precisely: structs, record types) give us , and the multiple subclasses gives us +.

5 - I'd now like to deal with this "can't we just have unions" idea.

Since we are in the world of OO, and not set theory "union" and "intersection" have a much more subtle and different meaning. There is "background knowledge" of the class hierarchy, and if I had a language with proper union types, I would expect the compiler to make use of this background knowledge.

@wkornewald I did read your linked proposal sketch in order to see what it's about. It seems really quite incomplete. Let me point out some issues.

For example, if I have two classes

class A { String foo(); } class B { String foo(); }

typedef AorB = A | B

AoB thing = getThing(); thing.foo(); // would cause some surprise if not permitted

... and what about this one:

class Z { String foo(); } class A extends Z class B extends Z

AoB thing = getThing(); thing.foo(); // would cause even more surprise if not permitted.

So it is certainly true that a language with union types can feel very awesome and together with singleton types, there are ways to encode our SML style algebraic datatypes(like here https://www.typescriptlang.org/docs/handbook/advanced-types.html#union-types) ...

...tuning dart into such a language would require a complete proposal that would actually need to spell out a number of things:

The Scala 3 spec has a nice page on union types: https://dotty.epfl.ch/docs/reference/new-types/union-types-spec.html here unions are a way to combine class types, and all of the above is well-defined. Scala 3 is a complete rewrite of Scala 2, where the entire underlying type system was swapped out. Scala 3 doesn't permit the first thing.foo() example above either.

It doesn't seem like it is worthwhile adding something like this afterthought.

In contrast, adding SML algebraic datatypes in the form of Scala-like case classes to dart is almost trivial in comparison, and such a design makes elegant use of the existing forms of * and + in the OO context...

None of the issues with union types that I mentioned above need any thought, since the whole proposal just consists of "hey, let's reuse the class as a tag and agree on a way to get to some of its members via pattern matching", and, for optimization, "hey let's agree on some convention to tell the compiler that there aren't going to be any other sub-classes than the ones already known".

wkornewald commented 4 years ago

@burakemir Thanks for the clarification and the link to the Scala 3 spec. Some of the open questions have been discussed in this thread, so I've extended my original post (answering some of your questions). Anyway, my post is still not a real RFC and there are open questions that should be discussed (e.g. how far narrowing can work with generic types like in List<T>).

My original intent was to get general agreement on the desirability of union types as a better replacement for ADTs. This is the main point I'm trying to make because the intent to have ADTs (even in addition to union types) keeps coming up. At least I have that impression.

Shall we do a 1-2h group call via e.g. Hangouts to at least agree on the "are union types better" question and discuss how the "end-user" would work with union types, so we can start from the same basic idea? Once there's agreement on the basics (the "if") we can discuss "how" we can adapt TypeScript's (or Scala's) union types to Dart's nominal type system in a sensible way.

burakemir commented 4 years ago

It's clear to me that you consider union types as "better", however I am not at all convinced that it makes the right addition to dart - I have not much desire to spend time on discussing what is "better" in absolute terms.

I for one still consider ML style algebraic datatypes a sweet spot in language design, maybe due to my exposure to Scala's implementation of pattern matching, and I also think it would be much easier to add produce a full specification. So I will direct my attention more towards #546 and leave this proposal here alone.

wkornewald commented 4 years ago

Sure we should be discussing what is better specifically for Dart:

What about FFI? Especially when running Dart in the browser, you deal with web APIs full of union typedness.

How do you want to represent JSON in a type-safe and convenient way? Union types help a lot here.

What about adding syntax sugar without breaking the API: https://github.com/dart-lang/sdk/issues/4938#issuecomment-305905315 Union types allow changing the padding argument to num | EdgeInsetsGeometry while staying backwards-compatible with the previous API. ADTs can't.

The backwards-compatibility property alone would be very beneficial for all existing Dart packages and the Dart community in general (and union types have many more beneficial properties).

In what way are ADTs the right addition to Dart?

I guess the main argument is „implementation complexity“. How large is the actual difference in implementation complexity? Has anyone spent sufficient time thinking about what the simplest implementation/architecture would look like? In contrast, how large is the improvement/simplification for the whole Dart community? I think the Dart team is over-estimating and over-prioritizing the implementation complexity (a one-time effort) and under-estimating the benefits for the Dart community.

This whole decision shouldn't be based on "I've never done this, so it must be complicated". The argument also can't be "I prefer Scala's ADTs" considering that Scala 3 will switch to union types. The Scala team has already concluded that union types are the right choice. Why doesn't the Dart team come to the same conclusion?

BTW, see this Scala 3 presentation at minute 16:50 for some more thoughts about the benefits of union types (looks like I'm not the only crazy one). Also, at minute 29 you can see that union types are among the most popular additions (to Martin's surprise, and maybe also to your surprise).

DanMossa commented 4 years ago

This issue thread seems to have the most discussion.

Any ideas on how this is going in terms of progress?

munificent commented 4 years ago

No progress in a while, sorry. We're all very focused on getting non-nullable types out the door right now. There's a dozen corners of the spec to pin down, a hundred Dart-team-maintained packages to migrate, and thousands of language tests to port. We've got our hands full for now. :)

realcr commented 4 years ago

Hey, I implemented a built_value.dart compatible sum type library called built_union.dart, which can be found here: https://github.com/freedomlayer/built_union.dart Obviously this is not as powerful as a language feature, but it could get you going until we have something built into dart.

ollyde commented 3 years ago

@munificent I really hope there's an option flag for 'optionals' though! Because it will never come to Flutter if that is the case that you 'need' to rewrite all the dart libraries; since many Flutter 3rd party libraries have not been updated in more than a year. Also are we going to break the coding trend and call this one something weird too instead of union types 😂?

werediver commented 3 years ago

I really hope there's an option flag for 'optionals' though! Because it will never come to Flutter if that is the case that you 'need' to rewrite all the dart libraries; since many Flutter 3rd party libraries have not been updated in more than a year.

We do have outdated Dart 1.x packages that are of no use anymore. We can as well have non-optional-compliant packages ~that are of no use~ the use of which is undesirable anymore. Coexistence of optional-compliant and non-optional-compliant packages is supported (not sure about for how long, though; it could be for a transitional period only, but it can't be for too short anyway).

jayvhaile commented 3 years ago

What's the status? Is this getting implemented anytime soon?

ngdangtu-vn commented 3 years ago

It would be great if sum type accepts values. For example:

typedef Format is 'html' | 'js' | 'json';

Format convert(Format input)
{
   // etc.
}

void main(List<String> args)
{
   Format output = convert(args[0]);
   print(output);
}

It would help me to avoid convert string to enum in cli input.