proposal: spec: add sum types / discriminated unions

DemiMarie commented 7 years ago

This is a proposal for sum types, also known as discriminated unions. Sum types in Go should essentially act like interfaces, except that:

they are value types, like structs
the types contained in them are fixed at compile-time

Sum types can be matched with a switch statement. The compiler checks that all variants are matched. Inside the arms of the switch statement, the value can be used as if it is of the variant that was matched.

ianlancetaylor commented 7 years ago

This has been discussed several times in the past, starting from before the open source release. The past consensus has been that sum types do not add very much to interface types. Once you sort it all out, what you get in the end is an interface type where the compiler checks that you've filled in all the cases of a type switch. That's a fairly small benefit for a new language change.
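For reference, the interface-based approach as it can be written in Go today might look like the minimal sketch below. Shape, Circle, Square, and name are hypothetical names chosen for illustration, not anything from the thread. The point is that a forgotten case is only noticed at run time, through the default arm.

```go
package main

import (
	"fmt"
	"math"
)

// Shape is an ordinary interface. Circle and Square are the only intended
// implementations, but nothing in the language enforces that.
type Shape interface {
	Area() float64
}

type Circle struct{ R float64 }
type Square struct{ Side float64 }

func (c Circle) Area() float64 { return math.Pi * c.R * c.R }
func (s Square) Area() float64 { return s.Side * s.Side }

// name relies on a default arm to catch forgotten cases, and only at run time.
func name(s Shape) string {
	switch s.(type) {
	case Circle:
		return "circle"
	case Square:
		return "square"
	default:
		return fmt.Sprintf("unhandled %T", s)
	}
}

func main() {
	fmt.Println(name(Circle{R: 1}))
	fmt.Println(name(Square{Side: 2}))
}
```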

If you want to push this proposal along further, you will need to write a more complete proposal doc, including: What is the syntax? Precisely how do they work? (You say they are "value types", but interface types are also value types). What are the trade-offs?

rsc commented 7 years ago

See https://www.reddit.com/r/golang/comments/46bd5h/ama_we_are_the_go_contributors_ask_us_anything/d03t6ji/?st=ixp2gf04&sh=7d6920db for some past discussion to be aware of.

griesemer commented 7 years ago

I think this is too significant a change of the type system for Go1 and there's no pressing need. I suggest we revisit this in the larger context of Go 2.

rogpeppe commented 7 years ago

Thanks for creating this proposal. I've been toying with this idea for a year or so now. The following is as far as I've got with a concrete proposal. I think "choice type" might actually be a better name than "sum type", but YMMV.

Sum types in Go

A sum type is represented by two or more types combined with the "|" operator.

type: type1 | type2 ...

Values of the resulting type can only hold one of the specified types. The type is treated as an interface type - its dynamic type is that of the value that's assigned to it.

As a special case, "nil" can be used to indicate whether the value can become nil.

For example:

type maybeInt nil | int

The method set of the sum type holds the intersection of the method set of all its component types, excluding any methods that have the same name but different signatures.

Like any other interface type, a sum type may be the subject of a dynamic type conversion. In type switches, the first arm of the switch that matches the stored type will be chosen.

The zero value of a sum type is the zero value of the first type in the sum.

When assigning a value to a sum type, if the value can fit into more than one of the possible types, then the first is chosen.

For example:

var x int|float64 = 13

would result in a value with dynamic type int, but

var x int|float64 = 3.13

would result in a value with dynamic type float64.
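As a point of comparison, today's Go already picks these dynamic types when an untyped constant is stored in an interface{}, because untyped constants have default types. The sketch below only shows that existing behaviour; dynType is a hypothetical helper, and interface{} merely stands in for the proposed int|float64. Under the proposed "first that fits" rule the order of the sum would presumably matter as well, which plain interface{} cannot show.

```go
package main

import "fmt"

// dynType reports the dynamic type stored in an interface value.
func dynType(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	var x interface{} = 13   // untyped integer constant defaults to int
	var y interface{} = 3.13 // untyped floating-point constant defaults to float64
	fmt.Println(dynType(x), dynType(y))
}
```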

Implementation

A naive implementation could implement sum types exactly as interface values. A more sophisticated approach could use a representation appropriate to the set of possible values.

For example a sum type consisting only of concrete types without pointers could be implemented with a non-pointer type, using an extra value to remember the actual type.

For sum-of-struct-types, it might even be possible to use spare padding bytes common to the structs for that purpose.

bcmills commented 7 years ago

@rogpeppe How would that interact with type assertions and type switches? Presumably it would be a compile-time error to have a case on a type (or assertion to a type) that is not a member of the sum. Would it also be an error to have a nonexhaustive switch on such a type?

josharian commented 7 years ago

For type switches, if you have

type T int | interface{}

and you do:

switch t := t.(type) {
  case int:
    // ...

and t contains an interface{} containing an int, does it match the first case? What if the first case is case interface{}?

Or can sum types contain only concrete types?

What about type T interface{} | nil? If you write

var t T = nil

what is t's type? Or is that construction forbidden? A similar question arises for type T []int | nil, so it's not just about interfaces.

rogpeppe commented 7 years ago

Yes, I think it would be reasonable to have a compile-time error to have a case that can't be matched. Not sure about whether it's a good idea to allow non-exhaustive switches on such a type - we don't require exhaustiveness anywhere else. One thing that might be good though: if the switch is exhaustive, we could not require a default to make it a terminating statement.

That means that you can get the compiler to error if you have:

func addOne(x int|float64) int|float64 {
    switch x := x.(type) {
    case int:
        return x + 1
    case float64:
        return x + 1
    }
}

and you change the sum type to add an extra case.

rogpeppe commented 7 years ago

For type switches, if you have

type T int | interface{}

and you do:

switch t := t.(type) {
  case int:
    // ...

and t contains an interface{} containing an int, does it match the first case? What if the first case is case interface{}?

t can't contain an interface{} containing an int. t is an interface type just like any other interface type, except that it can only contain the enumerated set of types that it consists of. Just like an interface{} can't contain an interface{} containing an int.
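The non-nesting behaviour of interfaces can be checked in Go as it exists today. In this small sketch (holdsInt is a hypothetical helper), assigning one interface{} to another copies the dynamic type and value rather than wrapping the first interface, so the assertion to int succeeds.

```go
package main

import "fmt"

// holdsInt reports whether the dynamic type of v is int.
func holdsInt(v interface{}) bool {
	_, ok := v.(int)
	return ok
}

func main() {
	var inner interface{} = 42
	var outer interface{} = inner // copies the dynamic type and value; no nesting
	fmt.Println(holdsInt(outer))
	fmt.Printf("%T\n", outer)
}
```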

Sum types can match interface types, but they still just get a concrete type for the dynamic value. For example, it would be fine to have:

type R io.Reader | io.ReadCloser

What about type T interface{} | nil? If you write

var t T = nil

what is t's type? Or is that construction forbidden? A similar question arises for type T []int | nil, so it's not just about interfaces.

According to the proposal above, you get the first item in the sum that the value can be assigned to, so you'd get the nil interface.

In fact interface{} | nil is technically redundant, because any interface{} can be nil.

For []int | nil, a nil []int is not the same as a nil interface, so the concrete value of ([]int|nil)(nil) would be []int(nil) not untyped nil.
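The distinction between a nil []int and a nil interface already exists in Go today, and a short sketch may make it concrete. isNilInterface is a hypothetical helper, and interface{} stands in for the proposed sum type.

```go
package main

import "fmt"

// isNilInterface reports whether v is the nil interface value,
// as opposed to an interface holding a nil slice or pointer.
func isNilInterface(v interface{}) bool {
	return v == nil
}

func main() {
	var s []int // nil slice
	fmt.Println(s == nil)
	fmt.Println(isNilInterface(s))   // s is boxed as []int(nil), so not a nil interface
	fmt.Println(isNilInterface(nil)) // the nil interface itself
}
```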

bcmills commented 7 years ago

The []int | nil case is interesting. I would expect the nil in the type declaration to always mean "the nil interface value", in which case

type T []int | nil
var x T = nil

would imply that x is the nil interface, not the nil []int.

That value would be distinct from the nil []int encoded in the same type:

var y T = []int(nil)  // y != x

jimmyfrasche commented 7 years ago

Wouldn't nil always be required even if the sum is all value types? Otherwise what would var x int64 | float64 be? My first thought, extrapolating from the other rules, would be the zero value of the first type, but then what about var x interface{} | int? It would, as @bcmills points out, have to be a distinct sum nil.

It seems overly subtle.

Exhaustive type switches would be nice. You could always add an empty default: when it's not the desired behavior.

rogpeppe commented 7 years ago

The proposal says "When assigning a value to a sum type, if the value can fit into more than one of the possible types, then the first is chosen."

So, with:

type T []int | nil
var x T = nil

x would have concrete type []int because nil is assignable to []int and []int is the first element of the type. It would be equal to any other []int(nil) value.

Wouldn't nil always be required even if the sum is all value types? Otherwise what would var x int64 | float64 be?

The proposal says "The zero value of a sum type is the zero value of the first type in the sum.", so the answer is int64(0).

My first thought, extrapolating from the other rules, would be the zero value of the first type, but then what about var x interface{} | int? It would, as @bcmills points out, have to be a distinct sum nil

No, it would just be the usual interface nil value in that case. That type (interface{} | nil) is redundant. Perhaps it might be a good idea to make it a compiler error to specify sum types where one element is a superset of another, as I can't currently see any point in defining such a type.

ianlancetaylor commented 7 years ago

The zero value of a sum type is the zero value of the first type in the sum.

That is an interesting suggestion, but since the sum type must record somewhere the type of the value that it currently holds, I believe it means that the zero value of the sum type is not all-bytes-zero, which would make it different from every other type in Go. Or perhaps we could add an exception saying that if the type information is not present, then the value is the zero value of the first type listed, but then I'm not sure how to represent nil if it is not the first type listed.

jimmyfrasche commented 7 years ago

So (stuff) | nil only makes sense when nothing in (stuff) can be nil and nil | (stuff) means something different depending on whether anything in stuff can be nil? What value does nil add?

@ianlancetaylor I believe many functional languages implement (closed) sum types essentially like how you would in C

struct {
    int which;
    union {
         A a;
         B b;
         C c;
    } summands;
}

if which indexes into the union's fields in order, 0 = a, 1 = b, 2 = c, the zero value definition works out to all bytes are zero. And you'd need to store the types elsewhere, unlike with interfaces. You'd also need special handling for the nil tag of some kind wherever you store the type info.

That would make unions value types instead of special interfaces, which is also interesting.
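A rough Go rendering of the same idea, as a sketch only: Go has no untagged union, so this version keeps one field per alternative and wastes space where the C layout would overlap them. The names intOrString, fromInt, fromString, and describe are hypothetical. Because tag 0 selects the first alternative, the all-zero value reads as int(0).

```go
package main

import "fmt"

// intOrString emulates a closed sum of int and string.
// which == 0 selects the int field, so the all-zero value is int(0).
type intOrString struct {
	which uint8 // 0 = int, 1 = string
	i     int
	s     string
}

func fromInt(i int) intOrString       { return intOrString{which: 0, i: i} }
func fromString(s string) intOrString { return intOrString{which: 1, s: s} }

// describe switches on the tag rather than on a runtime type pointer.
func (v intOrString) describe() string {
	switch v.which {
	case 0:
		return fmt.Sprintf("int %d", v.i)
	case 1:
		return fmt.Sprintf("string %q", v.s)
	default:
		return "invalid tag"
	}
}

func main() {
	var zero intOrString // all bytes zero
	fmt.Println(zero.describe())
	fmt.Println(fromString("hi").describe())
}
```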

shanemhansen commented 7 years ago

Is there a way to make the all zero value work if the field which records the type has a zero value representing the first type? I'm assuming that one possible way for this to be represented would be:

type A = B|C
struct A {
  choice byte // value 0 or 1
  value ?// (thing big enough to store B | C)
}

[edit]

Sorry @jimmyfrasche beat me to the punch.

jimmyfrasche commented 7 years ago

Is there anything added by nil that couldn't be done with

type S int | string | struct{}
var None struct{}

?

That seems like it avoids a lot of the confusion (that I have, at least)

jimmyfrasche commented 7 years ago

Or better

type (
     None struct{}
     S int | string | None
)

that way you could type switch on None and assign with None{}

bcmills commented 7 years ago

@jimmyfrasche struct{} is not equal to nil. It's a minor detail, but it would make type-switches on sums needlessly(?) diverge from type-switches on other types.

jimmyfrasche commented 7 years ago

@bcmills It wasn't my intent to claim otherwise—I meant that it could be used for the same purpose as differentiating a lack of value without overlapping with the meaning of nil in any of the types in the sum.

jimmyfrasche commented 7 years ago

@rogpeppe what does this print?

// r is an io.Reader interface value holding a type that also implements io.Closer
var v io.ReadCloser | io.Reader = r
switch v.(type) {
case io.ReadCloser: fmt.Println("ReadCloser")
case io.Reader: fmt.Println("Reader")
}

I would assume "Reader"

bcmills commented 7 years ago

@jimmyfrasche I would assume ReadCloser, same as you'd get from a type-switch on any other interface.

(And I would also expect sums which include only interface types to use no more space than a regular interface, although I suppose that an explicit tag could save a bit of lookup overhead in the type-switch.)
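The behaviour of an ordinary type switch here can be tried in current Go. This is a sketch with hypothetical types rc and onlyReader, and a plain interface{} parameter standing in for the proposed sum: the first arm whose interface the dynamic type satisfies is taken, regardless of the static type the value had when it was assigned.

```go
package main

import (
	"fmt"
	"io"
)

// rc is a hypothetical type implementing both io.Reader and io.Closer.
type rc struct{}

func (rc) Read(p []byte) (int, error) { return 0, io.EOF }
func (rc) Close() error               { return nil }

// onlyReader implements io.Reader but not io.Closer.
type onlyReader struct{}

func (onlyReader) Read(p []byte) (int, error) { return 0, io.EOF }

// classify uses the same arm order as the switch in the question.
func classify(v interface{}) string {
	switch v.(type) {
	case io.ReadCloser:
		return "ReadCloser"
	case io.Reader:
		return "Reader"
	}
	return "neither"
}

func main() {
	var r io.Reader = rc{} // static type io.Reader, dynamic type rc
	fmt.Println(classify(r))
	fmt.Println(classify(onlyReader{}))
}
```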

jimmyfrasche commented 7 years ago

@bcmills it's the assignment that's interesting, consider: https://play.golang.org/p/PzmWCYex6R

rogpeppe commented 7 years ago

@ianlancetaylor That's an excellent point to raise, thanks. I don't think it's hard to get around though, although it does imply that my "naive implementation" suggestion is itself too naive. A sum type, although treated as an interface type, does not have to actually contain a direct pointer to the type and its method set - instead it could, when appropriate, contain an integer tag that implies the type. That tag could be non-zero even when the type itself is nil.

Given:

 var x int | nil = nil

the runtime value of x need not be all zeros. When switching on the type of x or converting it to another interface type, the tag could be indirected through a small table containing the actual type pointers.

Another possibility would be to allow a nil type only if it's the first element, but that precludes constructions like:

var t nil | int
var u float64 | t

rogpeppe commented 7 years ago

@jimmyfrasche I would assume ReadCloser, same as you'd get from a type-switch on any other interface.

Yes.

@bcmills it's the assignment that's interesting, consider: https://play.golang.org/p/PzmWCYex6R

I don't get this. Why would "this [...] have to be valid for the type switch to print ReadCloser"? Like any interface type, a sum type would store no more than the concrete value of what's in it.

When there are several interface types in a sum, the runtime representation is just an interface value - it's just that we know that the underlying value must implement one or more of the declared possibilities.

That is, when you assign something to a type (I1 | I2) where both I1 and I2 are interface types, it's not possible to tell later whether the value you put in was known to implement I1 or I2 at the time.

jimmyfrasche commented 7 years ago

If you have a type that's io.ReadCloser | io.Reader you can't be sure when you type switch or assert on io.Reader that it's not an io.ReadCloser unless assignment to a sum type unboxes and reboxes the interface.

jimmyfrasche commented 7 years ago

Going the other way, if you had io.Reader | io.ReadCloser it would either never accept an io.ReadCloser because it goes strictly left-to-right or the implementation would have to search for the "best matching" interface from all interfaces in the sum but that cannot be well defined.

griesemer commented 7 years ago

@rogpeppe In your proposal, ignoring optimization possibilities in the implementation and subtleties of zero values, the main benefit of using a sum type over a manually crafted interface type (containing the intersection of the relevant methods) is that the type checker can point out errors at compile time rather than runtime. A 2nd benefit is that a type's value is more discriminated and thus may help with readability/understanding of a program. Is there any other major benefit?

(I am not trying to diminish the proposal in any way, just trying to get my intuition right. Especially if the extra syntactic and semantic complexity is "reasonably small" - whatever that may mean - I can definitively see the benefit of having the compiler catch errors early.)

rogpeppe commented 7 years ago

@griesemer Yes, that's about right.

Particularly when communicating messages over channels or the network, I think it helps readability and correctness to be able to have a type that expresses exactly the available possibilities. It's common currently to make a half-hearted attempt to do this by including an unexported method in an interface type, but this is a) circumventable by embedding and b) it's hard to see all the possible types because the unexported method is hidden.
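To illustrate the unexported-method idiom and how embedding gets around it, here is a minimal sketch. Msg, Ping, Pong, Sneaky, and kind are hypothetical names, and everything sits in one package for brevity; in practice Sneaky would typically live in another package and embed an exported variant.

```go
package main

import "fmt"

// Msg is "sealed" by the unexported method isMsg.
type Msg interface{ isMsg() }

type Ping struct{}
type Pong struct{}

func (Ping) isMsg() {}
func (Pong) isMsg() {}

// Sneaky embeds Ping, so isMsg is promoted and Sneaky satisfies Msg
// even though it was never meant to be one of the variants.
type Sneaky struct{ Ping }

// kind handles the intended variants and reports anything else.
func kind(m Msg) string {
	switch m.(type) {
	case Ping:
		return "ping"
	case Pong:
		return "pong"
	}
	return fmt.Sprintf("unexpected %T", m)
}

func main() {
	fmt.Println(kind(Ping{}))
	fmt.Println(kind(Sneaky{})) // compiles, but matches neither intended case
}
```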

rogpeppe commented 7 years ago

@jimmyfrasche

If you have a type that's io.ReadCloser | io.Reader you can't be sure when you type switch or assert on io.Reader that it's not an io.ReadCloser unless assignment to a sum type unboxes and reboxes the interface.

If you have that type, you know that it's always an io.Reader (or nil, because any io.Reader can also be nil). The two alternatives aren't exclusive - the sum type as proposed is an "inclusive or" not an "exclusive or".

Going the other way, if you had io.Reader | io.ReadCloser it would either never accept an io.ReadCloser because it goes strictly left-to-right or the implementation would have to search for the "best matching" interface from all interfaces in the sum but that cannot be well defined.

If by "going the other way", you mean assigning to that type, the proposal says:

"When assigning a value to a sum type, if the value can fit into more than one of the possible types, then the first is chosen."

In this case, an io.ReadCloser can fit into both an io.Reader and an io.ReadCloser, so it chooses io.Reader, but there's actually no way to tell afterwards. There is no detectable difference between the type io.Reader and the type io.Reader | io.ReadCloser, because io.Reader can also hold all interface types that implement io.Reader. That's why I suspect it might be a good idea to make the compiler reject types like this. For example, it could reject any sum type involving interface{} because interface{} can already contain any type, so the extra qualifications don't add any information.

jimmyfrasche commented 7 years ago

@rogpeppe there are a lot of things I like about your proposal. The left to right assignment semantics and the zero value is the zero value of the leftmost type rules are very clear and simple. Very Go.

What I'm worried about is assigning a value that's already boxed in an interface to a sum typed variable.

Let's, for the moment, use my previous example and say that RC is a struct that can be assigned to an io.ReadCloser.

If you do this

var v io.ReadCloser | io.Reader = RC{}

the results are obvious and clear.

However, if you do this

var r io.Reader = RC{}
var v io.ReadCloser | io.Reader = r

the only sensible thing to do is have v store r as an io.Reader, but that means when you type switch on v you can't be sure that when you hit the io.Reader case that you don't in fact have an io.ReadCloser. You'd need to have something like this:

switch v := v.(type) {
case io.ReadCloser: useReadCloser(v)
case io.Reader:
  if rc, ok := v.(io.ReadCloser); ok {
    useReadCloser(rc)
  } else {
    useReader(v)
  }
}

Now, there's a sense in which io.ReadCloser <: io.Reader, and you could just disallow those, as you suggested, but I think the problem is more fundamental and may apply to any sum type proposal for Go†.

Let's say you have three interfaces A, B, and C, with the methods A(), B(), and C() respectively, and a struct ABC with all three methods. A, B, and C are disjoint so A | B | C and its permutations are all valid types. But you still have cases like

var c C = ABC{}
var v A | B | C = c

There are a bunch of ways to rearrange that and you still get no meaningful guarantees about what v is when interfaces are involved. After you unbox the sum you need to unbox the interface if order is important.

Maybe the restriction should be that none of the summands can be interfaces at all?

The only other solution I can think of is to disallow assigning an interface to a sum typed variable, but that seems in its own way more severe.

† that doesn't involve type constructors for the types in the sum to disambiguate (like in Haskell where you have to say Just v to construct a value of type Maybe)—but I am not in favor of that at all.

bcmills commented 7 years ago

@jimmyfrasche Is the use-case for ordered unboxing actually important? That's not obvious to me, and for the cases where it is important it's easy to work around with explicit box structs:

type ReadCloser struct {  io.ReadCloser }
type Reader struct { io.Reader }

var v ReadCloser | Reader = Reader{r}
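The box-struct workaround can be tried in today's Go, with interface{} standing in for the proposed sum. This is only a sketch: ReadCloserBox, ReaderBox, file, and which are hypothetical names (the boxes are renamed to avoid confusion with the io interfaces). Since the switch matches on the concrete box type, the chosen alternative is preserved even when the boxed value also has a Close method.

```go
package main

import (
	"fmt"
	"io"
)

// Each box wraps one interface, so the box's concrete type
// records which alternative the programmer chose.
type ReadCloserBox struct{ io.ReadCloser }
type ReaderBox struct{ io.Reader }

// file is a hypothetical type with both Read and Close.
type file struct{}

func (file) Read(p []byte) (int, error) { return 0, io.EOF }
func (file) Close() error               { return nil }

// which matches on the concrete box type, not on the boxed value.
func which(v interface{}) string {
	switch v.(type) {
	case ReadCloserBox:
		return "ReadCloser"
	case ReaderBox:
		return "Reader"
	}
	return "other"
}

func main() {
	fmt.Println(which(ReaderBox{file{}}))     // boxed as a Reader, stays a Reader
	fmt.Println(which(ReadCloserBox{file{}})) // boxed as a ReadCloser
}
```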

jimmyfrasche commented 7 years ago

@bcmills It's more that the results are not obvious and fiddly and means that all the guarantees you want with a sum type evaporate when interfaces are involved. I can see it causing all kinds of subtle bugs and misunderstanding.

The explicit box structs example you provide shows that disallowing interfaces in sum types doesn't limit the power of sum types at all. It's effectively creating the type constructors for disambiguation that I mentioned in the footnote. Admittedly it's slightly annoying and an extra step, but it's simple and feels very much in line with Go's philosophy of letting language constructs be as orthogonal as possible.

rogpeppe commented 7 years ago

all the guarantees you want with a sum type

It depends what guarantees you expect. I think you're expecting a sum type to be a strictly tagged value, so given any types A|B|C, you know exactly what static type you assigned to it. I see it as a type restriction on a single value of concrete type - the restriction is that the value is type-compatible with (at least) one of A, B and C. In the end it's just an interface with a value in.

That is, if a value can be assigned to a sum type by virtue of it being assignment-compatible with one of the sum type's members, we don't record which of those members has been "chosen" - we just record the value itself. The same as when you assign an io.Reader to an interface{}, you lose the static io.Reader type and just have the value itself which is compatible with io.Reader but also with any other interface type that it happens to implement.

In your example:

var c C = ABC{}
var v A | B | C = c

A type assertion of v to any of A, B and C would succeed. That seems reasonable to me.
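This matches how interface values behave in Go today, which can be checked with interface{} standing in for A | B | C. The sketch below reuses the names from the example; assertAll is a hypothetical helper.

```go
package main

import "fmt"

type A interface{ A() }
type B interface{ B() }
type C interface{ C() }

type ABC struct{}

func (ABC) A() {}
func (ABC) B() {}
func (ABC) C() {}

// assertAll reports whether v's dynamic type satisfies A, B and C.
func assertAll(v interface{}) (okA, okB, okC bool) {
	_, okA = v.(A)
	_, okB = v.(B)
	_, okC = v.(C)
	return
}

func main() {
	var c C = ABC{}
	var v interface{} = c // only the dynamic type ABC is recorded, not the static type C
	fmt.Println(assertAll(v))
}
```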

jimmyfrasche commented 7 years ago

@rogpeppe those semantics make more sense than what I was imagining. I'm still not entirely convinced that interfaces and sums mix well, but I'm no longer certain they don't. Progress!

Let's say you have type U I | *T where I is an interface type and *T is a type that implements I.

Given

var i I = new(T)
var u U = i

the dynamic type of u is *T, and in

var u U = new(T)

you can access that *T as an I with a type assertion. Is that correct?

That would mean assignment from a valid interface value to a sum would have to search for the first matching type in the sum.

It would also be somewhat different from something like var v uint8 | int32 | int64 = i which would, I imagine, just always go with whichever of those three types i is even if i was an int64 that could fit in a uint8.

rogpeppe commented 7 years ago

Progress!

Yay!

you can access that *T as an I with a type assertion. Is that correct?

Yes.

That would mean assignment from a valid interface value to a sum would have to search for the first matching type in the sum.

Yup, as the proposal says (of course the compiler knows statically which one to choose so there's no searching at runtime).

It would also be somewhat different from something like var v uint8 | int32 | int64 = i which would, I imagine, just always go with whichever of those three types i is even if i was an int64 that could fit in a uint8.

Yes, because unless i is a constant, it will only be assignable to one of those alternatives.

rogpeppe commented 7 years ago

Yes, because unless i is a constant, it will only be assignable to one of those alternatives.

That's not quite true, I realise, because of the rule allowing assignment of unnamed types to named types. I don't think that makes too much difference though. The rule remains the same.

jimmyfrasche commented 7 years ago

So the I | *T type from my last post is effectively the same as the type I and io.ReadCloser | io.Reader is effectively the same type as io.Reader?

rogpeppe commented 7 years ago

That's right. Both types would be covered by my suggested rule that the compiler reject sum types where one type is an interface that is implemented by another of the types. The same or similar rule could cover sum types with duplicate types like int|int.

One thought: it is perhaps unintuitive that int|byte isn't the same as byte|int, but it's probably ok in practice.

dr2chase commented 7 years ago

That would mean assignment from a valid interface value to a sum would have to search for the first matching type in the sum.

Yup, as the proposal says (of course the compiler knows statically which one to choose so there's no searching at runtime).

I'm not following this. The way I read it (which could be different from what was intended) there's at least two ways to deal with a union U of I and T-implements-I.

1a) at assignment of U u = t, the tag is set to T. Later selection results in a T because the tag is a T.
1b) at assignment of U u = i (i is really a T), the tag is set to I. Later selection results in a T because the tag is an I but a second check (performed because T implements I and T is a member of U) discovers a T.

2a) like 1a
2b) at assignment of U u = i (i is really a T), generated code checks the value (i) to see if it is actually a T, because T implements I and T is also a member of U. Because it is, the tag is set to T. Later selection directly results in a T.

In the case that T, V, W all implement I and U = *T | *V | *W | I, assignment U u = i requires (up to) 3 type tests.

Interfaces and pointers was not the original use case for union types, though, was it?

I can imagine certain sorts of hackery where a "nice" implementation would perform some bit banging -- for example, if you have a union of 4 or fewer pointer types where all referents are 4-byte aligned, store the tag in the lower 2 bits of the value. This in turn implies that it's not good to take the address of a member of a union (it wouldn't be anyhow, since that address could be used to re-store an "old" type without adjusting the tag).
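The low-bits trick is just integer arithmetic, sketched below under loud assumptions: addresses are modelled as plain uint64 values, not real Go pointers (the garbage collector would not tolerate tagged pointers in ordinary Go code), and pack, unpack, and tagMask are hypothetical names.

```go
package main

import "fmt"

const tagMask = 0x3 // low 2 bits, free when referents are 4-byte aligned

// pack stores a 2-bit tag in the low bits of an aligned address.
// addr is a plain integer standing in for a pointer.
func pack(addr uint64, tag uint8) uint64 {
	if addr&tagMask != 0 || tag > tagMask {
		panic("address not 4-byte aligned or tag out of range")
	}
	return addr | uint64(tag)
}

// unpack recovers the address and the tag.
func unpack(word uint64) (addr uint64, tag uint8) {
	return word &^ tagMask, uint8(word & tagMask)
}

func main() {
	w := pack(0x1000, 2)
	addr, tag := unpack(w)
	fmt.Printf("%#x %d\n", addr, tag)
}
```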

Or if we had a 50-ish-bit address space and were willing to take some liberties with NaNs, we could slap integers, pointers, and doubles all into a 64-bit union, at the possible cost of some bit fiddling.

Both sub-suggestions are gross; I am certain that both would have a small (?) number of fanatical proponents.

bcmills commented 7 years ago

This in turn implies that it's not good to take the address of a member of a union

Correct. But I don't think the result of a type assertion is addressable today anyway, is it?

rogpeppe commented 7 years ago

at assignment of U u = i (i is really a T), the tag is set to I.

I think this is the crux - there is no tag I.

Ignore the runtime representation for a moment and consider a sum type as an interface. As with any interface, it has a dynamic type (the type that's stored in it). The "tag" you refer to is exactly that dynamic type.

As you suggest (and I tried to imply in the last paragraph of the proposal) there may be ways to store the type tag in more efficient ways than with a pointer to the runtime type, but in the end it is always just encoding the dynamic type of the sum-type value, not which of the alternatives was "chosen" when it was created.

Interfaces and pointers was not the original use case for union types, though, was it?

It was not, but any proposal needs to be as orthogonal as possible with respect to other language features, in my view.

jimmyfrasche commented 7 years ago

@dr2chase my understanding so far is that, if a sum type includes any interface types in its definition, then at runtime its implementation is identical to an interface (containing the intersection of method sets) but the compile-time invariants about allowable types are still enforced.

jimmyfrasche commented 7 years ago

Even if a sum type only contained concrete types and it was implemented like a C-style discriminated union, you wouldn't be able to address a value in the sum type since that address could become a different type (and size) after you took the address. You could take the address of the sum typed value itself, though.

dr2chase commented 7 years ago

Is it desirable that sum types behave this way? We could just as easily declare that the selected/asserted type is the same as what the programmer said/implied when a value was assigned to the union. Otherwise we might get led to interesting places with respect to int8 vs int16 vs int32, etc. Or, e.g., int8 | uint8.

rogpeppe commented 7 years ago

Is it desirable that sum types behave this way?

That's a matter of judgement. I believe it is, because we already have the concept of interfaces in the language - values with both a static and a dynamic type. The sum types as proposed just provide a more precise way to specify interface types in some cases. It also means that sum types can work without restriction on any other types. If you don't do that, you need to exclude interface types and then the feature isn't fully orthogonal.

rogpeppe commented 7 years ago

Otherwise we might get led to interesting places with respect to int8 vs int16 vs int32, etc. Or, e.g., int8 | uint8.

What's your concern here?

jimmyfrasche commented 7 years ago

You can't use a function type as a map's key type. I'm not saying that that's equivalent, just that there is precedent for types restricting other kinds of types. Still open to allowing interfaces, still not sold.

What kind of programs can you write with a sum type containing interfaces that you can't otherwise?

jimmyfrasche commented 7 years ago

Counterproposal.

A union type is a type that lists zero or more types, written

union {
  T0
  T1
  //...
  Tn
}

All of the listed types (T0, T1, ..., Tn) in a union must be different and none can be interface types.

Methods may be declared on a defined (named) union type by the usual rules. No methods are promoted from the listed types.

There is no embedding for union types. Listing one union type in another is the same as listing any other valid type. However, a union cannot list its own type recursively, for the same reason that type S struct { S } is invalid.

Unions can be embedded in structs.

The value of a union type is a dynamic type, limited to one of the listed types, and a value of the dynamic type, said to be the stored value. Exactly one of the listed types is the dynamic type at all times.

The zero value of the empty union is unique. The zero value of a nonempty union is the zero value of the first type listed in the union.

A value for a union type, U, can be created with U{} for the zero value. If U has one or more types and v is a value of one of the listed types, T, U{v} creates a union value storing v with dynamic type T. If v is of a type not listed in U that can be assigned to more than one of the listed types, an explicit conversion is required to disambiguate.

A value of a union type U can be converted to another union type V as in V(U{}) iff the set of types in U is a subset of the set of types in V. That is, ignoring order, every type listed in U must also be listed in V, but V can list types that are not in U.
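The subset rule amounts to a simple set check. The sketch below models a union only by its list of types, using reflect.Type values; convertible and typesOf are hypothetical helpers, and nothing here reflects how a compiler would actually implement the rule.

```go
package main

import (
	"fmt"
	"reflect"
)

// typesOf returns the dynamic types of the sample values, standing in
// for the list of types in a union declaration.
func typesOf(samples ...interface{}) []reflect.Type {
	ts := make([]reflect.Type, 0, len(samples))
	for _, s := range samples {
		ts = append(ts, reflect.TypeOf(s))
	}
	return ts
}

// convertible reports whether a union listing the types in u could be
// converted to one listing the types in v: every member of u must be in v.
func convertible(u, v []reflect.Type) bool {
	inV := make(map[reflect.Type]bool, len(v))
	for _, t := range v {
		inV[t] = true
	}
	for _, t := range u {
		if !inV[t] {
			return false
		}
	}
	return true
}

func main() {
	u := typesOf(0, "")        // models union { int; string }
	v := typesOf("", false, 0) // models union { string; bool; int }
	fmt.Println(convertible(u, v), convertible(v, u))
}
```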

Assignability between union types is defined as convertibility is, as long as at most one of the union types is defined (named).

A value of one of the listed types, T, of a union type U may be assigned to a variable of the union type U. This sets the dynamic type to T and stores the value. Assignment compatible values work as above.

If all of the listed types support the equality operators:

the equality operators can be used on two values of the same union type. Two values of a union type are never equal if their dynamic types differ.
a value of that union may be compared with a value of any of its listed types. If the dynamic type of the union is not the type of the other operand, == is false and != is true regardless of the stored value. Assignment compatible values work as above.
the union may be used as a map key
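The "never equal if their dynamic types differ" rule mirrors how interface values compare in Go today, so it can be previewed with interface{} standing in for the union. This is a sketch; equal is a hypothetical helper.

```go
package main

import "fmt"

// equal compares two interface values: they are equal only if they have
// the same dynamic type and equal dynamic values.
func equal(x, y interface{}) bool {
	return x == y
}

func main() {
	fmt.Println(equal(int8(1), int16(1))) // dynamic types differ
	fmt.Println(equal(int8(1), int8(1)))  // same dynamic type and value
}
```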

No other operators are supported on values of a union type.

A type assertion against a union type for one of its listed types holds if the asserted type is the dynamic type.

A type assertion against a union type for an interface type holds if its dynamic type implements that interface. (Notably, if all the listed types implement this interface the assertion always holds).

Type switches must either be exhaustive, including all listed types, or contain a default case.

Type assertions and type switches return a copy of the stored value.

Package reflect would require a way to get the dynamic type and stored value of a reflected union value and a way to get the listed types of a reflected union type.

Notes:

The union{...} syntax was chosen partially to differentiate from the sum type proposal in this thread, primarily to retain the nice properties in the Go grammar, and incidentally to reinforce that this is a discriminated union. As a consequence, this allows somewhat strange unions such as union{} and union{ int }. The first is in many senses equivalent to struct{} (though by definition a different type), so it adds nothing to the language other than another empty type. The second is perhaps more useful. For example, type Id union { int } is very much like type Id struct { int }, except that the union version allows direct assignment without having to specify idValue.int, which makes it seem more like a built-in type.

The disambiguating conversion required when dealing with assignment compatible types is a bit harsh, but it would catch errors if a union is updated to introduce an ambiguity that downstream code is unprepared for.

The lack of embedding is a consequence of allowing methods on unions and requiring exhaustive matching in type switches.

Allowing methods on the union itself, rather than taking the valid intersection of methods of the listed types, avoids accidentally getting unwanted methods. Type asserting the stored value to common interfaces allows simple, explicit wrapper methods when promotion is desired. For example, on a union type U all of whose listed types implement fmt.Stringer:

func (u U) String() string {
  return u.(fmt.Stringer).String()
}

jimmyfrasche commented 7 years ago

In the linked reddit thread, rsc said:

It would be weird for the zero value of sum { X; Y } to be different from that of sum { Y; X }. That's not how sums usually work.

I've been thinking about this, since it applies to any proposal really.

That's not a bug: it's a feature.

Consider

type (
  Undefined = struct{}
  UndefinedOrInt union { Undefined; int }
)

vs.

type (
  Illegal = struct{}
  IntOrIllegal union { int; Illegal }
)

UndefinedOrInt says by default it's not yet defined, but, when it is, it will be an int value. This is analogous to *int which is how the sum type (1 + int) needs to be represented in Go now and the zero value is also analogous.

IntOrIllegal, on the other hand, says by default it's the int 0, but it may at some point be marked as illegal. This is still analogous to *int but the zero value is more expressive of the intent, like enforcing that it defaults to new(int).

It's kind of like being able to phrase a bool field in a struct in the negative so the zero value is what you want as the default.

Both zero values of the sums are useful and meaningful in their own right and the programmer can choose the most appropriate for the situation.
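The two defaults can be approximated in current Go with a struct whose bool field is phrased so that false means "the default". MaybeInt and IntOrIllegalLike are hypothetical stand-ins for the two unions above, not the proposed types.

```go
package main

import "fmt"

// MaybeInt models union { Undefined; int }:
// the zero value means "not defined yet".
type MaybeInt struct {
	defined bool
	n       int
}

// IntOrIllegalLike models union { int; Illegal }:
// the zero value is the valid int 0.
type IntOrIllegalLike struct {
	illegal bool
	n       int
}

func (m MaybeInt) String() string {
	if !m.defined {
		return "undefined"
	}
	return fmt.Sprint(m.n)
}

func (i IntOrIllegalLike) String() string {
	if i.illegal {
		return "illegal"
	}
	return fmt.Sprint(i.n)
}

func main() {
	var a MaybeInt
	var b IntOrIllegalLike
	fmt.Println(a, b) // the same fields give different defaults
}
```

Both structs hold the same data, and only the phrasing of the flag decides what the zero value means. In the union version, the order of the listed types plays that role.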

If the sum were a days-of-the-week enum (each day being a defined struct{}), whichever day is listed first would be the first day of the week and the zero value, the same as for an iota-style enum.
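The iota-style enum behaves the same way in Go today: the first constant in the block gets the value 0, so it is also the zero value of the type. Weekday here is a hypothetical type written for this sketch (it is not time.Weekday, which starts with Sunday).

```go
package main

import "fmt"

// Weekday is a hypothetical enum. Monday is listed first, so it is 0.
type Weekday int

const (
	Monday Weekday = iota
	Tuesday
	Wednesday
	Thursday
	Friday
	Saturday
	Sunday
)

// isFirstDay reports whether d is the first listed day.
func isFirstDay(d Weekday) bool { return d == Monday }

func main() {
	var d Weekday // zero value
	fmt.Println(isFirstDay(d))
}
```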

Also, I'm not aware of any languages with sum types or discriminated/tagged unions that have the concept of a zero value. C would be the closest but the zero value is uninitialized memory—hardly a lead to follow. Java defaults to null, I believe, but that's because everything is a reference. All the other languages I know of have mandatory type constructors for the summands so there isn't really a notion of zero value. Is there such a language? What does it do?

bcmills commented 7 years ago

If the difference from the mathematical concepts of "sum" and "union" is the problem, we can always call them something else (e.g. "variant").

as commented 7 years ago

For names: Union confuses C/C++ purists. Variant is mainly familiar to CORBA and COM programmers, whereas discriminated union seems to be preferred by the functional languages. Set is a verb and noun. I like the keyword pick. Limbo used pick. It's short and describes the type's intention to pick from a finite set of types.

proposal: spec: add sum types / discriminated unions #19412
