golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.29k stars 17.7k forks source link

proposal: spec: add sum types / discriminated unions #19412

Open DemiMarie opened 7 years ago

DemiMarie commented 7 years ago

This is a proposal for sum types, also known as discriminated unions. Sum types in Go should essentially act like interfaces, except that:

Sum types can be matched with a switch statement. The compiler checks that all variants are matched. Inside the arms of the switch statement, the value can be used as if it is of the variant that was matched.

jimmyfrasche commented 7 years ago

@rogpeppe if you're going to use reflection why not just use reflection?

jimmyfrasche commented 7 years ago

I've written up a revised version of my second proposal based on comments here and in other issues.

Notably, I've removed exhaustiveness checking. However, an external exhaustiveness checker is trivial to write for the below proposal, though I do not believe one can be written for other Go types used to simulate a sum type.

Edit: I've removed the ability to type assert on the dynamic value of a pick value. It's too magical and the reason for allowing it is served just as well by code generation.

Edit2: clarified how field names work with assertions and switches when the pick is defined in another package.

Edit3: restricted embedding and clarified implicit field names

Edit4: clarify default in switch

Pick types

A pick is a composite type syntactically similar to a struct:

pick {
  A, B S
  C, D T
  E U "a pick tag"
}

In the above, A, B, C, D, and E are the field names of the pick and S, T, and U are the respective types of those fields. Field names may be exported or unexported.

A pick may not be recursive without indirection.

Legal

type p pick {
    //...
    p *p
}

Illegal

type p pick {
    //...
    p p
}

There is no embedding for picks, but a pick may be embedded in a struct. If a pick is embedded in a struct, method on the pick are promoted to the struct but the fields of a pick are not.

A type without a field name is shorthand for defining a field with the same name as the type. (This is an error if the type is unnamed, with an exception for *T where the name is T).

For example,

type p pick {
    io.Reader
    io.Writer
    string
}

has three fields Reader, Writer, and string, with the respective types. Note that the field string is unexported even though it is in the universe scope.

A value of a pick type consists of a dynamic field and the value of that field.

The zero value of a pick type is its first field in source order and the zero value of that field.

Given two values of the same pick type, a and b, the pick value may be assigned as any other value

a = b

Assigning a non-pick value, even one of a type of one of the fields in a pick, is illegal.

A pick type only ever has one dynamic field at any given time.

The composite literal syntax is similar to structs, but there are extra restrictions. Namely, keyless literals are always invalid and only one key may be specified.

The following are valid

pick{A string; B int}{A: "string"} //value is (B, "string")
pick{A, B int}{B: 1} //value is (B, 1)
pick{A, B string}{} //value is (A, "")

The following are compile time errors:

pick{A int; B string}{A: 1, B: "string"} //a pick can only have one value at a time
pick{A int; B uint}{1} //pick composite literals must be keyed

Given a value p of type pick {A int; B string} the following assignment

p.B = "hi"

sets the dynamic field of p to B and the value of B to "hi".

Assignment to the current dynamic field updates the value of that field. Assignment that sets a new dynamic field must zero any unspecified memory locations. Assignment to a pick or struct field of a pick field updates or sets the dynamic field as necessary.

type P pick {
    A, B image.Point
}

var p P
fmt.Println(P) //{A: {0 0}}

p.A.X = 1 //A is the dynamic field, update
fmt.Println(P) //{A: {1 0}}

p.B.Y = 2 //B is not the dynamic value, create zero image.Point first
fmt.Println(P) //{B: {0 2}}

The value held in a pick can only be accessed by a field assert or field switch.

x := p.[X] //panics if X is not the dynamic field of p
x, ok := p.[X] //returns the zero value of X and false if X is not the dynamic field of p

switch v := p.[var] {
case A:
case B, C: // v is only defined in this case if fields B and C have identical type names
case D:
default: // always legal even if all fields are exhaustively listed above
}

The field names in field assertions and field switches are a property of the type, not the package it was defined in. They are not, and cannot be, qualified by the package name that defines the pick.

This is valid:

_, ok := externalPackage.ReturnsPick().[Field]

This is invalid:

_, ok := externalPackage.ReturnsPick().[externalPackage.Field]

Field assertions and field switches always return a copy of the dynamic field's value.

Unexported field names can only be asserted in their defining package.

Type assertions and type switches also work on picks.

//removed, see note at top
//v, ok := p.(fmt.Stringer) //holds if the type of the dynamic field implements fmt.Stringer
//v, ok := p.(int) //holds if the type of the dynamic field is an int

Type assertions and type switches always return a copy of the dynamic field's value.

If the pick is stored in an interface, type assertions for interfaces only match against the method set of the pick itself. [still true but redundant as the above has been removed]

If all the types of a pick support the equality operators then:

No other operators are supported on values of a pick type.

A value of a pick type P can be converted to another pick type Q iff the set of field names and their types in P is a subset of the field names and their types in Q.

If P and Q are defined in different packages and have unexported fields, those fields are considered different regardless of name and type.

Example:

type P pick {A int; B string}
type Q pick {B string; A int; C float64}

//legal
var p P
q := Q(p)

//illegal
var q Q
p := P(Q) //cannot handle field C

Assignability between two pick types is defined as convertibility, as long as no more than one of the types is defined.

Methods may be declared on a defined pick type.

jimmyfrasche commented 7 years ago

I created (and added to the wiki) an experience report https://gist.github.com/jimmyfrasche/ba2b709cdc390585ba8c43c989797325

Edit: and :heart: to @mewmew who left a much better and more detailed report as a reply on that gist

ianlancetaylor commented 7 years ago

What if we had a way to say, for a given type T, the list of types that could be converted to type T or assigned to a variable of type T? For example

type T interface{} restrict { string, error }

defines an empty interface type named T such that the only types that may be assigned to it are string or error. Any attempt to assign a value of any other type produces a compile time error. Now I can say

func FindOrFail(m map[int]string, key int) T {
    if v, ok := m[key]; ok {
        return v
    }
    return errors.New("no such key")
}

func Lookup() {
    v := FindOrFail(m, key)
    if err, ok := v.(error); ok {
        log.Fatal(err)
    }
    s := v.(string) // This type assertion must succeed.
}

What key elements of sum types (or pick types) would not be satisfied by this kind of approach?

stevenblenkinsop commented 7 years ago
s := v.(string) // This type assertion must succeed.

This isn't strictly true, since v could also be nil. It would take a fairly major change to the language to remove this possibility, since it would mean introducing types that don't have zero values and everything that entails. The zero value simplifies parts of the language, but also makes designing these kinds of features more difficult.

Interestingly, this approach is fairly similar to @rogpeppe's original proposal. What it doesn't have is coercion to the listed types, which could be useful in situations like I pointed out earlier (http.Handler). Another thing is that it requires each variant to be a distinct type, since variants are discriminated by type rather than a distinct tag. I think this is strictly as expressive, but some people prefer to have variant tags and types be distinct.

jimmyfrasche commented 7 years ago

@ianlancetaylor

the pros

the cons

That said, it does check the major boxes (the first two pros I listed) and I'd take it in a heartbeat if that's all I could get. I hope for better, though.

stevenblenkinsop commented 7 years ago

I assumed type assertion rules applied. So the type needs to be either identical to a concrete type or assignable to an interface type. Basically, it works exactly like an interface but any value (other than nil) must be assertable to at least one of the listed types.

urandom commented 7 years ago

@jimmyfrasche In your updated proposal, would the following assignment be possible, if all elements of the type are of distinct types:

type p pick {
    A int
    B string
}

func Foo(P p) {
}

var P p = 42
var Q p = "foo"

Foo(42)
Foo("foo")

The usability of sum types when such assignments are possible is a lot bigger.

josharian commented 7 years ago

With the pick proposal you can choose to have a p or *p giving you more greater control over memory trade offs.

The reason interfaces allocate to store scalar values is so you you don't have to read a type word in order to decide whether the other word is a pointer; see #8405 for discussion. The same implementation considerations would likely apply for a pick type, which might mean in practice that p end up allocating and being non-local anyway.

jimmyfrasche commented 7 years ago

@urandom no, given your definitions it would have to be written

var p P = P{A: 42} // p := P{A: 42}
var q P = P{B: "foo")
Foo(P{A: 42}) // or Foo({A: 42}) if types can be elided here
Foo(P{B: "foo"})

It's best to think of them as a struct that can only have one field set at a time.

If you don't have that and then you add a C uint to p what happens to p = 42?

You can make up a lot of rules based on order and assignability but they always mean that changes to the type definition can have subtle and dramatic effects on all the code using the type.

In the best case a change breaks all the code relying on the lack of ambiguity and says you need to change it to p = int(42) or p = uint(42) before it compiles again. A one line change shouldn't require fixing a hundred lines. Especially if those lines are in packages of people depending on your code.

You either have to be 100% explicit or have a very fragile type that no one can touch because it might break everything.

This applies to any sum type proposal but if there are explicit labels you still have assignability because the label is explicit about which type is being assigned to.

jimmyfrasche commented 7 years ago

@josharian so if I'm reading that correctly the reason iface is now always (*type, *value) instead of stashing word-sized values in the second field as Go did previously is so that the concurrent GC doesn't need to inspect both fields to see whether the second is a pointer—it can just assume that it always is. Did I get that right?

In other words, if the pick type were implemented (using C notation) like

struct {
    int which;
    union {
         A a;
         B b;
         C c;
    } summands;
}

the GC would need to take a lock (or something fancy but equivalent) to inspect which to determine if summands needed to be scanned?

josharian commented 7 years ago

the reason iface is now always (type, value) instead of stashing word-sized values in the second field as Go did previously is so that the concurrent GC doesn't need to inspect both fields to see whether the second is a pointer—it can just assume that it always is.

That's right.

Of course, the limited nature of pick types would allow some alternative implementations. The pick type could be laid out such that there's always a consistent pattern of pointer/non-pointer; e.g. all scalar types can overlap, and a string field could overlap with the beginning of a slice field (because both start "pointer, non-pointer"). So

pick {
  a uintptr
  b string
  c []byte
}

could be laid out roughly:

[ word 1 (ptr) ] [ word 2 (non-ptr) ] [ word 3 (non-ptr) ]
[    <nil>         ] [                 a           ] [                              ]
[       b.ptr      ] [            b.len          ] [                              ]
[       c.ptr      ] [             c.len         ] [        c.cap             ]

but other pick types might not allow such optimal packing. (Sorry about the broken ASCII, I can't seem to make GitHub render it right. You get the point, I hope.)

This ability to do static layout might even be a performance argument in favor of including pick types; my goal here is simply to flag relevant implementation details for you.

jimmyfrasche commented 7 years ago

@josharian and thank you for doing so. I hadn't thought of that (in all honestly I just googled if there existed research on how to GC discriminated unions, saw that yes you can do that and called it a day—for some reason my brain didn't associate "concurrency" with "Go" that day: facepalm!).

There would be less choice if one of the types a defined struct which already had a layout.

One option would be to not "compact" the summands if they contain pointers meaning that the size would be the same as the equivalent struct (+ 1 for the discriminator int). Maybe taking a hybrid approach, when possible, so that all the types that can share layout do.

It would be a shame to lose the nice size properties but that really is just an optimization.

Even if it were always 1 + the size of an equivalent struct even when they contained no pointers, it would still have all the other nice properties of the type itself, including control over allocations. Additional optimizations could be added over time and would at least be possible as you point out.

as commented 7 years ago
type p pick {
    A int
    B string
}

Do A and B need to be there? A pick picks from a set of types, so why not throw out their identifier names completely:

type p pick {
    int
    string
}
q := p{string: "hello"}

I believe this form is already valid for struct. There can be a constraint that it is required for pick.

jimmyfrasche commented 7 years ago

@as if the field name is omitted it is the same as the type so your example works, but since those field names are unexported they could only be set/accessed from within the defining package.

The field names do need to be there, even if implicitly generated based on the type name, or there are bad interactions with assignability and interface types. The field names are what makes it work with the rest of Go.

jimmyfrasche commented 7 years ago

@as apologies, I just realized you meant something different than what I read.

Your formulation works but then you have things that look like struct fields but behave differently because of the usual exported/unexported thing.

Is string accessible from outside the package defining p because it's in the universe?

What about

type t struct {}
type P pick {
  t
  //other stuff
}

?

By separating the field name from the type name you can do things like

pick {
  unexported Exported
  Exported unexported
}

or even

pick { Recoverable, Fatal error }

If pick fields behave like struct fields you can use a lot of what you already know about struct fields to think about pick fields. The only real difference is that only one can pick field can be set at a time.

as commented 7 years ago

@jimmyfrasche Go already supports embedding anonymous types inside structs, so the restriction of scope is one that already exists in the language, and I believe that problem is being solved by type aliases. But admit I haven't thought of every possible use case. It seems to hinge on whether this idiom is common in Go:

package p
type T struct{
    Exported t
}
type t struct{}

The small t exists in a package where it's embedded in large T, and it's only exposure is through such exported types.

jimmyfrasche commented 7 years ago

@as

I'm not sure I entirely follow, however:

//with the option to have field names
pick { //T is in the namespace of the pick and the type isn't exposed to other packages
  T t
  //...
}

//without
type T = t //T has to be defined in the outer scope and now t is exposed to other packages
pick {
  T
  //...
}

Also, if you only had the type name for the label, to include say a []string you'd need to do a type Strings = []string.

DemiMarie commented 7 years ago

That is very much the way I want to see pick types implemented. In particular, it is how Rust and C++ (the gold standards for performance) do it.

If I just wanted exhaustiveness checking, I could use a checker. I want the performance win. That means that pick types cannot be nil, either.

Taking the address of a member of a pick element should not be allowed (it is not memory safe, even in the single-threaded case, as is well-known in the Rust community.). If that requires other restrictions on a pick type, then so be it. But for me having pick types always allocate on the heap would be bad.

On Aug 18, 2017 12:01 PM, "jimmyfrasche" notifications@github.com wrote:

@josharian https://github.com/josharian so if I'm reading that correctly the reason iface is now always (type, value) instead of stashing word-sized values in the second field as Go did previously is so that the concurrent GC doesn't need to inspect both fields to see whether the second is a pointer—it can just assume that it always is. Did I get that right?

In other words, if the pick type were implemented (using C notation) like

struct { int which; union { A a; B b; C c; } summands; }

the GC would need to take a lock (or something fancy but equivalent) to inspect which to determine if summands needed to be scanned?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/19412#issuecomment-323393003, or mute the thread https://github.com/notifications/unsubscribe-auth/AGGWB3Ayi31dYwotewcfgmCQL-XVrfxIks5sZbVrgaJpZM4MTmSr .

jimmyfrasche commented 7 years ago

@DemiMarie

Taking the address of a member of a pick element should not be allowed (it is not memory safe, even in the single-threaded case, as is well-known in the Rust community.). If that requires other restrictions on a pick type, then so be it.

That's a good point. I had that in there but it must have got lost in an edit. I did include that when you access the value from a pick it always returns a copy for that same reason though.

As an example of why that's true, for posterity, consider

v := pick{ A int; B bool }{A: 5}
p := &v.[A] //(this would be illegal but pretending it's not for a second)
v.B = true

If v is optimized so that the fields A and B take up the same position in memory then p isn't pointing to an int: it's pointing to a bool. Memory safety was violated.

stevenblenkinsop commented 7 years ago

@jimmyfrasche

The second reason you wouldn't want the contents to be addressable is mutation semantics. If the value is stored indirectly under certain circumstances, then

v := pick{ A int; ... }{A: 5}
v2 := v

v2.[A] = 6 // (this would be illegal under the proposal, but would be 
           // permitted if `v2.[A]` were addressable)

fmt.Println(v.[A]) // would print 6 if the contents of the pick are stored indirectly

One place where pick is similar to interfaces is that you want to retain value semantics if you store values in it. If you might need indirection as an implementation detail, the only option is to make the contents not addressable (or more precisely, mutably addressable, but the distinction doesn't exist in Go at present), so that you can't observe the aliasing.

stevenblenkinsop commented 7 years ago

Edit: Oops (see below)

@jimmyfrasche

The zero value of a pick type is its first field in source order and the zero value of that field.

Note that this wouldn't work if the first field needs to be stored indirectly, unless you special case the zero value so that v.[A] and v.(error) do the right thing.

jimmyfrasche commented 7 years ago

@stevenblenkinsop I'm not sure what you mean by "the first field needs to be stored indirectly". I assume you mean if the first field is a pointer or a type that implicitly contains a pointer. If so, there's an example below. If not, could you please clarify?

Given

var p pick { A error; B int }

the zero value, p, has dynamic field A and the value of A is nil.

stevenblenkinsop commented 7 years ago

I wasn't referring to the value stored in the pick being/containing a pointer, I was referring to a non-pointer value being stored indirectly due to layout constraints imposed by the garbage collector, as described by @josharian.

In your example, p.B—not being a pointer—would not be able to share overlapping storage with p.A, which comprises two pointers. It would most likely have to be stored indirectly (i.e. be represented as a *int that automatically gets dereferenced when you access it, rather than as an int). If p.B were the first field, the zero value of the pick would be new(int), which isn't an acceptable zero value since it requires initialization. You'd need to special case it so that a nil *int is treated as a new(int).

stevenblenkinsop commented 7 years ago

@jimmyfrasche Oh sorry. Going back over the conversation, I realized you were considering using adjacent storage to store variants with incompatible layouts, rather than copying the interface mechanism of indirect storage of non-pointer types. My last three comments don't make sense in that case.

jimmyfrasche commented 7 years ago

Edit: whoops, race condition. Posted then saw your comment.

@stevenblenkinsop ah, okay I see what you mean. But that's not a problem.

Sharing overlapping storage is an optimization. It could never do that: the semantics of the type are the important bit.

If the compiler can optimize the storage and chooses to do so, it's a nice bonus.

In your example, the compiler could store it exactly as it would the equivalent struct (adding a tag to know which is the active field). This would be

struct {
  which_field int // 0 = A, 1 = B
  A error
  B int
}

The zero value is still all bytes 0 and there's no need to surreptitiously allocate as a special case.

The important part is ensuring that only one field is in play at a given time.

jimmyfrasche commented 7 years ago

The motivation for allowing type assertions/switches on picks was so that, for example, if every type in the pick satisfied fmt.Stringer you could write a method on the pick like

func (p P) String() string {
  return p.(fmt.Stringer).String()
}

But since the types of pick fields can be interfaces this creates a subtlety.

If the pick P in the previous example had a field whose type is itself fmt.Stringer that String method would panic if that was the dynamic field and its value is nil. You cannot type assert a nil interface to anything, not even itself. https://play.golang.org/p/HMYglwyVbl While this has always been true, it just doesn't come up regularly, but it could come up more regularly with picks.

However, the closed nature of sum types would allow an exhaustiveness linter to find everywhere this would come up (potentially with some false positives) and report the case that needs to be handled.

stevenblenkinsop commented 7 years ago

It would also be surprising, if you can implement methods on the pick, that those methods aren't used to satisfy a type assertion.

type Num pick { A int; B float32 }

func (n Num) String() string {
      switch v := n.[var] {
      case A:
          return fmt.Sprint(v)
      case B:
          return fmt.Sprint(v)
      }
}
...
n := Num{A: 5}
s1, ok := p.(fmt.Stringer) // ok == false
var i interface{} = p
s2, ok := i.(fmt.Stringer) // ok == true

You could have the type assertion promote methods from the current field if they satisfy the interface, but this runs into its own issues, such as whether to promote methods from a value in an interface field which aren't defined on the interface itself (or even how to implement this efficiently). Also, one might then expect methods common to all fields to be promoted to the pick itself, but then they'd have to be dispatched via variant selection on each call, in addition potentially to a virtual dispatch if the pick is stored in an interface, and/or to a virtual dispatch if the field is an interface.

Edit: By the way, optimally packing a pick is an instance of the shortest common superstring problem, which is NP-complete, though there are greedy approximations that are commonly used.

jimmyfrasche commented 7 years ago

The rule is if it's a pick value the type assertion asserts on the dynamic field of the pick value, but if the pick value is stored in an interface the type assertion is on the method set of the pick type. It may be surprising at first but it's fairly consistent.

It wouldn't be a problem to just drop allowing type assertions on a pick value. It would be a shame though since it does make it very easy to promote methods that all the types in the pick share without having to write out all the cases or use reflection.

Though, it would be fairly easy to use code generation to write the

func (p Pick) String() string {
  switch v := p.[var] {
  case A:
    return v.String()
  case B:
    return v.String()
  //etc
  }
}
jimmyfrasche commented 7 years ago

Just went ahead and dropped type assertions. Maybe they should be added but they're not a necessary part of the proposal.

bcmills commented 7 years ago

I want to come back to @ianlancetaylor's previous comment, because I've got some new perspective on it after thinking some more about error-handling (specifically, https://github.com/golang/go/issues/21161#issuecomment-320294933).

In particular, what does the new kind of type give us that we don't get from interface types?

As I see it, the major advantage of sum-types is that they would allow us to distinguish between returning multiple values, and returning one of several values — particularly when one of those values is an instance of the error interface.

We currently have a lot of functions of the form

func F(…) (T, error) {
    …
}

Some of them, such as io.Reader.Read and io.Reader.Write, return a T along with an error, whereas others return either a T or an error but never both. For the former style of API, ignoring the T in case of error is often a bug (e.g., if the error is io.EOF); for the latter style, returning a nonzero T is the bug.

Automated tools, including lint, can check usage of specific functions to ensure that the value is (or is not) ignored correctly when the error is non-nil, but such checks do not naturally extend to arbitrary functions.

For example, proto.Marshal is intended to be the "value and error" style if the error is a RequiredNotSetError, but seems to be the "value or error" style otherwise. Because the type system does not distinguish between the two, it is easy to accidentally introduce regressions: either not returning a value when we should, or returning a value when we should not. And implementations of proto.Marshaler further complicate the matter.

On the other hand, if we could express the type as a union, we could be much more explicit about it:

type PartialMarshal struct {
    Data []byte // The marshalled value, ignoring unset required fields.
    MissingFields []string
}

func Marshal(pb Message) []byte | PartialMarshal | error
jimmyfrasche commented 7 years ago

@ianlancetaylor, I've been playing around with your proposal on paper. Can you let me know if anything below is incorrect?

Given

var r interface{} restrict { uint, int } = 1

the dynamic type of r is int, and

var _ interface{} restrict { uint32, int32 } = 1

is illegal.

Given

type R interface{} restrict { struct { n int }, etc }
type S struct { n int }

then var _ R = S{} would be illegal.

But given

type R interface{} restrict { int, error }
type A interface {
  error
  foo()
}
type C struct { error }
func (C) foo() {}

both var _ R = C{} and var _ R = A(C{}) would be legal.

Both

interface{} restrict { io.Reader, io.Writer }

and

interface{} restrict { io.Reader, io.Writer, *bytes.Buffer }

are equivalent.

Likewise,

interface{} restrict { error, net.Error }

is equivalent to

interface { Error() string }

Given

type IO interface{} restrict { io.Reader, io.Writer }
type R interface{} restrict {
  interface{} restrict { int, uint },
  IO,
}

then the underlying type of R is equivalent to

interface{} restrict { io.Writer, uint, io.Reader, int }

Edit: small correction in italics

ianlancetaylor commented 7 years ago

@jimmyfrasche I wouldn't go so far as to say that what I wrote above was a proposal. It was more like an idea. I would have to think about your comments, but at first glance they look plausible.

ns-cweber commented 7 years ago

@jimmyfrasche's proposal is pretty much how I would intuitively expect a pick type to behave in Go. I think it's especially worth noting that his proposal to use the zero value of the first field for the zero value of the pick is intuitive with the "zero value means zeroing-out the bytes", provided the tag values begin at zero (maybe this has already been noted; this thread is very long by now...). I also like the performance implications (no unnecessary allocs), and that picks are completely orthogonal to interfaces (no surprising behavior switching on an pick that contains an interface).

The only thing I would consider changing is mutating the tag: foo.X = 0 seems like it could be foo = Foo{X: 0}; a few more characters, but more explicit that it's resetting the tag and zeroing the value. This is a minor point, and I'd still be very happy if his proposal was accepted as-is.

jimmyfrasche commented 7 years ago

@ns-cweber thank you but I can't take credit for the zero value behavior. The ideas been floating around for a while and was in @rogpeppe's proposal that came earlier in this (as you point out quite long) thread. My justification was the same as the one you gave.

As far as foo.X = 0 vs foo = Foo{X: 0}, my proposal allows both, actually. The latter is useful if that field of a pick is a struct so you can do foo.X.Y = 0 instead of foo = Foo{X: image.Point{X: foo.[X].X, 0}} which in addition to being verbose could fail at runtime.

I also think it helps to keep it as such because it reinforces the elevator pitch for its semantics: It's a struct that can only have one field set a time.

jimmyfrasche commented 7 years ago

One thing that may block it from being accepted as-is is how embedding a pick in a struct would work. I realized the other day I glossed over the various effects that would have on using the struct. I think it's repairable but not entirely sure what the best repairs are. The simplest would be that it only inherits the methods and you have to directly refer to the embedded pick by name to get to its fields and I'm leaning toward that in order to avoid a struct having both struct fields and pick fields.

ns-cweber commented 7 years ago

@jimmyfrasche Thanks for correcting me about the zero-value behavior. I agree that your proposal allows for both mutators, and I think your elevator pitch point is a good one. Your explanation for your proposal makes sense, though I could see myself setting foo.X.Y, not realizing it would automatically change the pick field. I would still be positively gleeful if your proposal succeeded, even with that slight reservation.

Lastly, your simple proposal for pick embedding seems like the one I would intuit. Even if we change our minds, we can go from the simple proposal to the complex proposal without breaking existing code, but the reverse isn't true.

jimmyfrasche commented 7 years ago

@ns-cweber

I could see myself setting foo.X.Y, not realizing it would automatically change the pick field

That's a fair point, but you could make it about a lot of things in the language, or any language, for that matter. In general, Go has safety rails but not safety scissors.

There are a lot of big things it generally protects you from, if you don't go out of your way to subvert them, but you still have to know what you're doing.

That can be annoying when you make a mistake like this but, otoh, it's not that much different than "I set bar.X = 0 but I meant to set bar.Y = 0" as the hypothetical relies on you not realizing that foo is a pick type.

Similarly i.Foo(), p.Foo(), and v.Foo() all look the same but if i is a nil interface, p is a nil pointer and Foo doesn't handle that case, the first two could panic whereas if v uses a value method receiver it could not (at least not from the invocation itself, anyway).

As far as embedding, good point about it being easy to loosen later so I just went ahead and edited the proposal.

neild commented 7 years ago

Sum types often have a valueless field. For example, in the database/sql package, we have:

type NullString struct {
    String string
    Valid  bool // Valid is true if String is not NULL
}

If we had sum types / picks / unions, this might be expressed as:

type NullString pick {
  Null   struct{}
  String string
}

A sum type has obvious advantages over a struct in this case. I think this is a common enough use that it would be worth including as an example in any proposal.

Bikeshedding (sorry), I'd argue this is worth syntactic support and inconsistency with struct field embedding syntax:

type NullString union {
  Null
  String string
}
jimmyfrasche commented 7 years ago

@neild

Hitting the last point first: As a last minute change before I posted (not strictly required in any sense), I added that if there's a named type (or a pointer to a named type) with no field name the pick creates an implicit field with the same name as the type. That may not be the best idea but it seemed like it would cover one of the common case of "any of these types" without a lot of fuss. Given that your last example could be written:

type Null = struct{} //though this puts Null in the same scope as NullString
type NullString pick {
  Null
  String string
}

Back to your main point, though, yes, that's an excellent use. In fact, you can use it to build enums: type Stoplight pick { Stop, Slow, Go struct{} }. This would be much like a const/iota faux-enum. It would even compile down to the same output. The major benefit in this case is that the number representing the state is entirely encapsulated and you could not put in any state other than those three listed.

Unfortunately, there is somewhat awkward syntax for creating and setting values of Stoplight that is exacerbated in this case:

light := Stoplight{Slow: struct{}{}}
light.Go = struct{}{}

Allowing {} or _ to be shorthand for struct{}{}, as proposed elsewhere, would help.

Many languages, especially functional languages, get around this by putting the labels in the same scope as the type. This creates a lot of complexity and would disallow two picks defined in the same scope to share field names.

However, it is easy to work around this with a code generator that creates a func with the same name of each field in the pick that takes the field's type as an argument. If it also, as a special case, took no arguments if the type was zero sized, then the output for the Stoplight example would look like this

func Stop() Stoplight {
  return Stoplight{Stop: struct{}{}}
}
func Slow() Stoplight {
  return Stoplight{Slow: struct{}{}}
}
func Go() Stoplight {
  return Stoplight{Go: struct{}{}}
}

and for your NullString example it would look like this:

func Null() NullString {
  return NullString{Null: struct{}{}}
}
func String(s string) NullString {
  return NullString{String: s}
}

It's not pretty, but it's a go generate away and likely very easily inlined.

That wouldn't work in the case where it created implicit fields based on the type names (unless the types were from other packages) or it was run on two picks in the same package that shared field names, but that's fine. The proposal doesn't do everything out of the box but it enables a lot of things and gives the programmer the flexibility to decide what's best for a given situation.

neild commented 7 years ago

More syntax bikeshedding:

type NullString union {
  Null
  Value string
}

var _ = NullString{Null}
var _ = NullString{Value: "some value"}
var _ = NullString{Value} // equivalent to NullString{Value: ""}.

Concretely, a literal with an element list that contains no keys is interpreted as naming the field to set.

This would be syntactically inconsistent with other uses of composite literals. On the other hand, it's a usage that seems sensible and intuitive in the context of union/pick/sum types (to me at least), since there's no sensible interpretation of a union initializer without a key.

jimmyfrasche commented 7 years ago

@neild

This would be syntactically inconsistent with other uses of composite literals.

That seems like a huge negative to me, though it does make sense in the context.

Also note that

var ns NullString // == NullString{Null: struct{}{}} == NullString{}
ns.String = "" // == NullString{String: ""}

For dealing with struct{}{} when I'm using a map[T]struct{} I throw

var set struct{}

somewhere and use theMap[k] = set, Similar would work with picks

bcmills commented 7 years ago

Further bikeshedding: the empty type (in the context of sum types) is conventionally named "unit", not "null".

jimmyfrasche commented 7 years ago

@bcmills Sorta.

In functional languages, when you create a sum type, its labels are actually functions that create the values of that type, (albeit special functions known as "type constructors" or "tycons" that the compiler know about to allow pattern matching), so

data Bool = False | True

creates the data type Bool and two functions in the same scope, True and False, each with the signature () -> Bool.

Here () is how you write the type pronounced unit—the type with only a single value. In Go this type can be written many different ways but it is idiomatically written as struct{}.

So the type of the constructor's argument would be called unit. The convention for the name of the constructor is generally None when used as an option type like this, but it can be changed to suit the domain. Null would be a fine name if the value were coming from a database, for example.

neild commented 7 years ago

@bcmills

As I see it, the major advantage of sum-types is that they would allow us to distinguish between returning multiple values, and returning one of several values — particularly when one of those values is an instance of the error interface.

For an alternative perspective, I see this as a major disadvantage of sum types in Go.

Many languages of course use sum types for exactly the case of returning some value or an error, and this works well for them. If sum types were added to Go, there would be a great temptation to use them in the same way.

However, Go already has a large ecosystem of code which uses multiple values for this purpose. If new code uses sum types to return (value, error) tuples, then that ecosystem will become fragmented. Some authors will continue to use multiple returns for consistency with their existing code; some authors will use sum types; some will attempt to convert their existing APIs. Authors stuck on older Go versions, for whatever reason, will be locked out of new APIs. It'll be a mess, and I don't think the gains will begin to be worth the costs.

bcmills commented 7 years ago

If new code uses sum types to return (value, error) tuples, then that ecosystem will become fragmented.

If we add sum types in Go 2 and use them uniformly, then the problem reduces to one of migration, not fragmentation: it would need to be possible to convert a Go 1 (value, error) API to a Go 2 (value | error) API and vice-versa, but they could be distinct types in the Go 2 parts of the program.

neild commented 7 years ago

If we add sum types in Go 2 and use them uniformly

Note that this is a proposal that is quite different than the ones seen here so far: The standard library will need to be extensively refactored, the translation between API styles will need to be defined, etc. Go this route and this becomes a quite large and complicated proposal for an API transition with a minor codicil regarding the design of sum types.

stevenblenkinsop commented 7 years ago

The intent is for Go 1 and Go 2 to be able to coexist seamlessly in the same project, so I don't think the concern is that someone might be stuck with a Go 1 compiler "for some reason" and be unable to use a Go 2 library. However, if you have dependency A that depends in turn on B, and B updates to use a new feature like pick in its API, then that would break dependency A unless it updates to use the new version of B. A could just vendor B and keep using the old version, but if the old version isn't being maintained for security bugs, etc... or if you need to use the new version of B directly and you can't have two versions in your project for some reason, that could create a problem.

Ultimately, the problem here has little to do with language versions, and more to do with changing the signatures of existing exported functions. The fact that it would be a new feature providing the impetus is a bit of distraction from that. If the intent is to allow existing APIs to be changed to use pick without breaking backwards compatibility, then there might need to be a bridge syntax of some sort. For example (completely as a strawman):

type ReadResult pick(N int, Err error) {
    N
    PartialResult struct { N; Err }
    Err
}

The compiler could just splat the ReadResult when it is accessed by legacy code, using zero values if a field isn't present in a particular variant. I'm not sure how to go the other way or whether it's worth it. APIs like template.Must might just have to keep accepting multiple values rather than a pick and rely on splatting to make up the difference. Or something like this could be used:

type ReadResult pick(N int, Err error) {
case Err == nil:
    N
default:
    PartialResult struct { N; Err }
case N == 0:
    Err
}

This does complicate things, but I can see how introducing a feature which changes how APIs should be written requires a story for how to transition without breaking the world. Maybe there's a way to do it that doesn't require a bridge syntax.

jimmyfrasche commented 7 years ago

It's trivial to go from sum types to product types (structs, multiple return values) — just set everything that isn't the value to zero. Going from product types to sum types is not well defined in general.

If an API wants to transition seamlessly and gradually from a product type based implementation to a sum type based one, the easiest route would be to have two versions of everything necessary where the sum type version has the actual implementation and the product type version calls the sum type version, doing any runtime checking requires and any projection down into product space.

That's really abstract so here's an example

version 1 without sums

func Take(i interface{}) error {
  switch i.(type) {
  case int: //do something
  case string:
  default: return fmt.Errorf("invalid %T", i)
  }
}
func Give() (interface{}, error) {
   i := f() //something
   if i == nil {
     return nil, errors.New("whoops v:)v")
  }
  return i
}

version 2 with sums

type Value pick {
  I int
  S string
}
func TakeSum(v Value) {
  // do something
}
// Deprecated: use TakeSum
func Take(i interface{}) error {
  switch x := i.(type) {
  case int: TakeSum(Value{I: x})
  case string: TakeSum(Value{S: x})
  default: return fmt.Errorf("invalid %T", i)
  }
}
type ErrValue pick {
  Value
  Err error
}
func GiveSum() ErrValue { //though honestly (Value, error) is fine
  return f()
}
// Deprecated: use GiveSum
func Give() (interface{}, error) {
  switch v := GiveSum().(var) {
  case Value:
    switch v := v.(var) {
    case I: return v, nil
    case S: return v, nil
    }
  case Err:
    return nil, v
  }
}

version 3 would remove Give/Take

version 4 would move the implementation of GiveSum/TakeSum to Give/Take, make GiveSum/TakeSum just call Give/Take and deprecate GiveSum/TakeSum.

version 5 would remove GiveSum/TakeSum

It's not pretty or fast but it's the same as any other large scale disruption of similar nature and requires nothing extra from the language

j7b commented 7 years ago

I think (most of) the utility of a sum type could be realized with a mechanism to constrain assignment to a type of type interface{} at compile time.

In my dreams it looks like:

type T1 switch {T2,T3} // only nil, T2 and T3 may be assigned to T1
type T2 struct{}
type U switch {} // only nil may be assigned to U
type V switch{interface{} /* ,... */} // V equivalent to interface{}
type Invalid switch {T2,T2} // only uniquely named types
type T3 switch {int,uint} // switches can contain switches but... 

...it would also be a compile-time error to assert a switch type is a type not explicitly defined:

var t1 T1
i,ok := t1.(int) // T1 can't be int, only T2 or T3 (but T3 could be int)
switch t := t1.(type) {
    case int: // compile error, T1 is just nil, T2 or T3
}

and go vet would carp about ambiguous constant assignments to types like T3 but for all intents and purposes (at runtime) var x T3 = 32 would be var x interface{} = 32. Maybe some predefined switch types for builtins in a package named something like switches or ponies would be groovy too.