golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

proposal: spec: add sum types / discriminated unions #19412

Open DemiMarie opened 7 years ago

DemiMarie commented 7 years ago

This is a proposal for sum types, also known as discriminated unions. Sum types in Go should essentially act like interfaces, except that:

Sum types can be matched with a switch statement. The compiler checks that all variants are matched. Inside the arms of the switch statement, the value can be used as if it is of the variant that was matched.

jimmyfrasche commented 7 years ago

@j7b, @ianlancetaylor offered a similar idea in https://github.com/golang/go/issues/19412#issuecomment-323256891

I posted what I believe would be the logical consequences of this later at https://github.com/golang/go/issues/19412#issuecomment-325048452

It looks like many of them would apply equally given the similarity.

It would be really great if something like that would work. It would be easy to transition from interfaces to interfaces+restrictions (especially with Ian's syntax: just tack the restrict on the end of existing pseudo-sums built with interfaces). It would be easy to implement since at runtime they'd essentially be identical to interfaces and most of the work would just be having the compiler emit additional errors when their invariants are broken.

But I don't think it's possible to make it work.

Everything lines up so close that it looks like a fit, but you zoom in and it just isn't quite right, so you give it a little push and then something else pops out of alignment. You can try to repair it but then you get something that looks a lot like interfaces but behaves differently in weird cases.

Maybe I'm missing something.

stevenblenkinsop commented 7 years ago

There's nothing wrong with the restricted interface proposal as long as you're okay with the cases not necessarily being disjoint. I don't think it's as surprising as you do that a union between two interface types (like io.Reader / io.Writer) isn't disjoint. It's entirely consistent with the fact that you can't determine whether a value assigned to an interface{} had been stored as an io.Reader or an io.Writer if it implements both. The fact that you can construct a disjoint union as long as each case is a concrete type seems perfectly adequate.

The tradeoff is that, if unions are restricted interfaces, then you can't define methods directly on them. And if they're restricted interface types, you don't get the guaranteed direct storage which pick types provide. Whether it's worthwhile adding a distinct kind of thing to the language to get these additional benefits, I'm not sure.
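The non-disjointness point can be seen with today's type switches. This is only a minimal sketch: the readWriter and onlyWriter types and the classify helper are made up for illustration and are not part of any proposal in this thread.

```go
package main

import (
	"fmt"
	"io"
)

// readWriter is a hypothetical type that implements both io.Reader and io.Writer.
type readWriter struct{}

func (readWriter) Read(p []byte) (int, error)  { return 0, io.EOF }
func (readWriter) Write(p []byte) (int, error) { return len(p), nil }

// onlyWriter is a hypothetical type that implements io.Writer only.
type onlyWriter struct{}

func (onlyWriter) Write(p []byte) (int, error) { return len(p), nil }

// classify reports which case of the type switch matches v.
// Cases are tried in order, so a value implementing both
// interfaces always lands in the first matching case.
func classify(v interface{}) string {
	switch v.(type) {
	case io.Reader:
		return "io.Reader"
	case io.Writer:
		return "io.Writer"
	default:
		return "neither"
	}
}

func main() {
	fmt.Println(classify(readWriter{})) // matches both interfaces; the first case wins
	fmt.Println(classify(onlyWriter{}))
	fmt.Println(classify(42))
}
```

Nothing in the value records which interface it was "stored as", which is why only concrete-type cases give a truly disjoint union.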

j7b commented 7 years ago

@jimmyfrasche for type T switch {io.Reader,io.Writer} it's fine to assign a ReadWriter to T, but you can only assert T is an io.Reader or io.Writer. You'd need another assertion to assert the io.Reader or io.Writer is a ReadWriter, which should encourage adding it to the switchtype if it's a useful assertion.

jimmyfrasche commented 7 years ago

@stevenblenkinsop You could define the pick proposal without methods. In fact, if you get rid of methods and implicit field names, then you could allow pick embedding. (Though clearly I think methods and, to a much lesser degree implicit field names, are the more useful trade off there).

And, on the other hand, @ianlancetaylor's syntax would allow

type IR interface {
  Foo()
  Bar()
} restrict { A, B, C }

which would compile as long as A, B, and C each have Foo and Bar methods (though you would have to worry about nil values).

edit: clarification in italics

henryas commented 6 years ago

I think some form of restricted interface would be useful, but I disagree with the syntax. Here is what I am suggesting. It acts in a similar way as an algebraic data type, which groups domain-related objects that do not necessarily have common behavior.

//MyGroup can be any of these. It can contain other groups, interfaces, structs, or primitive types
type MyGroup group {
   MyOtherGroup
   MyInterface
   MyStruct
   int
   string
   //..possibly some other types as well
}

//type definitions..
type MyInterface interface{}
type MyStruct struct{}
//etc..

func DoWork(item MyGroup) {
   switch t:=item.(type) {
      //do work here..
   }
}

There are several benefits of this approach over the conventional empty interface interface{} approach:

Empty interface interface{} is useful when the number of types involved is unknown. You really have no choice here but to rely on runtime verification. On the other hand, when the number of types is limited and known during compile time, why not get the compiler to assist us?

Merovius commented 6 years ago

@henryas I think a more useful comparison would be the currently recommended way to do (open) sum types: Non-empty interfaces (if no clear interface can be distilled, using unexported marker functions). I don't think your arguments apply to that in a significant way.

dsnet commented 6 years ago

Here's an experience report in regards to Go protobufs:

henryas commented 6 years ago

@henryas I think a more useful comparison would be the currently recommended way to do (open) sum types: Non-empty interfaces (if no clear interface can be distilled, using unexported marker functions). I don't think your arguments apply to that in a significant way.

You mean by adding a dummy unexported method to an object so that the object can be passed as an interface, as follows?

type MyInterface interface {
   belongToMyInterface() //dummy method definition
}

type MyObject struct{}
func (MyObject) belongToMyInterface(){} //dummy method

I don't think that should be recommended at all. It's more like a workaround rather than a solution. I personally would rather forgo static type verification rather than having empty methods and unnecessary method definition lying around.

These are the problems with the dummy method approach:

as commented 6 years ago

@henryas

I don't see your third point as a strong argument. If the accountant wants to view object relationships differently the accountant can create their own interface that fits their specification. Adding a private method to an interface doesn't mean the concrete types that satisfy it are incompatible with subsets of the interface defined elsewhere.

The Go parser makes heavy use of this technique and honestly I can't imagine picks making that package so much better that it warrants implementing picks in the language.

henryas commented 6 years ago

@as My point is that every time a new relationship view is created, the relevant concrete objects must be updated to make certain accommodation for this view. It seems wrong, because in order to do that, the objects must often make a certain assumption about the consumer's domain. If the objects and the consumers are closely related or live within the same domain, such as in Go parser case, it may not matter much. However, if the objects provide basic functionalities that are to be consumed by several other domains, it becomes a problem. The objects now needs to know a little bit about all the other domains for the dummy method approach to work.

You end up with many empty methods attached to the objects, and it isn't obvious to the readers why you need those methods because the interfaces that require them live in a separate domain/package/layer.

Merovius commented 6 years ago

The point that the open-sums-via-interfaces approach doesn't let you easily¹ use sums is fair enough. Explicit sum-types obviously would make it easier to have sums. It's a very different argument than "sum types give you type-safety", though - you can still get type-safety today, if you need it.

I still see two downsides of closed sums as implemented in other languages though: One, the difficulty of evolving them in a large-scale distributed development process. And two, I think that they add power to the type-system, and I like that Go does not have a very powerful type-system, as that discourages coding types and encourages coding programs instead. When I feel that a problem can benefit from a more powerful type-system, I move to a more powerful language (like Haskell or Rust).

That being said, at least the second one is definitely one of preference and even if you'd agree, whether the downsides are considered to outweigh the upsides is also up to personal preference. Just wanted to point out that the claim "you can't get type-safe sums without closed sum types" isn't really true :)

[1] notably, it's not easy, but still possible, e.g. you can do

type Node interface {
    node()
}

type Foo struct {
    bar.Baz
}

func (Foo) node() {}

urandom commented 6 years ago

@Merovius I disagree with your second downside point. The fact that there are plenty of places in the standard library that would immensely benefit from sum types, but are now implemented using empty interfaces and panics, shows that this lack is hurting coding. Of course, people might say that since such code has been written in the first place, there is no problem and we don't need sum types, but the folly of that logic is that we then wouldn't need any other type for function signatures, and we should just use empty interfaces instead.

As for using interfaces with some method to represent sum types right now, there's one big drawback. You don't know what types you can use for that interface, since they are implemented implicitly. With a proper sum type, the type itself describes exactly what types can actually be used.

Merovius commented 6 years ago

I disagree with your second downside point.

Are you disagreeing with the statement "sum types encourage programming with types", or are you disagreeing with that being a downside? Because it doesn't seem you are disagreeing with the first (your comment is basically just a re-assertion of that) and regarding the second, I acknowledged that it's up to preference above.

The fact that there are plenty of places in the standard library that would immensely benefit from sum types, but are now implemented using empty interfaces and panics, shows that this lack is hurting coding. Of course, people might say that since such code has been written in the first place, there is no problem and we don't need sum types, but the folly of that logic is that we then wouldn't need any other type for function signatures, and we should just use empty interfaces instead.

This type of black-and-white argument doesn't really help. I agree, that sum types would reduce pain in some instances. Every change making the type-system more powerful will reduce pain in some instances - but it will also cause pain in some instances. So the question is, which outweighs the other (and that is, to a good degree, a question of preference).

The discussions shouldn't be about whether we want a python-esque type-system (no types) or a coq-esque type-system (correctness proofs for everything). The discussion should be "do the benefits of sum types outweigh their downsides" and it's helpful to acknowledge both.


FTR, I want to re-emphasize that, personally, I wouldn't be that opposed to open sum types (i.e. every sum type has an implicit or explicit "SomethingElse"-case), as it would alleviate most of the technical downsides of them (mostly that they are hard to evolve) while also providing most of the technical upsides of them (static type checking, the documentation you mentioned, you can enumerate types from other packages…).

I also assume, though, that open sums a) won't be a satisfying compromise for people who usually push for sum types and b) probably won't be considered a large enough benefit to warrant inclusion by the Go team. But I'd be ready to be proven wrong on either or both of these assumptions :)

Merovius commented 6 years ago

One more question:

The fact that there are plenty of places in the standard library that would immensely benefit from sum types

I can only think of two places in the standard library, where I'd say there is any significant benefit to them: reflect and go/ast. And even there, the packages seem to work just fine without them. From this reference point, the words "plenty" and "immensely" seem overstatements - but I might not see a bunch of legitimate places, of course.

neild commented 6 years ago

database/sql/driver.Value might benefit from being a sum type (as noted in #23077). https://godoc.corp.google.com/pkg/database/sql/driver#Value

The more public interface in database/sql.Rows.Scan would not, however, without a loss in functionality. Scan can read into values whose underlying type is e.g., int; changing its destination parameter to a sum type would require limiting its inputs to a finite set of types. https://godoc.corp.google.com/pkg/database/sql#Rows.Scan

bcmills commented 6 years ago

@Merovius

I wouldn't be that opposed to open sum types (i.e. every sum type has an implicit or explicit "SomethingElse"-case), as it would alleviate most of the technical downsides of them (mostly that they are hard to evolve)

There are at least two other options that alleviate the “hard to evolve” problem of closed sums.

One is to allow matches on types that are not actually a part of the sum. Then, to add a member to the sum, you first update its consumers to match against the new member, and only actually add that member once the consumers are updated.

Another is to allow “impossible” members: that is, members that are explicitly allowed in matches but explicitly disallowed in actual values. To add a member to the sum, you first add it as an impossible member, then update consumers, and finally change the new member to be possible.

Merovius commented 6 years ago

database/sql/driver.Value might benefit from being a sum type

Agreed, didn't know about that one. Thanks :)

One is to allow matches on types that are not actually a part of the sum. Then, to add a member to the sum, you first update its consumers to match against the new member, and only actually add that member once the consumers are updated.

Intriguing solution.

jimmyfrasche commented 6 years ago

@Merovius interfaces are essentially a family of infinite-sum types. All sum types, infinite or otherwise, have a default: case. Without finite sum types, though, default: means either a valid case you didn't know about or an invalid case that's a bug somewhere in the program. With finite sums it's only the former and never the latter.

json.Token and the sql.Null* types are other canonical examples. go/types would benefit the same way go/ast does. I'm guessing there are a lot of examples that aren't in the exported APIs where it would have been easier to debug and test some intricate plumbing by limiting the domain of the internal state. I find them most useful for internal state and application constraints that don't come up that often in public APIs for general libraries, though they do have their occasional uses there as well.

Personally I think sum types give Go just enough extra power but not too much. The Go type system is already very nice and flexible, though it does have its shortcomings. Go2 additions to the type system just aren't going to deliver as much power as what's already there: the 80-90% of what's needed is already in place. I mean, even generics wouldn't be fundamentally letting you do something new: it would be letting you do things you already do more safely, more easily, more performantly, and in a way that enables better tooling. Sum types are similar, imo (though obviously if it were one or the other generics would take precedence (and they pair rather nicely)).

If you allow an extraneous default (all cases + default is allowed) on sum-type switches and don't have the compiler enforce exhaustiveness (though a linter could), adding a case to a sum is just as easy (and just as difficult) as changing any other public API.

Merovius commented 6 years ago

json.Token and the sql.Null* types are other canonical examples.

Token - sure. Another instance of the AST-problem (basically any parser benefits from sum types).

I don't see the benefit for sql.Null*, though. Without generics (or adding some "magical" generic optional builtin), you are still going to have to have the types and there doesn't seem a significant difference between type NullBool enum { Invalid struct{}; Value Int } and type NullBool struct { Valid bool; Value Int }. Yes, I am aware there is a difference, but it is vanishingly small.

If you allow an extraneous default (all cases + default is allowed) on sum-type switches and don't have the compiler enforce exhaustiveness (though a linter could), adding a case to a sum is just as easy (and just as difficult) as changing any other public API.

See above. Those are what I call open sums, I'm less opposed to them.

jimmyfrasche commented 6 years ago

Those are what I call open sums, I'm less opposed to them.

My specific proposal is https://github.com/golang/go/issues/19412#issuecomment-323208336 and I believe it may satisfy your definition of open, though it is still a bit rough and I'm sure there's yet more to remove and polish. In particular I noticed it wasn't clear that a default case was admissible even if all the cases were listed so I just updated it.

Agreed that optional types aren't the killer app of sum types. They are quite nice though and as you point out with generics defining a

type Nullable(T) pick { // or whatever syntax (on all counts)
  Null struct{}
  Value T
}

once and covering all the cases would be great. But, as you also point out, we could do the same with a generic product (struct). There is the invalid state of Valid = false, Value != 0. In that scenario it would be easy to root out if that was causing problems since 2 ⨯ T is small, even if it's not as small as 1 + T.

Of course if it were a more complicated sum with lots of cases and many overlapping invariants it becomes easier to make a mistake and harder to discover the mistake even with defensive programming, so making impossible things just not compile at all can save a lot of hair pulling.

Token - sure. Another instance of the AST-problem (basically any parser benefits from sum types).

I write a lot of programs that take some input, do some processing, and produce some output, and I usually divvy this up recursively into a lot of passes that divide the input into cases and transform it based on those cases as I move ever closer to the desired output. I may not literally be writing a parser (admittedly sometimes I am because that's fun!) but I find the AST-problem, as you put it, applies to a lot of code, especially when dealing with abstruse business logic that has too many weird requirements and edge cases to fit in my tiny head.

When I'm writing a general library it doesn't come up in the API as often as doing some ETL or making some fanciful report or making sure that users in state X have action Y happen if they're not marked Z. Even in a general library though I find places where being able to limit the internal state would help, even if it just reduces a 10 minute debug to a 1 second "oh the compiler said I'm wrong".

With Go in particular one place where I'd use sum types is a goroutine selecting over a bunch of channels where I need to give 3 chans to one goroutine and 2 to another. It would help me track what's going on to be able to use a chan pick { a A; b B; c C } over chan A, chan B, chan C though a chan struct { kind MsgKind; a A; b B; c C } can do the job in a pinch at the cost of extra space and less validation.
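For comparison, the "chan struct { kind MsgKind; ... }" fallback might look roughly like the sketch below. The payload types (int, string, bool), the Kind constants, and the describe helper are placeholders chosen for illustration; the comment above does not specify them.

```go
package main

import "fmt"

// MsgKind says which field of Msg is meaningful.
type MsgKind int

const (
	KindA MsgKind = iota
	KindB
	KindC
)

// Msg emulates "pick { a A; b B; c C }" with a tag plus one field per case.
// Nothing stops a sender from filling in the wrong field for a given Kind.
type Msg struct {
	Kind MsgKind
	A    int
	B    string
	C    bool
}

// describe handles one message; the default case catches bad tags at run time.
func describe(m Msg) string {
	switch m.Kind {
	case KindA:
		return fmt.Sprintf("A(%d)", m.A)
	case KindB:
		return fmt.Sprintf("B(%q)", m.B)
	case KindC:
		return fmt.Sprintf("C(%t)", m.C)
	default:
		return "invalid"
	}
}

func main() {
	ch := make(chan Msg, 3)
	ch <- Msg{Kind: KindA, A: 1}
	ch <- Msg{Kind: KindB, B: "hi"}
	ch <- Msg{Kind: KindC, C: true}
	close(ch)
	for m := range ch {
		fmt.Println(describe(m))
	}
}
```

Every message carries all three fields, which is the "extra space" cost, and the default branch is the "less validation" cost.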

pciet commented 6 years ago

Instead of a new type what about the compile-time type list check as an addition to the existing interface type switch feature?

func main() {
    if FlipCoin() == false {
        printCertainTypes(FlipCoin(), int(5))
    } else {
        printCertainTypes(FlipCoin(), string("5"))
    }
}
// this function compiles with main
func printCertainTypes(flip bool, in interface{}) {
    if flip == false {
        switch v := in.(type) {
        case int:
            fmt.Printf("integer %v\n", v)
        default:
            fmt.Println(v)
        }
    } else {
        switch v := in.(type) {
        case int:
            fmt.Printf("integer %v\n", v)
        case string:
            fmt.Printf("string %v\n", v)
        }
    }
}
// this function compiles with main
func printCertainTypes(flip bool, in interface{}) {
    switch v := in.(type) {
    case int:
        fmt.Printf("integer %v\n", v)
    case bool:
        fmt.Printf("bool %v\n", v)
    }
    fmt.Println(flip)
    switch v := in.(type) {
    case string:
        fmt.Printf("string %v\n", v)
    case bool:
        fmt.Printf("bool 2 %v\n", v)
    }
}
// this function emits a type switch not complete error when compiled with main
func printCertainTypes(flip bool, in interface{}) {
    if flip == false {
        switch v := in.(type) {
        case int:
            fmt.Printf("integer %v\n", v)
        case bool:
            fmt.Printf("bool %v\n", v)
        }
    } else {
        switch v := in.(type) {
        case string:
            fmt.Printf("string %v\n", v)
        case bool:
            fmt.Printf("bool %v\n", v)
        }
    }
}
// this function emits a type switch not complete error when compiled with main
func printCertainTypes(flip bool, in interface{}) {
    fmt.Println(flip)
    switch v := in.(type) {
    case int:
        fmt.Printf("integer %v\n", v)
    case bool:
        fmt.Printf("bool %v\n", v)
    }
}

jimmyfrasche commented 6 years ago

In fairness, we should explore ways of approximating sum types in the current type system and weigh their pros and cons. If nothing else, it gives a baseline for comparison.

The standard means is an interface with an unexported, do-nothing method as a tag.

One argument against this is that each type in the sum needs to have this tag defined on it. This isn't strictly true, at least for members that are structs, we could do

type Sum interface { sum() }
type sum struct{}
func (sum) sum() {}

and just embed that 0-width tag in our structs.

We can add external types to our sum by introducing a wrapper

type External struct {
  sum
  *pkg.SomeType
}

though this is a bit ungainly.

If all members in the sum share common behavior, we can include those methods in the interface definition.

Constructs like this let us say that a type is in a sum, but it does not let us say what is not in that sum. In addition to the mandatory nil case, the same embedding trick can be used by external packages like

import "p"
var member struct {
  p.Sum
}

Within the package we have to take care to validate values that compile but are illegal.

There are various ways to recover some type-safety at runtime. I've found including a valid() error method in the definition of the sum interface coupled with a func like

func valid(s Sum) error {
  switch s.(type) {
  case nil:
    return errors.New("pkg: Sum must be non-nil")
  case A, B, C, ...: // listing each valid member
    return s.valid()
  }
  return fmt.Errorf("pkg: %T is not a valid member of Sum", s)
}

to be useful as it allows taking care of two kinds of validation at once. For members that happen to always be valid, we can avoid some boilerplate with

type alwaysValid struct{}
func (alwaysValid) valid() error { return nil }
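Putting the tag, the valid() method, and the alwaysValid helper together, a runnable sketch could look like this. The members A and B, the tag type name, the Outsider type, and the rule that B.N must be non-negative are all invented here to have something to validate.

```go
package main

import (
	"errors"
	"fmt"
)

// Sum is the emulated sum type: a tag method plus per-member validation.
type Sum interface {
	sum()
	valid() error
}

// tag and alwaysValid are zero-width helpers meant to be embedded.
type tag struct{}

func (tag) sum() {}

type alwaysValid struct{}

func (alwaysValid) valid() error { return nil }

// A has no state to check, so it embeds alwaysValid.
type A struct {
	tag
	alwaysValid
}

// B carries data with an invariant (hypothetical: N must not be negative).
type B struct {
	tag
	N int
}

func (b B) valid() error {
	if b.N < 0 {
		return errors.New("pkg: B.N must be non-negative")
	}
	return nil
}

// Outsider satisfies Sum by embedding it but is not a listed member.
type Outsider struct{ Sum }

// valid checks membership first, then the member's own invariants.
func valid(s Sum) error {
	switch s.(type) {
	case nil:
		return errors.New("pkg: Sum must be non-nil")
	case A, B:
		return s.valid()
	}
	return fmt.Errorf("pkg: %T is not a valid member of Sum", s)
}

func main() {
	for _, s := range []Sum{A{}, B{N: 1}, B{N: -1}, nil, Outsider{}} {
		fmt.Println(valid(s))
	}
}
```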

One of the more common complaints about this pattern is that it does not make membership in the sum clear in godoc. Since it also does not let us exclude members and requires us to validate anyway, there's a simple way around this: export the dummy method. Instead of,

//A Node is one of (list of types).
type Node interface { node() }

write

//A Node is only valid if it is defined in this package.
type Node interface { 
  //Node is a dummy method that signifies that a type is a Node.
  Node()
}

We can't stop anyone from satisfying Node so we may as well let them know what does. While this doesn't make it clear at a glance which types satisfy Node (no central list), it does make it clear whether the particular type you're looking at now satisfies Node.

This pattern is useful when the majority of the types in the sum are defined in the same package. When none are, the common recourse is to fall back to interface{}, like json.Token or driver.Value. We could use the previous pattern with wrapper types for each but in the end it says as much as interface{} so there is little point. If we expect such values to come from outside the package, we can be courteous and define a factory:

//Sum is one of int64, float64, or bool.
type Sum interface{}
func New(v interface{}) (Sum, error) {
  switch v.(type) {
  case nil:
    return nil, errors.New("pkg: Sum must be non-nil")
  case int64, float64, bool:
    return v, nil
  }
  return nil, fmt.Errorf("pkg: %T is not a valid member of Sum", v)
}

A common use of sums is for optional types, where you need to differentiate between "no value" and "a value that may be zero". There are two ways to do this.

*T lets you signify no value as a nil pointer and a (possibly) zero value as the result of dereferencing a non-nil pointer.

Like the previous interface-based approximations, and the various proposals for implementing sum types as interfaces with restrictions, this requires an extra pointer dereference and a possible heap allocation.

For optionals this can be avoided using the technique from the sql package

type OptionalT struct {
  Valid bool
  Value T
}

The major downside of this is that it allows encoding invalid state: Valid can be false and Value can be non-zero. It's also possible to grab Value when Valid is false (though this can be useful if you want the zero T if it was not specified). Casually setting Valid to false without zeroing Value, followed by setting Valid to true (or ignoring it) without assigning Value, causes a previously discarded value to accidentally resurface. This can be worked around by providing setters and getters to protect the invariants of the type.
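One way the setter/getter workaround might look, sketched for int. The names OptionalInt, Some, Set, Clear, and Get are assumptions made for this example and do not come from the sql package.

```go
package main

import "fmt"

// OptionalInt keeps its fields unexported so that the invariant
// "valid == false implies value == 0" can only be changed through methods.
type OptionalInt struct {
	valid bool
	value int
}

// Some returns an OptionalInt holding v.
func Some(v int) OptionalInt { return OptionalInt{valid: true, value: v} }

// Set stores v and marks the optional as present.
func (o *OptionalInt) Set(v int) { o.valid, o.value = true, v }

// Clear resets both fields so a stale value cannot resurface later.
func (o *OptionalInt) Clear() { *o = OptionalInt{} }

// Get returns the value and whether one is present, in the comma-ok style.
func (o OptionalInt) Get() (int, bool) { return o.value, o.valid }

func main() {
	o := Some(42)
	o.Clear()
	v, ok := o.Get()
	fmt.Println(v, ok) // prints "0 false": the old 42 is gone
}
```

Code inside the defining package can still break the invariant by touching the fields directly, so this narrows the problem rather than removing it.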

The simplest form of sum types is when you care about the identity, not the value: enumerations.

The traditional way to handle this in Go is const/iota:

type Enum int
const (
  A Enum = iota
  B
  C
)

Like the OptionalT type this doesn't have any unnecessary indirection. Like the interface sums, it doesn't limit the domain: there are only three valid values and many invalid values, so we need to validate at runtime. If there are exactly two values we can use bool.
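The runtime validation this calls for can be as small as a range check. A sketch, where the Valid method name is an assumption:

```go
package main

import "fmt"

type Enum int

const (
	A Enum = iota
	B
	C
)

// Valid reports whether e is one of the declared constants.
// It relies on A being the first and C the last constant in the block.
func (e Enum) Valid() bool { return e >= A && e <= C }

func main() {
	var e Enum = 42 // compiles: an untyped constant converts to Enum freely
	fmt.Println(A.Valid(), e.Valid()) // prints "true false"
}
```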

There's also the issue of the fundamental number-ness of this type. A+B == C. We can convert untyped integral constants to this type a bit too easily. There are plenty of places where that's desirable, but we get this no matter what. With a little extra work, we can limit this to just identity:

type Enum struct { v int }
var (
  A = Enum{0}
  B = Enum{1}
  C = Enum{2}
)

Now these are just opaque labels. They can be compared but that's it. Unfortunately now we lost const-ness, but we could get that back with a little more work:

func A() Enum { return Enum{0} }
func B() Enum { return Enum{1} }
func C() Enum { return Enum{2} }

We've regained the inability for an external user to alter the names at the cost of some boilerplate and some function calls that are highly inline-able.

However, this is in some ways nicer than the interface sums since we've almost fully closed the type. External code can only use A(), B(), or C(). They can't swap the labels around like in the var example and they can't do A() + B() and we're free to define whatever methods we want on Enum. It would still be possible for code in the same package to erroneously create or modify a value, but if we take care to ensure that does not happen, this is the first sum type that does not require validation code: if it exists, it's valid.
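A runnable sketch of the function-based labels, with a String method added purely for illustration (it is not part of the pattern described above):

```go
package main

import "fmt"

// Enum wraps an unexported field, so code outside the defining package
// can only obtain values through A, B, C (or the zero value, which equals A()).
type Enum struct{ v int }

func A() Enum { return Enum{0} }
func B() Enum { return Enum{1} }
func C() Enum { return Enum{2} }

// String names the label; the final return is reachable only if
// code in this package builds an Enum by hand.
func (e Enum) String() string {
	switch e {
	case A():
		return "A"
	case B():
		return "B"
	case C():
		return "C"
	}
	return "invalid"
}

func main() {
	fmt.Println(A() == A(), A() == B(), B()) // prints "true false B"
	// A() + B() would not compile: + is not defined on struct types.
}
```

Note that the zero value Enum{} is indistinguishable from A(), so the first label doubles as the default.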

Sometimes you have many labels and some of them have additional data and the ones that do have the same kind of data. Say you have a value that has three valueless states (A, B, C), two with a string value (D, E) and one with a string value and an int value (F). We could use a number of combinations of the above tactics, but the simplest way is

type Value struct {
  Which int // could have consts for A, B, C, D, E, F
  String string
  Int int
}

This is a lot like the OptionalT type above, but instead of a bool it has an enumeration and there are multiple fields that can be set (or not) depending on the value of Which. Validation has to be careful that these are set (or not) appropriately.
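The careful validation mentioned here might be sketched as follows. The Validate method, the constant block, and the exact rules (unused fields must stay at their zero values) are assumptions for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Labels for Value.Which.
const (
	A = iota
	B
	C
	D
	E
	F
)

type Value struct {
	Which  int
	String string
	Int    int
}

// Validate checks that only the fields belonging to v.Which are set.
func (v Value) Validate() error {
	switch v.Which {
	case A, B, C:
		if v.String != "" || v.Int != 0 {
			return errors.New("value: A, B and C carry no data")
		}
	case D, E:
		if v.Int != 0 {
			return errors.New("value: D and E carry only a string")
		}
	case F:
		// Both String and Int are meaningful; nothing to reject.
	default:
		return fmt.Errorf("value: unknown case %d", v.Which)
	}
	return nil
}

func main() {
	good := Value{Which: D, String: "x"}
	bad := Value{Which: A, Int: 1}
	fmt.Println(good.Validate())
	fmt.Println(bad.Validate())
}
```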

There are lots of ways to kinda express "one of the following" in Go. Some require more care than others. They often require validating the "one of" invariant at runtime or extraneous dereferences. A major downside they all share is that since they're being simulated in the language instead of being a part of the language, the "one of" invariant doesn't show up in reflect or go/types, making it hard to metaprogram with them. To use them in metaprogramming you both need to be able to recognize and validate the correct flavor of sum and be told that that's what you're looking for since they all look a lot like valid code without the "one of" invariant.

If sum types were a part of the language, they could be reflected upon and easily pulled out of source code, resulting in better libraries and tooling. The compiler could make a number of optimizations if it were aware of that "one of" invariant. Programmers could focus on the important validation code instead of the trivial maintenance of checking that a value is indeed in the correct domain.

Merovius commented 6 years ago

Constructs like this let us say that a type is in a sum, but it does not let us say what is not in that sum. In addition to the mandatory nil case, the same embedding trick can be used by external packages like […] Within the package we have to take care to validate values that compile but are illegal.

Why? As a package author, this seems firmly in the realm of "your problem" to me. If you pass me an io.Reader whose Read method panics, I'm not going to recover from that and just let it panic. Likewise, if you go out of your way to create an invalid value of a type I declared - who am I to argue with you? I.e. I consider "I embedded an emulated closed sum" a problem to rarely (if ever) come up by accident.

That being said, you can prevent that problem, by changing the interface to type Sum interface { sum() Sum } and have every value return itself. That way, you can just use the return of sum(), which will be well-behaved even under embedding.
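A sketch of the self-returning tag method. The members A and B, the Wrapper type, and the name helper are invented for the example.

```go
package main

import "fmt"

// Sum's tag method returns the member itself.
type Sum interface{ sum() Sum }

type A struct{}

func (a A) sum() Sum { return a }

type B struct{ N int }

func (b B) sum() Sum { return b }

// Wrapper stands in for foreign code that embeds Sum to satisfy it.
type Wrapper struct{ Sum }

// name switches on s.sum() rather than s: the promoted method
// returns the embedded member, never the wrapper.
func name(s Sum) string {
	switch s.sum().(type) {
	case A:
		return "A"
	case B:
		return "B"
	}
	return "unknown"
}

func main() {
	fmt.Println(name(A{}), name(Wrapper{Sum: B{N: 1}})) // prints "A B"
}
```

A nil Sum, or a Wrapper whose embedded Sum is nil, would still panic at the s.sum() call, so the nil case remains.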

One of the more common complaints about this pattern is that it does not make membership in the sum clear in godoc.

This may help you.

The major downside of this is that it allows encoding invalid state: Valid can be false and Value can be non-zero.

This isn't an invalid state to me. Zero values aren't magical. There is no difference, IMO, between sql.NullInt64{false,0} and NullInt64{false,42}. Both are valid and equivalent representations of an SQL NULL. If all code checks Valid before using Value, the difference is not observable to a program.

It's a fair and correct criticism that the compiler does not enforce doing this check (which it probably would, for "real" optionals/sum types), making it easier to not do it. But if you do forget it, I wouldn't consider it any better to accidentally use a zero value than to accidentally use a non-zero value (with the possible exception of pointer-shaped types, as they'd panic when used, thus failing loudly - but for those, you should just use the bare pointer-shaped type anyway and use nil as "unset").

There's also the issue of the fundamental number-ness of this type. A+B == C. We can convert untyped integral constants to this type a bit too easily.

Is this a theoretical concern or has it come up in practice?

Programmers could focus on the important validation code instead of the trivial maintenance of checking that a value is indeed in the correct domain.

Just FTR, in the cases that I do use sum-types-as-sum-types (i.e. the problem can't be more elegantly modeled via golden variety interfaces) I never write any validation code. Just like I don't check for nil-ness of pointers passed as receivers or arguments (unless it's documented as a valid variant). In the places where the compiler forces me to deal with that (i.e. "no return at end of function" style problems), I panic in the default case.

Personally, I consider Go a pragmatic language, which doesn't just add safety-features for their own sake or because "everyone knows they are better", but based on demonstrated need. I think using it in a pragmatic way is thus fine.

urandom commented 6 years ago

The standard means is an interface with an unexported, do-nothing method as a tag.

There's a fundamental difference between interfaces and sum types (I didn't see it mentioned in your post). When you approximate a sum type via an interface, there's really no way to handle the value. As the consumer, you have no idea what it actually holds, and can only guess. This is no better than just using an empty interface. Its only usefulness is if any implementation can only come from the same package that defines the interface, since only then can you control what you can get.

On the other hand, having something like:

func foo(val string|int|error) {
    switch v:= val.(type) {
    case string:
        ...
    }
}

Gives the consumer full power in using the value of the sum type. Its value is concrete, not open to interpretation.

@Merovius These "open sums" you mention have what some people might classify as a significant drawback, in that they would allow abusing them for "feature creep". This very reason has been given for why optional function arguments have been rejected as a feature.

Merovius commented 6 years ago

These "open sums" you mention have what some people might classify as a significant drawback, in that they would allow abusing them for "feature creep". This very reason has been given for why optional function arguments have been rejected as a feature.

That seems like a pretty weak argument to me - if nothing else, then because they exist, so you are already allowing whatever they enable. Indeed, we already have optional arguments, for all intents and purposes (not that I like that pattern, but it clearly already is possible in the language).

Merovius commented 6 years ago

There's a fundamental difference between interfaces and sum types (I didn't see it mentioned in your post). When you approximate a sum type via an interface, there's really no way to handle the value. As the consumer, you have no idea what it actually holds, and can only guess.

I've tried parsing this a second time and still can't. Why wouldn't you be able to use them? They can be regular, exported types. Yes, they have to be types created in your package (obviously), but apart from that there does not seem to be any restriction in how you can use them, compared to actual, closed sums.

urandom commented 6 years ago

I've tried parsing this a second time and still can't. Why wouldn't you be able to use them? They can be regular, exported types. Yes, they have to be types created in your package (obviously), but apart from that there does not seem to be any restriction in how you can use them, compared to actual, closed sums.

What happens in the case when the dummy method is exported and any third party can implement the "sum type"? Or the quite realistic scenario where a team member is not familiar with the various consumers of the interface, decides to add another implementation in the same package, and an instance of that implementation winds up being passed to these consumers through various means of the code? At a risk of repeating my apparent "unparseable" statement: "As the consumer, you have no idea what [the sum value] actually holds, and can only guess.". You know, since it's an interface, and it doesn't tell you who's implementing it.

jimmyfrasche commented 6 years ago

@Merovius

Just FTR, in the cases that I do use sum-types-as-sum-types (i.e. the problem can't be more elegantly modeled via golden variety interfaces) I never write any validation code. Just like I don't check for nil-ness of pointers passed as receivers or arguments (unless it's documented as a valid variant). In the places where the compiler forces me to deal with that (i.e. "no return at end of function" style problems), I panic in the default case.

I don't treat this as an always or never thing.

If someone passing bad input would immediately explode, I don't bother with validation code.

But if someone passing bad input might eventually cause a panic but it won't show up for awhile, then I write validation code so that the bad input is flagged as soon as possible and no one has to figure out that the error was introduced 150 frames up in the call stack (especially since they then may have to go up another 150 frames in the call stack to figure out where that bad value was introduced).

Spending half a minute now to potentially save a half hour of debugging later is pragmatic. Especially for me since I make dumb mistakes all the time and the sooner I get schooled the sooner I can move on to make the next dumb mistake.

If I have a func that takes a reader and immediately starts using it, I won't check for nil, but if the func is a factory for a struct that won't call the reader until a certain method is invoked, I'll check it for nil and panic or return an error with something like "reader must not be nil" so that the cause of the error is as close to the source of the error as possible.

godoc -analysis

I'm aware but I don't find it useful. It ran for 40 minutes on my workspace before I hit ^C and that needs to be refreshed every time a package is installed or modified. There's #20131 (forked from this very thread!) though.

That being said, you can prevent that problem, by changing the interface to type Sum interface { sum() Sum } and have every value return itself. That way, you can just use the return of sum(), which will be well-behaved even under embedding.

I haven't found that that useful. It doesn't provide any more benefits than explicit validation and it provides less validation.

Is [the fact that you can add members of a const/iota enumeration] a theoretical concern or has it come up in practice?

That particular one was theoretical: I was trying to list all the pros and cons I could think of, theoretical and practical. My larger point, though, was that there were many ways to try to express the "one of" invariant in the language that do get used fairly commonly but none as simple as just having it be a kind of type in the language.

Is [the fact that you can assign an untyped integral to a const/iota enumeration] a theoretical concern or has it come up in practice?

That one has come up in practice. It didn't take long to figure out what went wrong but it would have taken even less time if the compiler had said "there, that line—that's the one that's wrong". There's talk of other ways of handling that particular case, but I don't see how they'd be of general use.
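Both concerns can be reproduced in a few lines. The Color type and paint function below are hypothetical and exist only to show what the compiler accepts for a const/iota enumeration.

```go
package main

import "fmt"

// Color is a hypothetical const/iota enumeration.
type Color int

const (
	Red Color = iota
	Green
	Blue
)

// paint handles every declared member and falls back for anything else.
func paint(c Color) string {
	switch c {
	case Red:
		return "red"
	case Green:
		return "green"
	case Blue:
		return "blue"
	default:
		return fmt.Sprintf("Color(%d)", int(c))
	}
}

func main() {
	// Number-ness: members can be added, and Green+Green happens to equal Blue.
	fmt.Println(paint(Green + Green)) // blue

	// An untyped integral constant converts to Color without any complaint.
	fmt.Println(paint(7)) // Color(7)
}
```

Neither call is flagged at compile time; the second only shows up because paint has a default case.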

This isn't an invalid state to me. Zero values aren't magical. There is no difference, IMO, between sql.NullInt64{false,0} and NullInt64{false,42}. Both are valid and equivalent representations of an SQL NULL. If all code checks Valid before using Value, the difference is not observable to a program.

It's a fair and correct criticism that the compiler does not enforce doing this check (which it probably would, for "real" optionals/sum types), making it easier to not do it. But if you do forget it, I wouldn't consider it any better to accidentally use a zero value than to accidentally use a non-zero value (with the possible exception of pointer-shaped types, as they'd panic when used, thus failing loudly - but for those, you should just use the bare pointer-shaped type anyway and use nil as "unset").

That "If all code checks Valid before using Value" is where the bugs slip in and what the compiler could enforce. I have had bugs like that happen (albeit with larger versions of that pattern, where there were more than one value field and more than two states for the discriminator). I believe/hope I found all of these during development and testing and none escaped into the wild, but it would be nice if the compiler could have just told me when I made that mistake and I could be sure that the only way one of these slipped past was if there was a bug in the compiler, the same way it would tell me if I tried to assign a string to a variable of type int.

And, sure, I prefer *T for optional types though that does have non-zero costs associated with it, both in execution spacetime and in the readability of the code.

(For that particular example the code to get the actual value or the correct zero value with the pick proposal would be v, _ := nullable.[Value] which is concise and safe.)

DemiMarie commented 6 years ago

That is very much not what I would want. Pick types should be value types, as in Rust. Their first word should be a pointer to GC Metadata, if needed.

Otherwise their use comes with a performance penalty that might be unacceptable.

Josh Bleecher Snyder wrote:

With the pick proposal you can choose to have a p or *p giving you greater control over memory trade offs.

The reason interfaces allocate to store scalar values is so you don't have to read a type word in order to decide whether the other word is a pointer; see #8405 https://github.com/golang/go/issues/8405 for discussion. The same implementation considerations would likely apply for a pick type, which might mean in practice that p ends up allocating and being non-local anyway.


Merovius commented 6 years ago

@urandom

What happens in the case when the dummy method is exported and any third party can implement the "sum type"?

There is a difference between the method being exported and the type being exported. We seem to be talking past each other. To me, this seems to work just fine, without any difference between open and closed sums:

type X interface { x() X }
type IntX int
func (v IntX) x() X { return v }
type StringX string
func (v StringX) x() X { return v }
type StructX struct{
    Foo bool
    Bar int
}
func (v StructX) x() X { return v }

There's no extension outside the package possible, yet consumers of the package can use, create and pass around the values just like any other.

jimmyfrasche commented 6 years ago

You can embed X, or one of the local types that satisfy it, externally and then pass it to a function in your package that takes an X.

If that func calls x it either panics (if X itself was embedded and not set to anything) or returns a value that your code can operate on—but it's not what was passed by the caller, which would be a bit surprising to the caller (and their code is already suspect if they're attempting something like this because they didn't read the docs).

Calling a validator that panics with a "don't do that" message seems like the least surprising way to handle that and lets the caller fix their code.
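The scenario can be sketched in one file. X, IntX and StringX follow the example above; Wrapper stands in for a type that would normally be declared in a different package, and validate is a hypothetical helper, not something the pattern itself prescribes.

```go
package main

import "fmt"

type X interface{ x() X }

type IntX int

func (v IntX) x() X { return v }

type StringX string

func (v StringX) x() X { return v }

// Wrapper stands in for an external type. Embedding X promotes the
// unexported method, so Wrapper satisfies X without being a variant.
type Wrapper struct{ X }

// validate is a hypothetical check that panics unless v is an intended variant.
func validate(v X) X {
	switch v.(type) {
	case IntX, StringX:
		return v
	default:
		panic(fmt.Sprintf("sum: %T is not a variant of X", v))
	}
}

func main() {
	w := Wrapper{X: IntX(1)}

	// Calling x() silently unwraps to the embedded value.
	fmt.Printf("%T\n", w.x())

	// The validator instead reports the misuse to the caller.
	defer func() { fmt.Println("recovered:", recover()) }()
	validate(w)
}
```

With a zero Wrapper{} the embedded X is nil, so calling x() on it panics with a nil dereference rather than a helpful message.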

Merovius commented 6 years ago

If that func calls x it either panics […] or returns a value that your code can operate on—but it's not what was passed by the caller, which would be a bit surprising to the caller

Like I said above: If you are surprised, that your intentional construction of an invalid value is invalid, you need to rethink your expectations. But in any case, that is not what this particular strain of discussion was about and it would be helpful to keep separate arguments separate. This one was about @urandom saying that open sums via interfaces with tag-methods wouldn't be introspectable or usable by other packages. I find that a dubious claim; it would be great if it could be clarified.

jimmyfrasche commented 6 years ago

The problem is that someone can create a type that is not in the sum that compiles and can be passed to your package.

Without adding proper sum types to the language, there are three options for handling it

  1. ignore the situation
  2. validate and panic/return an error
  3. try to "do what you mean" by implicitly extracting the embedded value and using it

3 seems like a strange mix of 1 and 2 to me: I don't see what it buys.

I agree that "If you are surprised, that your intentional construction of an invalid value is invalid, you need to rethink your expectations", but, with 3, it can be very hard to notice that something has gone wrong and even when you do it'd be hard to figure out why.

2 seems best because it both protects the code from slipping into an invalid state and sends up a flare if someone messes up letting them know why they're wrong and how to correct it.

Am I misunderstanding the intent of the pattern or are we just approaching this from different philosophies?

@urandom I'd also appreciate clarification; I'm not 100% sure on what you're trying to say, either.

Merovius commented 6 years ago

The problem is that someone can create a type that is not in the sum that compiles and can be passed to your package.

You can always do that; if in doubt, you could always use unsafe, even with compiler-checked sum types (and I don't see that as a qualitatively different way of constructing invalid values from embedding something that is clearly intended as a sum and not initializing it to a valid value). The question is "how often will this pose a problem in practice and how severe will that problem be". In my opinion, with the solution from above the answer is "pretty much never and very low" - you apparently disagree, which is fine. But either way, there doesn't seem to be much of a point laboring over this - the arguments and views on both sides of this particular point should be sufficiently clear and I'm trying to avoid too much noisy repetition and focus on the genuinely new arguments. I brought up above construction to demonstrate that there is no difference in exportability between first-class sum types and emulated-sums-via-interfaces. Not to show that they are strictly better in every way.

bcmills commented 6 years ago

if in doubt, you could always use unsafe, even with compiler-checked sum types (and I don't see that as a qualitatively different way of constructing invalid values from embedding something that is clearly intended as a sum and not initializing it to a valid value).

I think it is qualitatively different: when people misuse embedding in this way (at least with proto.Message and the concrete types that implement it), they're generally not thinking about whether it is safe and what invariants it might break. (Users assume that interfaces completely describe the required behaviors, but when interfaces are employed as union or sum types they often do not. See also https://github.com/golang/protobuf/issues/364.)

In contrast, if someone uses package unsafe to set a variable to a type to which it cannot normally refer, they're more-or-less explicitly claiming to have at least thought about what they might break and why.

jimmyfrasche commented 6 years ago

@Merovius Perhaps I've been unclear: the fact that the compiler would tell someone they used embedding wrong is more of a nice side benefit.

The largest gain of the safety feature is that it would be honored by reflect and represented in go/types. That gives tooling and libraries more information to work with. There are lots of ways to simulate sum types in Go but they're all identical to non-sum type code, so tooling and libraries need out-of-band info to know that it's a sum type and have to be able to recognize the specific pattern being used, but even those patterns allow significant variation.

It would also make unsafe the only way to create an invalid value: now you have regular code, generated code, and reflect—the latter two being more likely to cause an issue as unlike a person they cannot read the documentation.

Another side benefit of the safety means the compiler has more information and can generate better faster code.

There's also the fact that in addition to being able to replace the pseudo-sum with interfaces you could replace the pseudo-sum "one of these regular types" like json.Token or driver.Value. Those are few and far between but it would be one less place where interface{} is necessary.

neild commented 6 years ago

It would also make unsafe the only way to create an invalid value

I don't think I understand the definition of "invalid value" that leads to this statement.

jimmyfrasche commented 6 years ago

@neild if you had

var v pick {
  None struct{}
  A struct { X int; Y *T}
  B int
}

it would be laid out in memory like

struct {
  activeField int //which of None (0), A (1), or B (2) is the current field
  theInt int // If None always 0
  thePtr *T // If None or B, always nil
}

and with unsafe you could set thePtr even if activeField was 0 or 2 or set a value of theInt even if activeField was 0.

In either case this would invalidate assumptions the compiler would be making and allows the same kind of theoretical bugs that we can have today.

But as @bcmills pointed out if you're using unsafe you'd better know what you're doing because it's the nuclear option.

neild commented 6 years ago

What I don't understand is why unsafe is the only way to create an invalid value.

var t time.Timer

t is an invalid value; t.C is unset, calling t.Stop will panic, etc. No unsafe required.

Some languages have type systems which go to great lengths to prevent the creation of "invalid" values. Go is not one of them. I don't see how unions move that needle significantly. (There are other reasons to support unions, of course.)

jimmyfrasche commented 6 years ago

@neild yes sorry I'm being loose with my definitions.

I should have said invalid with respect to the invariants of the sum type.

The individual types in the sum can of course be in an invalid state.

However, maintaining the sum type invariants means they're accessible to reflect and go/types as well as the programmer, so manipulating them in libraries and tools maintains that safety and provides more information to the metaprogrammer.

urandom commented 6 years ago

@jimmyfrasche, I'm saying that unlike a sum type, which tells you every possible type it can be, an interface is opaque in that you don't know, or at least you can't be sure, which types implement the interface. This makes writing the switch portion of the code a bit of guesswork:

func F(sum SumInterface) {
    switch v := sum.(type) {
    case Screwdriver:
        ...
    default:
        panic("Someone implementing a new type which gets passed to F and causes a runtime panic 3 weeks into production")
    }
}

Merovius commented 6 years ago

So, it would seem to me, that most of the issues people are having with the interface-based sum-type emulation can be solved by tooling and/or convention. E.g. if an interface contains an unexported method, it would be trivial to figure out all possible implementations (yes, barring intentional circumventions). Similarly, to address most of the issues with iota-based enums, a simple convention of "an enum is a type Foo int with a declaration of the form const ( FooA Foo = iota; FooB; FooC )" would make it possible to write extensive and precise tools for them too.

Yes, this isn't equivalent to actual sum types (among other things, they wouldn't get first-class reflect support, though I don't really understand how important that would be anyway), but it does mean that the existing solutions appear, from my POV, better than they are often painted. And IMO it would be worth exploring that design space before actually putting them into Go 2 - at least if they really are that important to people.

(and I want to re-emphasize that I'm aware of the advantages of sum types, so there's no need to restate them for my benefit. I just don't weigh them as heavily as other people, also see the disadvantages and thus come to different conclusions on the same data)

jimmyfrasche commented 6 years ago

@Merovius that's a fine position.

The reflect support would allow libraries as well as off-line tools—linters, code generators, etc.—to access the information, and would keep them from modifying values inappropriately, which cannot be detected statically with any precision.

Regardless, it's a fair idea to explore, so let's explore it.

To recap the most common families of pseudosums in Go are: (roughly in order of occurrence)

All of those can be used for both sum types and non-sum types. The first two are so rarely used for anything else that it might make sense to just assume that they represent sum types and accept the occasional false positive. For interface sums, a recognizer could limit itself to an unexported method with no params or returns and with no body on any member. For enums it would make sense to only recognize them when they're just Type = iota so it's not tripped up when iota is used as part of an expression.

*T for an optional T would be really hard to distinguish from a regular pointer. This could be given the convention type O = *T. That would be possible to detect, though a bit difficult since the alias name isn't part of the type. type O *T would be easier to detect but harder to work with in code. On the other hand everything that needs to be done is essentially built into the type so there's little to be gained in tooling from recognizing this. Let's just ignore this one. (Generics would likely allow something along the lines of type Optional(T) *T which would simplify "tagging" these).

The struct with an enum would be hard to reason about in tooling, which fields go with which value for the enum? We could simplify this to the convention that there must be one field per member in the enum and that the enum value and the field value must be the same, for example:

type Which int
const (
  A Which = iota
  B
  C
)
type Sum struct {
  Which
  A struct{} // has to be included to line up with the value of Which
  B struct { X int; Y float64 }
  C struct { X int; Y int } 
}

That wouldn't get optional types but we could special case "2 fields, first is bool" in the recognizer.

Using an interface{} for a grab bag sum would be impossible to detect without a magic comment like //gosum: int, float64, string, Foo

Alternately, there could be a special package with the following definitions:

package sum
type (
  Type struct{}
  Enum int
  OneOf interface{}
)

and only recognize enums if they're of the form type MyEnum sum.Enum, recognize interfaces and structs only if they embed sum.Type, and only recognize interface{} grab bags like type GrabBag sum.OneOf (but that would still need a machine-recognizable comment to explain its contents). That would have the following pros and cons: Pros

Regardless of which of those two ways are used to identify sum types, let's assume that they were recognized and move on to using that information to see what kind of tooling we can build.

We can roughly group tooling into generative (like stringer) and introspective (like golint).

The simplest generative code would be a tool to fill in a switch statement with missing cases. This could be used by editors. Once a sum type is identified as a sum type this is trivial (a bit tiresome but the actual generation logic is going to be the same with or without language support).

In all cases it would be possible to generate a function that validates the "one of" invariant.
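For the struct-with-enum convention shown earlier, such a generated validator might look like the following hand-written sketch. The Validate method and its error messages are assumptions made for illustration; a real generator could emit something quite different.

```go
package main

import (
	"errors"
	"fmt"
)

type Which int

const (
	A Which = iota
	B
	C
)

type Sum struct {
	Which
	A struct{} // carries no data, so there is nothing to check for it
	B struct {
		X int
		Y float64
	}
	C struct{ X, Y int }
}

// Validate reports an error if a field other than the one selected by
// Which holds a non-zero value, or if Which is not a declared member.
func (s Sum) Validate() error {
	var (
		zeroB struct {
			X int
			Y float64
		}
		zeroC struct{ X, Y int }
	)
	bSet, cSet := s.B != zeroB, s.C != zeroC
	switch s.Which {
	case A:
		if bSet || cSet {
			return errors.New("sum: Which is A but B or C is set")
		}
	case B:
		if cSet {
			return errors.New("sum: Which is B but C is set")
		}
	case C:
		if bSet {
			return errors.New("sum: Which is C but B is set")
		}
	default:
		return fmt.Errorf("sum: invalid discriminator %d", int(s.Which))
	}
	return nil
}

func main() {
	good := Sum{Which: B}
	good.B.X = 1
	fmt.Println(good.Validate())

	bad := Sum{Which: A}
	bad.C.X = 3
	fmt.Println(bad.Validate())
}
```

The check runs at run time only, so it complements rather than replaces the static linting discussed next.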

For enums there could be more tools like stringer. In https://github.com/golang/go/issues/19814#issuecomment-291002852 I mentioned some possibilities.

The biggest generative tool is the compiler which could produce better machine code with this info, but ah well.

I can't think of any others at the moment. Is there anything on anyone's wish list?

For introspection, the obvious candidate is exhaustiveness linting. Without language support there are actually two different kinds of linting required

  1. making sure all possible states are handled
  2. making sure no invalid states are created (which would invalidate the work done by 1)

1 is trivial, but it would require all possible states and a default case because 2 can't be verified 100% (even ignoring unsafe) and you can't expect all code using your code to run this linter anyway.

2 couldn't really follow values through reflect or identify all code that could generate an invalid state for the sum but it could catch a lot of simple errors, like if you embed a sum type and then call a func with it, it could say "you wrote pkg.F(v) but you meant pkg.F(v.EmbeddedField)" or "you passed 2 to pkg.F, use pkg.B". For the struct it couldn't do much to enforce the invariant that one field is set at a time except in really obvious cases like "you're switching on Which and in the case X you set the field F to a non-zero value". It could insist that you use the generated validation function when accepting values from outside the package.

The other big thing would be showing up in godoc. godoc already groups const/iota and #20131 would help with the interface pseudosums. There's not really anything to do with the struct version that isn't explicit in the definition other than to specify the invariant.

Merovius commented 6 years ago

as well as off-line tools—linters, code generators, etc.

No. The static information is present, you don't need the type-system (or reflect) for that, convention works fine. If your interface contains unexported methods, any static tool can choose to treat that as a closed sum (because it effectively is) and do any analysis/codegen you might want. Likewise with the convention of iota-enums.

reflect is for runtime type information - and in a sense, the compiler erases the necessary info to make sums-by-convention work here (as it doesn't give you access to a list of functions or declared types or declared consts), which is why I agree that actual sums enable this.

(also, FTR, depending on the use case, you could still have a tool that uses the statically known information to generate the necessary runtime-information - e.g. it could enumerate the types which have the required tag-method and generate a lookup-table for them. But I don't understand what a use-case would be, so it's hard to evaluate the practicality of this).

So, my question was intentionally: What would the use case be, of having this info available at runtime?

Regardless, it's a fair idea to explore, so let's explore it.

When I said "explore it", I did not mean "enumerate them and argue about them in a vacuum", I meant "implement tools that use these conventions and see how useful/necessary/practical they are".

The advantage of experience reports is that they are based on experience: You needed to do a thing, you tried to use existing mechanisms for that, you found that they didn't suffice. This focuses the discussion on the actual use-case (as in "the case it was used in") and makes it possible to evaluate any proposed solutions against them, against the tried alternatives and to see how a solution would not have the same pitfalls.

You are skipping the "trying to use existing mechanisms for that" part. You want to have static exhaustiveness-checks of sums (problem). Write a tool that finds interfaces with unexported methods, does the exhaustiveness-checks for any type-switch it's used in, use that tool for a while (use the existing mechanisms for it). Write up, where it failed.
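The first step of such a tool, finding the candidate interfaces, can be sketched with go/parser and go/types from the standard library. The function name closedInterfaces and the embedded demo source are assumptions for illustration; the exhaustiveness check over type switches is not shown.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// src is a small hypothetical package used as input.
const src = `package demo

type Sum interface{ sum() }
type Open interface{ String() string }
type A int
func (A) sum() {}
`

// closedInterfaces returns the names of package-level interface types
// whose method set contains at least one unexported method.
func closedInterfaces(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err
	}
	conf := types.Config{} // the demo source has no imports, so no Importer is set
	pkg, err := conf.Check("demo", fset, []*ast.File{file}, nil)
	if err != nil {
		return nil, err
	}
	var names []string
	scope := pkg.Scope()
	for _, name := range scope.Names() {
		tn, ok := scope.Lookup(name).(*types.TypeName)
		if !ok {
			continue
		}
		iface, ok := tn.Type().Underlying().(*types.Interface)
		if !ok {
			continue
		}
		for i := 0; i < iface.NumMethods(); i++ {
			if !iface.Method(i).Exported() {
				names = append(names, name)
				break
			}
		}
	}
	return names, nil
}

func main() {
	names, err := closedInterfaces(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

On the demo input only Sum is reported, since Open can be implemented by any package.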

jimmyfrasche commented 6 years ago

I was thinking out loud and have begun work on a static recognizer based on those thoughts that tools may use. I was, I suppose, implicitly looking for feedback and more ideas (and that paid off re generating the info necessary for reflect).

Merovius commented 6 years ago

FWIW, if I were you I'd simply ignore the complex cases and focus on the things that work: a) unexported methods in interfaces and b) simple const-iota-enums, that have int as an underlying type and a single const-declaration of the expected format. Using a tool would require using one of these two workarounds, but IMO that's fine (to use the compiler tool, you'd also need to explicitly use sums, so that seems okay).

jimmyfrasche commented 6 years ago

That's definitely a good place to start and it can be dialed in after running it over a large set of packages and seeing how many false positives/negatives there are

jimmyfrasche commented 6 years ago

https://godoc.org/github.com/jimmyfrasche/closed

Still very much a work in progress. I can't promise I won't have to add extra parameters to the constructor. It probably has more bugs than tests. But it's good enough to play with.

There's an example of usage in cmds/closed-explorer that will also list all closed types detected in a package specified by its import path.

I started just detecting all interfaces with unexported methods but they're fairly common and while some were clearly sum types others clearly weren't. If I just limited it to the empty tag method convention, I lost a lot of sum types, so I decided to record both separately and generalize the package a little bit beyond sum types to closed types.

With enums I went the other way and just recorded every non-bitset const of a defined type. I plan to expose the discovered bitsets, too.

It doesn't detect optional structs or defined empty interfaces yet since they'll require some kind of marker comment, but it does special case the ones in the stdlib.

Merovius commented 6 years ago

I started just detecting all interfaces with unexported methods but they're fairly common and while some were clearly sum types others clearly weren't.

I would find it helpful if you could provide some of the examples that weren't.

jimmyfrasche commented 6 years ago

@Merovius sorry I didn't keep a list. I found them by running stdlib.sh (in cmds/closed-explorer). If I run across a good example next time I get to play with this I'll post it.

The ones that I'm not considering as sum types were all unexported interfaces that were being used to plug in one of several implementations: nothing cared what was in the interface, just that there was something that satisfied it. They were very much being used as interfaces not sums, but just happened to be closed because they were unexported. Perhaps that's a distinction without a difference, but I can always change my mind after further investigation.