proposal: spec: sum types based on general interfaces

ianlancetaylor commented 1 year ago

This is a speculative issue based on the way that type parameter constraints are implemented. This is a discussion of a possible future language change, not one that will be adopted in the near future. This is a version of #41716 updated for the final implementation of generics in Go.

We currently permit type parameter constraints to embed a union of types (see https://go.dev/ref/spec#Interface_types). We propose that we permit an ordinary interface type to embed a union of terms, where each term is itself a type. (This proposal does not permit the underlying type syntax ~T to be used in an ordinary interface type, though of course that syntax is still valid for a type parameter constraint.)

That's really the entire proposal.

Embedding a union in an interface affects the interface's type set. As always, a variable of interface type may store a value of any type that is in its type set, or, equivalently, a value of any type in its type set implements the interface type. Inversely, a variable of interface type may not store a value of any type that is not in its type set. Embedding a union means that the interface is something akin to a sum type that permits values of any type listed in the union.

For example:

type MyInt int
type MyOtherInt int
type MyFloat float64
type I1 interface {
    MyInt | MyFloat
}
type I2 interface {
    int | float64
}

The types MyInt and MyFloat implement I1. The type MyOtherInt does not implement I1. None of MyInt, MyFloat, or MyOtherInt implement I2.
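Ordinary variables of type I1 or I2 are not valid Go today, but the same type-set membership can already be observed when the interfaces are used as constraints. The following is a minimal sketch under that assumption; inI1 and inI2 are hypothetical helper names, not part of the proposal.

```go
package main

import "fmt"

type MyInt int
type MyOtherInt int
type MyFloat float64

type I1 interface{ MyInt | MyFloat }
type I2 interface{ int | float64 }

// inI1 and inI2 are hypothetical helpers: a call only compiles when the
// type argument is in the corresponding type set.
func inI1[T I1](v T) string { return fmt.Sprintf("%T is in I1", v) }
func inI2[T I2](v T) string { return fmt.Sprintf("%T is in I2", v) }

func main() {
	fmt.Println(inI1(MyInt(1)))
	fmt.Println(inI1(MyFloat(2.5)))
	fmt.Println(inI2(3))
	// inI1(MyOtherInt(1)) and inI2(MyInt(1)) are rejected at compile time,
	// matching the membership rules described above.
}
```

Under the proposal the same membership rules would decide which values may be assigned to a variable of type I1 or I2.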

In all other ways an interface type with an embedded union would act exactly like an interface type. There would be no support for using operators with values of the interface type, even though that is permitted for type parameters when using such a type as a type parameter constraint. This is because in a generic function we know that two values of some type parameter are the same type, and may therefore be used with a binary operator such as +. With two values of some interface type, all we know is that both types appear in the type set, but they need not be the same type, and so + may not be well defined. (One could imagine a further extension in which + is permitted but panics if the values are not the same type, but there is no obvious reason why that would be useful in practice.)

In particular, the zero value of an interface type with an embedded union would be nil, just as for any interface type. So this is a form of sum type in which there is always another possible option, namely nil. Sum types in most languages do not work this way, and this may be a reason to not add this functionality to Go.

As an implementation note, we could in some cases use a different implementation for interfaces with an embedded union type. We could use a small code, typically a single byte, to indicate the type stored in the interface, with a zero indicating nil. We could store the values directly, rather than boxed. For example, I1 above could be stored as the equivalent of struct { code byte; value [8]byte } with the value field holding either an int or a float64 depending on the value of code. The advantage of this would be reducing memory allocations. It would only be possible when all the values stored do not include any pointers, or at least when all the pointers are in the same location relative to the start of the value. None of this would affect anything at the language level, though it might have some consequences for the reflect package.
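To make the implementation note concrete, here is a hand-written approximation of the unboxed layout, written in ordinary Go. It is only a sketch: the names packed, packInt, packFloat and describe are made up for illustration, a uint64 payload stands in for the [8]byte field, and nothing here reflects how the compiler would actually lay out such values.

```go
package main

import (
	"fmt"
	"math"
)

// Hypothetical type codes; 0 is reserved for nil, as the note suggests.
const (
	codeNil   byte = iota
	codeInt        // payload holds an int64
	codeFloat      // payload holds a float64
)

// packed approximates struct { code byte; value [8]byte }.
type packed struct {
	code    byte
	payload uint64
}

func packInt(v int64) packed     { return packed{codeInt, uint64(v)} }
func packFloat(v float64) packed { return packed{codeFloat, math.Float64bits(v)} }

// describe decodes the payload according to the code byte.
func describe(p packed) string {
	switch p.code {
	case codeNil:
		return "nil"
	case codeInt:
		return fmt.Sprintf("int %d", int64(p.payload))
	case codeFloat:
		return fmt.Sprintf("float %g", math.Float64frombits(p.payload))
	default:
		return "invalid"
	}
}

func main() {
	var zero packed // all-zero memory decodes as nil
	fmt.Println(describe(zero))
	fmt.Println(describe(packInt(-7)))
	fmt.Println(describe(packFloat(1.5)))
}
```

Neither variant contains a pointer, which is the case the note says such a layout would be limited to.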

As I said above, this is a speculative issue, opened here because it is an obvious extension of the generics implementation. In discussion here, please focus on the benefits and costs of this specific proposal. Discussion of sum types in general, or different proposals for sum types, should remain on #19412 or newer variants such as #54685. Thanks.

dsnet commented 1 year ago

This proposal does not permit the underlying type syntax ~T to be used in an ordinary interface type, though of course that syntax is still valid for a type parameter constraint.

Could you comment on why this restriction occurs? Is this simply to err on the side of caution initially and potentially remove this restriction in the future? Or is there a technical reason not to do this?

ianlancetaylor commented 1 year ago

The reason to not permit ~T is that the current language would provide no mechanism for extracting the type of such a value. Given interface { ~int }, if I store a value of type myInt in that interface, then code in some other package would be unable to use a type assertion or type switch to get the value out of the interface type. The best that it could do would be something like reflect.TypeOf(v).Kind(). That seems sufficiently awkward that it requires more thought and attention, beyond the ideas in this proposal.
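The awkwardness can be reproduced in current Go with any in place of interface { ~int }. This is a small sketch; myInt and extract are hypothetical names, and the reflect fallback is just one way to recover the value.

```go
package main

import (
	"fmt"
	"reflect"
)

// myInt is a hypothetical defined type whose underlying type is int.
type myInt int

// extract tries a type switch first, then falls back to reflection.
func extract(v any) string {
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("type switch: int %d", x)
	}
	// case int does not match myInt, so inspect the kind instead.
	if rv := reflect.ValueOf(v); rv.Kind() == reflect.Int {
		return fmt.Sprintf("reflect: kind int, value %d", rv.Int())
	}
	return "unknown"
}

func main() {
	fmt.Println(extract(42))
	fmt.Println(extract(myInt(42)))
}
```

Code in another package that cannot name myInt has no case it could add to the type switch, which is why only the reflect path works there.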

dsnet commented 1 year ago

Is there a technical reason that the language could not also evolve to support ~T in a type switch? Granted that this is outside the scope of this proposal, but I think there is a valid use case for it.

jimmyfrasche commented 1 year ago

In a vacuum, I'd prefer pretty much any other option, but since it's what generics use, it's what we should go with here and we should embrace it fully. Specifically,

1. type I2 int | float64 should be legal
2. v, ok := i.(int | float64) follows from 1
3. in a type switch case int | float64: works like 2
4. string | fmt.Stringer should be legal even though that does not currently work with constraints

@dsnet I think comparable and ~T could be considered and discussed separately—if for no reason other than this thread will probably get quite long on its own. I'm :+1: on both.

DeedleFake commented 1 year ago

With the direct storage mechanism detailed in the post as an alternative to boxing, would it be possible for the zero-value not to be nil after all? For example, if the code value is essentially an index into the list of types and the value stores the value of that type directly, then the zero value with all-zeroed memory would actually default to a zero value of the first type in the list. For example, given

type Example interface {
  int16 | string
}

the zero value in memory would look like {code: 0, value: 0}.

Also, in that format, would the value side change sizes depending on the type? For example, would a value of Example(1) look like {code: 0, value: [...]byte{0, 1}} ignoring endianness, while a value of Example("example") would look like {code: 1, value: [...]byte{/* raw bytes of a string header */}}? If so, how would this affect embedding these interface types into other types, such as a []Example? Would the slice just assume the maximum possible necessary size for the given types? Edit: Never mind, dumb question. The size changing could be a minor optimization when copying, but of course anywhere it's stored would have to assume the maximum possible size, even just local variables, unless the compiler could prove that it's only ever used with a smaller type, I guess.

It would only be possible when all the values stored do not include any pointers, or at least when all the pointers are in the same location relative to the start of the value.

I don't understand this comment, which may indicate that I'm missing something fundamental about the explanation. Why would pointers make any difference? If the above Example type had int16 | string | *int, why would it not just be {code: 2, value: /* the pointer value itself, ignoring whatever it points to */}?

apparentlymart commented 1 year ago

The example in the proposal is rather contrived, so I tried to imagine some real situations I've encountered where this new capability could be useful to express something that was harder to express before.

Is the following also an example of something that this proposal would permit?

type Success[T any] struct {
    Value T
}

type Failure struct {
    Err error
}

type Result[T any] interface {
    Success[T] | Failure
}

func Example() Result[string] {
    return Success[string]{"hello"}
}

(NOTE WELL: I'm not meaning to imply that the above would be a good idea, but it's the example that came most readily to mind because I just happened to write something similar -- though somewhat more verbose -- to smuggle (result, error) tuples through a single generic type parameter yesterday. Outside of that limited situation I expect it would still be better to return (string, error).)

Another example I thought of is encoding/json's Token type, which is currently defined as type Token any and is therefore totally unconstrained.

Although I expect it would not be appropriate to change this retroactively for compatibility reasons, presumably a hypothetical green field version of that type could be defined like this instead:

type Token interface {
    Delim | bool | float64 | Number | string
    // (json.Token also allows nil, but since that isn't a type I assume
    // it wouldn't be named here and instead it would just be
    // a nil value of type Token.)
}

Given that the exact set of types here is finite, would we consider it to be a breaking change to add new types to this interface later? If not, that could presumably allow the following to compile by the compiler noticing that the case labels are exhaustive:

// TokenString is a rather useless function that's just here to illustrate an
// exhaustive type switch...
func TokenString(t Token) string {
    switch t := t.(type) {
        case Delim:
            return string(t)
        case bool:
            return strconv.FormatBool(t)
        case float64:
            return strconv.FormatFloat(t, 'g', -1, 64)
        case Number:
            return string(t)
        case string:
            return t
    }
}

I don't feel strongly either way about whether such sealed interfaces should have this special power, but it does seem like it needs to be decided either way before implementation because it would be hard to change that decision later without breaking some existing code.

Even if this doesn't include a special rule for exhaustiveness, this still feels better in that it describes the range of Decoder.Token() far better than any does.

EDIT: After posting this I realized that my type switch doesn't account for nil. That feels like it's a weird enough edge that it probably wouldn't be worth the special case of allowing exhaustive type-switch matching.
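For illustration, this is roughly what a switch that does account for nil looks like in Go today. It is a sketch only: token is a hypothetical stand-in declared as any, since the union form of the interface is not yet valid, and the Delim and Number cases are left out to keep it self-contained.

```go
package main

import (
	"fmt"
	"strconv"
)

// token stands in for the hypothetical union interface; today it must be any.
type token any

func tokenString(t token) string {
	switch t := t.(type) {
	case nil:
		// The zero value of any interface type lands here.
		return "null"
	case bool:
		return strconv.FormatBool(t)
	case float64:
		return strconv.FormatFloat(t, 'g', -1, 64)
	case string:
		return t
	default:
		return fmt.Sprintf("unexpected %T", t)
	}
}

func main() {
	var zero token
	fmt.Println(tokenString(zero))
	fmt.Println(tokenString(1.5))
	fmt.Println(tokenString(true))
}
```

With a union interface the default case would become unreachable, but the nil case would remain.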

Finally, it seems like this would shrink the boilerplate required today to define what I might call a "sealed interface", by which I mean one which only accepts a fixed set of types defined in the same package as the interface.

One way I've used this in the past is to define struct types that act as unique identifiers for particular kinds of objects but then have some functions that can accept a variety of different identifier types for a particular situation:

type ResourceID struct {
    Type string
    Name string
}

type ModuleID struct {
    Name string
}

type Targetable interface {
    // Unexported method means that only types
    // in this package can implement this interface.
    targetable()
}

func (ResourceID) targetable() {}
func (ModuleID) targetable() {}

func Target(addr Targetable) {
    // ...
}

I think this proposal could reduce that to the following, if I've understood it correctly:

type ResourceID struct {
    Type string
    Name string
}

type ModuleID struct {
    Name string
}

type Targetable interface {
    ResourceID | ModuleID
}

func Target(addr Targetable) {
    // ...
}

If any of the examples I listed above don't actually fit what this proposal is proposing (aside from the question about exhaustive matching, which is just a question), please let me know!

If they do, then I must admit I'm not 100% convinced that the small reduction in boilerplate is worth this complexity, but I am leaning towards :+1: because I think the updated examples above would be easier to read for a future maintainer who is less experienced with Go and so would benefit from a direct statement of my intent rather than having to infer the intent based on familiarity with idiom or with less common language features.

ianlancetaylor commented 1 year ago

@dsnet Sure, we could permit case ~T in a type switch, but there are further issues. A type switch can have a short declaration, and in a type switch case with a single type we're then permitted to refer to that variable using the type in the case. What type would that be for case ~T? If it's T then we lose the methods, and fmt.Printf will behave unexpectedly if the original type had a String method. If it's ~T what can we do with a value of that type? It's quite possible that these questions can be answered, but it's not just outside the scope of this proposal, it's actually complicated.
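The point about losing methods can be seen in current Go by converting a defined type to its underlying type. This is a minimal sketch; celsius, viaDefined and viaUnderlying are hypothetical names chosen for the example.

```go
package main

import "fmt"

// celsius is a hypothetical defined type with underlying type int.
type celsius int

func (c celsius) String() string { return fmt.Sprintf("%dC", int(c)) }

// viaDefined formats the value with its original type, so fmt finds String.
func viaDefined(c celsius) string { return fmt.Sprint(c) }

// viaUnderlying formats the value as a plain int, as a case ~int that
// bound the variable to int would; the String method is no longer visible.
func viaUnderlying(c celsius) string { return fmt.Sprint(int(c)) }

func main() {
	c := celsius(21)
	fmt.Println(viaDefined(c))
	fmt.Println(viaUnderlying(c))
}
```

The two calls print different text for the same value, which is the unexpected fmt behavior described above.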

ianlancetaylor commented 1 year ago

@DeedleFake The alternative implementation is only an implementation issue, not a language issue. We shouldn't use that to change something about the language, like whether the value can be nil or some other zero value. In Go the zero value of interface types is nil. It would be odd to change that for the special case of interfaces that embed a union type element.

The reason pointer values matter is that given a value of the interface type, the current garbage collector implementation has to be able to very very quickly know which fields in that value are pointers. The current implementation does this by associating a bitmask of pointers with each type, such that a 1 in the bitmask means that the pointer-sized slot at that offset in the value always holds a pointer.

ianlancetaylor commented 1 year ago

@apparentlymart I think that everything you wrote is correct according to this proposal. Thanks.

DeedleFake commented 1 year ago

In Go the zero value of interface types is nil. It would be odd to change that for the special case of interfaces that embed a union type element.

It would be, but I think it would be worth it. And I don't think it would be so strange as to completely preclude eliminating the extra oddness that would come from union types always being nilable. In fact, I'd go so far as to say that if this way of implementing unions has to have them be nilable, then a different way of implementing them should be found.

The reason pointer values matter is that given a value of the interface type, the current garbage collector implementation has to be able to very very quickly know which fields in that value are pointers.

I was worried it was going to be the garbage collector... Ah well.

merykitty commented 1 year ago

A major problem is that type constraints work on static types while interfaces work on the dynamic types of objects. This immediately rules out this approach to union types.

type Addable interface {
    int | float32
}

func Add[T Addable](x, y T) T {
    return x + y
}

This works because the static type of T can only be int or float32, which means the addition operation is defined for every type in the type set of T. However, if we allow Addable to be a sum type, then the type set of T becomes {int, float32, Addable}, which does not satisfy the aforementioned property!!!

apparentlymart commented 1 year ago

@merykitty per my understanding of the proposal, I think for the dynamic form of what you wrote you'd be expected to write something like this:

type Addable interface {
    int | float32
}

func Add(x, y Addable) Addable {
    switch x := x.(type) {
    case int:
        return x + y.(int)
    case float32:
        return x + y.(float32)
    default:
        panic(fmt.Sprintf("unsupported Addable types %T + %T", x, y))
    }
}

Of course this would panic if used incorrectly, but I think that's a typical assumption for interface values since they inherently move the final type checking to runtime.

I would agree that the above seems pretty unfortunate, but I would also say that this feels like a better use-case for type parameters than for interface values and so the generic form you wrote is the better technique for this (admittedly contrived) goal.

Merovius commented 1 year ago

@merykitty No, in your example, Addable itself should not be able to instantiate Add. Addable does not implement itself (only int and float32 do).

Merovius commented 1 year ago

Also, note that the type set never includes interfaces. So Addable is never in its own type set.

mateusz834 commented 1 year ago

Is something like that going to be allowed?

type IntOrStr interface {
    int | string
}

func DoSth[T IntOrStr](x T) {
    var a IntOrStr = x
    _ = a
}

zephyrtronium commented 1 year ago

Let's say I have these definitions.

type I1 interface {
    int | any
}

type I2 interface {
    string | any
}

type I interface {
    I1 | I2
}

Would it be legal to have a variable of type I? Can I assign an I1 to it? What about string? any(int8)? int8?

Merovius commented 1 year ago

@mateusz834 Can't see why not.

@zephyrtronium

Would it be legal to have a variable of type I? Can I assign an I1 to it? What about string? any(int8)? int8?

I think the answer to all of these is "yes". For the cases where you assign an interface value, the dynamic type/value of the I variable would then become the dynamic type/value of the assigned interface. In particular, the dynamic type would never be an interface.
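The rule that assigning one interface value to another carries over the dynamic type already holds in current Go and can be checked directly. A small sketch, with dynamicType and viaStringer as hypothetical helper names:

```go
package main

import (
	"fmt"
	"time"
)

// dynamicType reports the dynamic type stored in an interface value.
func dynamicType(v any) string { return fmt.Sprintf("%T", v) }

// viaStringer assigns a fmt.Stringer to an any and reports what ends up inside.
func viaStringer() string {
	var s fmt.Stringer = time.Second // static type: fmt.Stringer
	var a any = s                    // copies the dynamic type and value
	return dynamicType(a)
}

func main() {
	fmt.Println(viaStringer()) // the concrete type, never fmt.Stringer
}
```

The any variable reports time.Duration rather than fmt.Stringer, which is the behavior assumed for assigning an I1 to an I.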

Merovius commented 1 year ago

FWIW my main issue with this proposal is that IMO union types should allow representing something like ~string | fmt.Stringer, but for well-known reasons this isn't possible right now and it's not clear it ever would be. One advantage of "real" sum types is that they have an easier time representing that kind of thing. Specifically, I don't think #54685 has that problem (though it's been a while since I looked at that proposal in detail).

leighmcculloch commented 1 year ago

I think this approach is elegant given that type sets on constraints already exist, and so for any union discriminated only by types this seems almost perfect.

I think there are three shortcomings of the proposal that would prevent it from being usable in many of the cases where I currently construct union-like structures.

1. prevailing nil

In particular, the zero value of an interface type with an embedded union would be nil, just as for any interface type. So this is a form of sum type in which there is always another possible option, namely nil. Sum types in most languages do not work this way, and this may be a reason to not add this functionality to Go.

This is mentioned in the proposal, and I think this constraint simplification is problematic. In Go default values are useful, but by making sum types nillable, we make their default value not useful. Of course maybe this is reasonable given that Go has no widely used "optional" value type beyond pointers.

To address this shortcoming, could we make interface types that contain type sets non-nullable by default, and require an explicit nil | in the type set list? For type sets that do not specify nil, the default value of the interface value would be the zero value of the first type listed.

2. no support for non-type discriminants

The proposal defines a discriminated union where the discriminant is always the types of each case in the union. This prevents applications from creating unions where the same type appears across multiple cases but with different semantics. This happens a lot in code where I write union-like types today, and I don't think I could use this proposal for most of my union cases without it.

Here's an example of a union-like structure from some code I have.

type ClaimPredicateType int32

const (
    ClaimPredicateTypeClaimPredicateUnconditional      ClaimPredicateType = 0
    ClaimPredicateTypeClaimPredicateAnd                ClaimPredicateType = 1
    ClaimPredicateTypeClaimPredicateOr                 ClaimPredicateType = 2
    ClaimPredicateTypeClaimPredicateNot                ClaimPredicateType = 3
    ClaimPredicateTypeClaimPredicateBeforeAbsoluteTime ClaimPredicateType = 4
    ClaimPredicateTypeClaimPredicateBeforeRelativeTime ClaimPredicateType = 5
)

type ClaimPredicate struct {
    Type          ClaimPredicateType
    AndPredicates *[]ClaimPredicate `xdrmaxsize:"2"`
    OrPredicates  *[]ClaimPredicate `xdrmaxsize:"2"`
    NotPredicate  **ClaimPredicate
    AbsBefore     *Int64
    RelBefore     *Int64
}

Ref: https://github.com/stellar/go/blob/b4ba6f8e6/xdr/xdr_generated.go#L5815-L5822

The proposal would allow only for writing the following case, which would fail to represent the complete union type:

type ClaimPredicate interface {
  []ClaimPredicate | ClaimPredicate | Int64
}

I have the same type in a few languages, and here's the same type in Rust:

pub enum ClaimPredicate {
   Unconditional,
   And(VecM<ClaimPredicate, 2>),
   Or(VecM<ClaimPredicate, 2>),
   Not(Option<Box<ClaimPredicate>>),
   BeforeAbsoluteTime(i64),
   BeforeRelativeTime(i64),
}

Ref: https://github.com/stellar/rs-stellar-xdr/blob/154e07ebb/src/curr/generated.rs#L6672-L6679

To address this shortcoming could the type set be a type list where each type in the list is also given a field name? This doesn't feel good, but it's the only way I see to address this inside the proposal in its current form. It's not clear to me how this would work in a switch statement as well. For example:

type ClaimPredicate interface {
  and                []ClaimPredicate |
  or                 []ClaimPredicate |
  not                ClaimPredicate |
  beforeAbsoluteTime Int64 |
  beforeRelativeTime Int64
}

3. no support for a void / no-type case

Sometimes discriminated unions have cases where no data is required. I don't think the proposal supports this. The example in point 2 above has one case like that, the Unconditional case. If such a thing was supported, it could be like:
```
type ClaimPredicate interface {
  unconditional      void |
  and                []ClaimPredicate |
  or                 []ClaimPredicate |
  not                ClaimPredicate |
  beforeAbsoluteTime Int64 |
  beforeRelativeTime Int64
}
```

leighmcculloch commented 1 year ago

@ianlancetaylor Does the proposal as-is allow both type sets and functions in an interface? It would have a remarkable property not typically present in sum types where you could have a closed set of types along with the ability to have those types implement some common functions and be used as an interface.

zephyrtronium commented 1 year ago

@leighmcculloch

To address this shortcoming, could we make interface types that contain type sets non-nullable by default, and require an explicit nil | in the type set list. For type sets that do not specify nil, the default value of the interface value would be the zero value of the first type listed.

For reference, this has been suggested a few times in #19412 and #41716, starting with https://github.com/golang/go/issues/19412#issuecomment-288485048. Requiring nil variants versus allowing source code order to affect semantics is the classic tension of sum types proposals.

Sometimes discriminated unions have cases where no data is required. I don't think the proposal supports this.

The spelling of a type with no information beyond existence is usually struct{}, or more generally any type with exactly one value. void, i.e. the zero type, means something different: logically it would represent that your unconditional variant is impossible, not that it carries no additional information.

Does the proposal as-is allow both type sets and functions in an interface? It would have a remarkable property not typically present in sum types where you could have a closed set of types along with the ability to have those types implement some common functions and be used as an interface.

Yes, since the proposal is just to allow values of general interfaces less ~T elements, methods would be fine and would dynamically dispatch to the concrete type. I agree that's a neat behavior. Unfortunately it does imply that methods can't be defined on a sum type itself; you'd have to wrap it in a struct or some other type.

leighmcculloch commented 1 year ago

Thanks @zephyrtronium. Taking your feedback into account, and also realizing that it is easy to redefine types, then I think points (2) and (3) I raised are not issues. Type definitions can be used to give the same type different semantics for each case. For example:

type ClaimPredicateUnconditional struct{}
type ClaimPredicateAnd []ClaimPredicate
type ClaimPredicateOr []ClaimPredicate
type ClaimPredicateNot ClaimPredicate
type ClaimPredicateBeforeAbsoluteTime Int64
type ClaimPredicateBeforeRelativeTime Int64

type ClaimPredicate interface {
    ClaimPredicateUnconditional |
    ClaimPredicateAnd |
    ClaimPredicateOr |
    ClaimPredicateNot |
    ClaimPredicateBeforeAbsoluteTime |
    ClaimPredicateBeforeRelativeTime
}

In the main Go code base I work in we have 106 unions implemented as multi-field structs, which require a decent amount of care to use. I think this proposal would make using those unions easier to understand, probably on par in terms of effort to write. If tools like gopls went on to support features like pre-filling out the case statements of a switch based on the type sets, since it can know the full set, that would make writing code using them easier too.

The costs of this proposal feel minimal. Any code using the sum type would experience the type as an interface and have nothing new to learn over that of interfaces. This is I think the biggest benefit of this proposal.

ncruces commented 1 year ago

To me, nil seems to be the big question here?

On the one hand, interface types are nilable and their zero value is nil.

On the other hand, union interface constraints made only of non-nilable types prevent a T from being nil, and that behaviour seems useful here as well. Is it that big a can of worms to say these can't be nil?

Exhaustiveness in type switches could potentially be left to tools.

Merovius commented 1 year ago

@ncruces

Is it that big a can of worms to say these can't be nil?

And instead, they are what? The reason to use nil is that it's precedented for "there is no dynamic type to this interface". If you don't want to use nil, you'd at least have to say what the dynamic type of a union is.

More generally, there are essentially four choices around the zero value of a union type:

1. There is none, values of union type must be explicitly initialized. The downside is, that the language as a whole very much assumes that every type has a zero value (e.g. make([]T, …), map-indexing of non-existing keys, receiving from a closed channel…) and to a lesser degree, that it's represented by all zero bits.
2. The zero value is specified in the type definition. One downside is that we can't re-use the existing syntax, it needs at least to be amended by a way to specify the zero value.
3. The zero value is derived from the definition, most obviously "the zero value of the first case". The downside is that now the order of union terms matters, which is counter-intuitive and might not play well with existing assumptions.
4. The zero value is nil. The downside is, that any union value has an additional case.
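The places where the language assumes a zero value are easy to demonstrate with any interface type in current Go. This is a sketch only; shape is a hypothetical interface standing in for a union type, and zeroSources is a made-up helper.

```go
package main

import "fmt"

// shape stands in for a union interface; any interface type behaves the same.
type shape interface{ area() float64 }

// zeroSources returns values obtained without any explicit initialization.
func zeroSources() []shape {
	s := make([]shape, 1)   // slice elements start as zero values
	m := map[string]shape{} // indexing a missing key yields the zero value
	ch := make(chan shape)
	close(ch) // receiving from a closed channel yields the zero value
	return []shape{s[0], m["missing"], <-ch}
}

func main() {
	for _, v := range zeroSources() {
		fmt.Println(v == nil)
	}
}
```

All three values are nil, so a union type without a zero value would need a new answer for each of these operations.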

This proposal makes the last choice and it seems to me, that's a pretty foundational choice to any union/sum type proposal. So, from the proposal text:

In discussion here, please focus on the benefits and costs of this specific proposal. Discussion of sum types in general, or different proposals for sum types, should remain on #19412 or newer variants such as #54685. Thanks.

So we should, in this discussion, assume that the choice of zero value is fixed as nil and not try to come up with alternative designs. If we dislike a separate nil zero value, then that's simply a reason to reject this proposal:

In particular, the zero value of an interface type with an embedded union would be nil, just as for any interface type. So this is a form of sum type in which there is always another possible option, namely nil. Sum types in most languages do not work this way, and this may be a reason to not add this functionality to Go.

ncruces commented 1 year ago

OK. If we are to leave it at that, then yes, IMO, this is a reason to reject the proposal.

I still think it might be worth discussing here why that's the best choice. Detractors might be persuaded that this is in fact the best choice.

PS: I suppose I find your 3rd option best, and not counter intuitive; but it's a different proposal, and I won't discuss it here if it's considered off topic.

atdiar commented 1 year ago

In principle, I'm in favor of this proposal. Especially since it seems nicely orthogonal.

Just a few concerns that might require further thoughts as was mentioned:

~T : a quick idea off top, would be for this to be a special anonymous interface analogous to basic interfaces where instead of a method set, an underlying type T is specified. In which case, to retrieve the initial type, one would have to switch over as usual. Might also allow conversion to T.
I think nil as the zero value is fine, provided a union interface value where the runtime.type pointer is nil cannot be assigned. It has some complexities wrt slices (make?) , the recently added clear, and perhaps a few other things (or not, I don't know). But if it works, that will be quite a nice fit. :)

Especially since the implementation of an interface is inscrutable, even when aliased, an interface value internal representation cannot be changed back to nil(?) , so for now I'm optimistic.

Why I would appreciate this feature?

(just an example) A library I wrote needed to limit a function parameter to any of the Go types that can be sent over to JavaScript world via wasm (bool, string, float64, []any, map[string]any).

It's manageable without union types but the API is not as nice as it could be as it requires plenty of conversions. (had to use the trick of defining a Value interface with an unexported method, to be implemented by type Bool bool, type Float float64, etc...)

Also related to marshalling.

Merovius commented 1 year ago

@ncruces

I still think it might be worth discussing here why that's the best choice.

I don't think that's the claim. It's just what's proposed here. This is not the only proposal of its kind.

Merovius commented 1 year ago

provided a union interface value where the runtime.type pointer is nil cannot be assigned.

That seems impossible, or close to.

type I interface { int | string }
func F(p *I) {
    var v I
    if someProgramHalts() {
        *p = v
    }
}

So, the only way to do that, AFAICT, would be to make such interfaces not assignable to their own type. Which means, they can't be passed around as arguments either. Or used in many other places.

I don't believe this is workable at all.

atdiar commented 1 year ago

Well that's the point. If a variable of this type is declared but not assigned a proper value, it shouldn't be usable.

The one issue would be channels for example, one would have to find a palatable way to deal with channel closures.

One way could be to make nilability opt-in explicitly in those cases:

chan(int | string) // disallowed
chan(int | string | nil) // allowed and actually a supertype of the above, nil being sent on close

That would keep people from assigning the nilable supertype to the actual union type. They would not exactly be the same type and a regular type assertion check would have to happen.

Anyway, this is just an idea that someone can work with and ponder, I'll leave it at that.

zephyrtronium commented 1 year ago

I hadn't thought of it before writing https://github.com/golang/go/issues/57644#issuecomment-1374380626, but I think being unable to define methods on these sum types is a major downside. If I define a type like type Parameter interface { int | string } then I can't make a String or UnmarshalText method for it. (Most encoders can marshal it without issue since it would be like having an int or string in an any, but because there's no reflection interface in the proposal, there's no way to automatically unmarshal.) I could define it as this instead:

type Parameter interface {
    IntParam | StringParam
    String() string
    UnmarshalText([]byte) error
}

Then I need to define a separate type for each variant as well as the methods those types need, and I have to use IntParam and StringParam instead of just int and string. It seems like this is the code I would write already, except that I write a union instead of an unexported method. The only thing we've gained is that the implementing types show up in godoc.

Instead, I could write this:

type Parameter struct {
    F interface{ int | string }
}

func (p Parameter) String() string

func (p *Parameter) UnmarshalText(text []byte) error

Now we gain some of the advantages of typical sum types, but first the author has to know that this is a good approach, and then we have to use the struct box instead of the interface value, UnmarshalText needs two layers of indirection (unless we choose a less conservative algorithm for unboxed sum type representations), and we lose some of the nice properties of interfaces like implicit conversions for assignments. Maybe those penalties aren't that bad overall, but I can imagine situations where they would push me toward a different design. And again, this looks very similar to code I might write today.

All of this would be a non-issue if we could define methods on interfaces. That was already rejected in #39799.
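A rough sketch of the struct-box variant that compiles today, with any standing in for interface{ int | string }. The rule used in UnmarshalText (try int first, fall back to string) is only an assumption for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// Parameter boxes an int or a string. Under the proposal the field
// would be interface{ int | string }; any is a stand-in here.
type Parameter struct {
	F any
}

// String formats whichever variant is currently stored.
func (p Parameter) String() string {
	switch v := p.F.(type) {
	case int:
		return strconv.Itoa(v)
	case string:
		return v
	default:
		return "<unset>"
	}
}

// UnmarshalText stores an int if text parses as one, otherwise a
// string. This parsing rule is an illustrative assumption.
func (p *Parameter) UnmarshalText(text []byte) error {
	if n, err := strconv.Atoi(string(text)); err == nil {
		p.F = n
		return nil
	}
	p.F = string(text)
	return nil
}

func main() {
	var p Parameter
	_ = p.UnmarshalText([]byte("42"))
	fmt.Printf("%T %s\n", p.F, p)
	_ = p.UnmarshalText([]byte("hello"))
	fmt.Printf("%T %s\n", p.F, p)
}
```

The default case in String exists only because any (and a nilable union) can hold something other than the two intended variants.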

leighmcculloch commented 1 year ago

To me, nil seems to be the big question here?

I think the interface value being nillable is fine. I think if we wanted a way to make interface values not nil by default, there could be a separate proposal for that, and it would interact well with this. We don't need to solve that problem as part of this proposal.

Exhaustiveness in type switches could potentially be left to tools.

I think whether switch is exhaustive is an entirely independent proposal to sum types. It can be proposed independent of any type changes, and it doesn't need to be attached to this proposal.

jimmyfrasche commented 1 year ago

It would be nice to replace something like this

func (*VM) PushInt(i int)
func (*VM) PushString(s string)
func (*VM) PushEtc(etc *Etc)

with something like this

func (*VM) Push(v int | string | *Etc)

ncruces commented 1 year ago

func (*VM) Push(v int | string | *Etc)

I assume an anonymous interface would work:

func (*VM) Push(v interface { int | string | *Etc})

jimmyfrasche commented 1 year ago

A | B | C is shorthand for interface { A | B | C } in generics code, and I'm an advocate for the same rule applying here.

DeedleFake commented 1 year ago

A | B | C is shorthand for interface { A | B | C } in generics code, and I'm an advocate for the same rule applying here.

That sounds good, but it leads to this oddity:

func F1(int) { ... }
func F2(string) { ... }
func F3(int | string) { ... }

F1(nil) // Error.
F2(nil) // Error.
F3(nil) // Valid.

leighmcculloch commented 1 year ago

A benefit of this proposal is that it simplifies code that emulates sum types with interfaces today, especially for the consumer of those types.

Without this proposal:

type RGB struct {
    R byte
    G byte
    B byte
}

func (RGB) isSumType() {}

type CMYK struct {
    C byte
    M byte
    Y byte
    K byte
}

func (CMYK) isSumType() {}

type Color interface{ isSumType() }

func PrintColor(c Color) {
    switch v := c.(type) {
    case nil:
        fmt.Println("nil")
    case RGB:
        fmt.Println(v.R, v.G, v.B)
    case *RGB:
        if v == nil {
            fmt.Println("RGB(nil)")
        } else {
            fmt.Println(v.R, v.G, v.B)
        }
    case CMYK:
        fmt.Println(v.C, v.M, v.Y, v.K)
    case *CMYK:
        if v == nil {
            fmt.Println("CMYK(nil)")
        } else {
            fmt.Println(v.C, v.M, v.Y, v.K)
        }
    }
}

With this proposal there are fewer surprising cases, such as the fact that both the type and its pointer type have to be included in the switch above.

type RGB struct {
    R byte
    G byte
    B byte
}

type CMYK struct {
    C byte
    M byte
    Y byte
    K byte
}

func PrintColor(c interface { RGB | CMYK }) {
    switch v := c.(type) {
    case nil:
        fmt.Println("nil")
    case RGB:
        fmt.Println(v.R, v.G, v.B)
    case CMYK:
        fmt.Println(v.C, v.M, v.Y, v.K)
    }
}
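The reason the pointer cases are needed today can be shown in a few lines: a method declared on the value receiver is also in the method set of the pointer type, so *RGB satisfies the marker interface too. The describe helper below is hypothetical and only reports the dynamic type.

```go
package main

import "fmt"

type RGB struct{ R, G, B byte }

// isSumType is the unexported marker method used to close the set.
func (RGB) isSumType() {}

type Color interface{ isSumType() }

// describe reports which dynamic type is stored in c.
func describe(c Color) string {
	switch c.(type) {
	case nil:
		return "nil"
	case RGB:
		return "RGB"
	case *RGB:
		return "*RGB"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(describe(RGB{}))       // prints: RGB
	fmt.Println(describe(&RGB{}))      // prints: *RGB
	fmt.Println(describe((*RGB)(nil))) // prints: *RGB
	fmt.Println(describe(nil))         // prints: nil
}
```

With interface { RGB | CMYK } the pointer types would simply not be in the type set, so only the nil interface case remains.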

gophun commented 1 year ago

@leighmcculloch The point of methods and interfaces is to avoid all this type switchery within a function. In your example the types RGB and CMYK could have a Print method, which can be part of an interface if needed. The concept of an "interface" is that types have something in common. This proposal counteracts this concept, because it allows types that have nothing in common to fit through the same hole; that's why I don't like it very much. I want fewer type switches in code, not more.

gophun commented 1 year ago

Putting sum types under the umbrella of interfaces is almost comedic. You almost always have to treat them with a type switch. If recipients need to inspect the things they receive and have to treat them differently, then it's not an interface, it's the opposite, it's a nuisance.

Type union elements when used as a constraint for type parameters, on the other hand, are ok to be called "interfaces", because we want to access their common operators (like +, * etc.), which is in the spirit of the "interface" concept (to treat things uniformly). And here we're not encouraged to type switch on them, because it's not supported (unless we convert it to 'any' first).

merykitty commented 1 year ago

@gophun I don't see why you cannot treat them as normal interfaces with methods? The only difference between a normal interface and a union interface is that the former is open to inheritance while the dynamic type set of the latter is closed.

bronger commented 1 year ago

FWIW, I find much of it hard to understand, and Go is supposed to be simple. I agree with @gophun that it would be a pity if a new language feature made ubiquitous type switches necessary – I consider them “last resort” and not a good general pattern.

But I must also admit that I still have not understood the actual use cases. I’ve never had large Go code bases to maintain. Maybe explaining the Target function in https://github.com/golang/go/issues/57644#issuecomment-1373008575 would make it clearer to me.

My own approach has been: If I need type flexibility, use interfaces if methods are shared and embedding if attributes are shared. And in the remaining edge cases, I use any as the argument type and, well, use a type switch. Of course some errors are then caught in the tests rather than the compilation step, but is this disadvantageous enough that we should make interfaces even more complicated?

atdiar commented 1 year ago

I think it's a feature that will be more useful for library writers who need to expose a given interface.

So basically, it should allow for improvements in coder UX.

I expect that the consumer of a library will often be oblivious to the internal type switching.

Another example of such usefulness is when defining a tree datastructure where nodes can be a handful of very specific types.

bronger commented 1 year ago

Yes but I still want to understand the motivation better. (Besides, I also write or will write libraries.)

Let me ask something very specific: If you use a sum type instead of any in a function signature, you obviously gain the compile-time check that the given type is one of the types included in the union. I also read in this issue that with sum types, there is the possibility to enhance the Go compiler or linters to see whether your type switches are complete. What are further advantages of sum types over any?

merykitty commented 1 year ago

@bronger Some advantages off the top of my head: 1, Sum types are also interfaces and can have common behaviours expressed through methods 2, The type sets of sum types are known, allowing better layout and improved performance 3, Similar to normal interfaces, they express intent regarding the signature of the function, instead of relying on reading the implementation details and documentation

bronger commented 1 year ago

1, Sum types are also interfaces and can have common behaviours expressed through methods

You mean, in addition to type unions, an interface defines some common methods?

merykitty commented 1 year ago

@bronger WDYM? From the proposal

In all other ways an interface type with an embedded union would act exactly like an interface type.

Which means we can define a union as

type Vehicle interface {
    Car | Bicycle
    Go()
}

bronger commented 1 year ago

But then, your point (1) is not an advantage because this is also possible with any.

gophun commented 1 year ago

type Vehicle interface {
    Car | Bicycle
    Go()
}

This interface is unnecessarily specific. It should just be

type Vehicle interface {
    Go()
}

The other version unnecessarily limits the types an outsider can use, and you would need a type switch to discriminate between the two allowed types. The point of interfaces is to treat different things uniformly and to give an outsider the possibility to provide their own types that implement the interface. In this example the outsider can no longer add a Boat to implement the Vehicle interface.

merykitty commented 1 year ago

@gophun That's just an example to show that a sum type also has methods.

The point of interfaces is to treat different things uniformly

This does not in any way say how they are treated to show the uniformity to the outside world. A dynamic dispatch is as valid as an explicit type switch.

and to give an outsider the possibility to provide their own types that implement the interface

No you just made this up, there are interfaces out there that intentionally declare private methods so that other packages cannot implement them. An interface just declares a contract, and a contract can involve no unexpected implementation.

Merovius commented 1 year ago

The most commonly mentioned use-case for sum/union types are AST packages. For example, go/ast.Node is currently an interface, with a bunch of methods. But that definition is obviously wrong. For example, the type struct { ast.Node } also satisfies the interface and has all the necessary methods, but it's certainly not intended to be usable, from the point of view of the ast package.

This becomes a problem when you then pass this into ast.Walk, for example. Walk is implemented as a big type switch over all dynamic types which are expected to be possible Nodes. But the set of possible Nodes is infinite, so the (static) type of ast.Walk is really incorrect - there has to be additional type-checking at runtime.

There are several different ways to fix this:

Sum/union types. This just allows the ast package to enumerate all the valid Node types and be done with it. Nothing else needs to happen.
Add more methods to the interface. For example, the interface could also include a Walk(f func(Visitor, Node)) (or something) method. But then, what about go/format? Or go/types? Or third party Go tools? These also commonly accept an ast.Node (or similar) to do their thing and they are not at liberty to specify that a Node has to have additional methods. So they still have to do the type-switch and runtime type checking.
Make Node a struct, instead of an interface. It could look like type Node struct { comment *Comment; commentGroup *CommentGroup; /* … */ } with a field per possible case. As long as the ast package only creates Nodes with exactly one of these fields set, it can treat it as a sum. And a struct type can't be "subtyped". However, this has performance problems (there are a lot of possible Nodes, so you need to carry around a lot of pointers, almost all of which are nil). It's also, technically, still a bit prone to programmer error, as the ast package itself could contain a bug, creating a Node with more than one field set.

So union/sum types are not the only way to solve this. And it might still be questioned if this problem needs solving and if the cost is justified to solve it. But they are a relatively common solution to this kind of problem, in other languages.

bronger commented 1 year ago

So,

type Vehicle interface {
    Car | Bicycle
    Go()
}

is for the use case that e.g. a function wants to call methods, and additionally do type-switching stuff with the argument?

Why I would appreciate this feature?