golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

proposal: spec: sum types based on general interfaces #57644

Open ianlancetaylor opened 1 year ago

ianlancetaylor commented 1 year ago

This is a speculative issue based on the way that type parameter constraints are implemented. This is a discussion of a possible future language change, not one that will be adopted in the near future. This is a version of #41716 updated for the final implementation of generics in Go.

We currently permit type parameter constraints to embed a union of types (see https://go.dev/ref/spec#Interface_types). We propose that we permit an ordinary interface type to embed a union of terms, where each term is itself a type. (This proposal does not permit the underlying type syntax ~T to be used in an ordinary interface type, though of course that syntax is still valid for a type parameter constraint.)

That's really the entire proposal.

Embedding a union in an interface affects the interface's type set. As always, a variable of interface type may store a value of any type that is in its type set, or, equivalently, a value of any type in its type set implements the interface type. Inversely, a variable of interface type may not store a value of any type that is not in its type set. Embedding a union means that the interface is something akin to a sum type that permits values of any type listed in the union.

For example:

type MyInt int
type MyOtherInt int
type MyFloat float64
type I1 interface {
    MyInt | MyFloat
}
type I2 interface {
    int | float64
}

The types MyInt and MyFloat implement I1. The type MyOtherInt does not implement I1. None of MyInt, MyFloat, or MyOtherInt implement I2.
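For comparison, here is a minimal sketch of the same type-set rules as they already apply to constraints in current Go. The generic helpers describe and double are names assumed for illustration; they are not part of the proposal.

```go
package main

import "fmt"

type MyInt int
type MyOtherInt int
type MyFloat float64

// I1 and I2 are valid today, but only as type parameter constraints.
type I1 interface {
	MyInt | MyFloat
}

type I2 interface {
	int | float64
}

// describe (hypothetical helper) accepts only types in I1's type set.
func describe[T I1](v T) string {
	return fmt.Sprintf("%T", v)
}

// double (hypothetical helper) accepts only int and float64. MyInt is not
// in I2's type set because I2 lists int, not ~int.
func double[T I2](v T) T {
	return v + v
}

func main() {
	fmt.Println(describe(MyInt(1)))
	fmt.Println(describe(MyFloat(2.5)))
	fmt.Println(double(21))
	// describe(MyOtherInt(1)) // compile error: MyOtherInt does not satisfy I1
	// double(MyInt(1))        // compile error: MyInt does not satisfy I2
}
```

Under the proposal, the same membership rules would decide which values a variable of type I1 or I2 may hold.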

In all other ways an interface type with an embedded union would act exactly like an interface type. There would be no support for using operators with values of the interface type, even though that is permitted for type parameters when using such a type as a type parameter constraint. This is because in a generic function we know that two values of some type parameter are the same type, and may therefore be used with a binary operator such as +. With two values of some interface type, all we know is that both types appear in the type set, but they need not be the same type, and so + may not be well defined. (One could imagine a further extension in which + is permitted but panics if the values are not the same type, but there is no obvious reason why that would be useful in practice.)
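The difference can be sketched in current Go, using any as a stand-in for an interface value whose dynamic type is only known at run time. The names Number, add, and addBoxed are assumptions made for this example.

```go
package main

import "fmt"

type Number interface {
	int | float64
}

// add compiles because both operands share the single type T.
func add[T Number](a, b T) T {
	return a + b
}

// addBoxed mimics the interface-value situation: the dynamic types must be
// inspected at run time, and they need not match.
func addBoxed(a, b any) (any, error) {
	switch x := a.(type) {
	case int:
		if y, ok := b.(int); ok {
			return x + y, nil
		}
	case float64:
		if y, ok := b.(float64); ok {
			return x + y, nil
		}
	}
	return nil, fmt.Errorf("cannot add %T and %T", a, b)
}

func main() {
	fmt.Println(add(1, 2))
	fmt.Println(addBoxed(1, 2))
	fmt.Println(addBoxed(1, 2.5))
}
```

The mismatched call is the case for which + has no obvious meaning on interface values.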

In particular, the zero value of an interface type with an embedded union would be nil, just as for any interface type. So this is a form of sum type in which there is always another possible option, namely nil. Sum types in most languages do not work this way, and this may be a reason to not add this functionality to Go.

As an implementation note, we could in some cases use a different implementation for interfaces with an embedded union type. We could use a small code, typically a single byte, to indicate the type stored in the interface, with a zero indicating nil. We could store the values directly, rather than boxed. For example, I1 above could be stored as the equivalent of struct { code byte; value [8]byte } with the value field holding either an int or a float64 depending on the value of code. The advantage of this would be reducing memory allocations. It would only be possible when all the values stored do not include any pointers, or at least when all the pointers are in the same location relative to the start of the value. None of this would affect anything at the language level, though it might have some consequences for the reflect package.
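A hand-written sketch of what such a layout could look like for I1, assuming a 64-bit int. The type packedI1, the code constants, and the pack/unpack helpers are invented for illustration; a real compiler would generate the equivalent internally, and nothing here describes an actual implementation.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

type MyInt int
type MyFloat float64

// Type codes; zero is reserved for nil, as the note above suggests.
const (
	codeNil byte = iota
	codeMyInt
	codeMyFloat
)

// packedI1 is a stand-in for struct { code byte; value [8]byte }.
type packedI1 struct {
	code  byte
	value [8]byte
}

func packInt(v MyInt) packedI1 {
	p := packedI1{code: codeMyInt}
	binary.LittleEndian.PutUint64(p.value[:], uint64(v))
	return p
}

func packFloat(v MyFloat) packedI1 {
	p := packedI1{code: codeMyFloat}
	binary.LittleEndian.PutUint64(p.value[:], math.Float64bits(float64(v)))
	return p
}

// unpack rebuilds an ordinary (boxed) interface value from the packed form.
func (p packedI1) unpack() any {
	bits := binary.LittleEndian.Uint64(p.value[:])
	switch p.code {
	case codeMyInt:
		return MyInt(int64(bits))
	case codeMyFloat:
		return MyFloat(math.Float64frombits(bits))
	default:
		return nil
	}
}

func main() {
	fmt.Println(packInt(-7).unpack())
	fmt.Println(packFloat(1.5).unpack())
	fmt.Println(packedI1{}.unpack())
}
```

The zero value of packedI1 has code zero and therefore unpacks to nil, matching the zero value of an interface.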

As I said above, this is a speculative issue, opened here because it is an obvious extension of the generics implementation. In discussion here, please focus on the benefits and costs of this specific proposal. Discussion of sum types in general, or different proposals for sum types, should remain on #19412 or newer variants such as #54685. Thanks.

DeedleFake commented 1 year ago

@Merovius

That points out another problem, namely that defining methods is sometimes impossible because the type parameters don't propagate fully:

type Result[T any] interface {
  Success[T] | Error

  Or(T) Result[T]
  Must() T
}

// So far so good...
func (r Success[T]) Or(v T) Result[T] { return r }
func (r Success[T]) Must() T { return r.Val }

// Uh oh...
func (r Error) Or(v ???) Result[???] { return ??? }
func (r Error) Must() ??? { panic(r.E) }

You could fix this by attaching an otherwise unused [T any] to Error so that you can propagate the parameter through, but that's going to make instantiation annoying:

func SomethingThatCanFail() Result[Value] {
  if failure {
    // Need to manually specify even with a constructor.
    return NewError[Value](err)
  }
}

johnwarden commented 1 year ago

The reason to not permit ~T is that the current language would provide no mechanism for extracting the type of such a value. Given interface { ~int }, if I store a value of type myInt in that interface, then code in some other package would be unable to use a type assertion or type switch to get the value out of the interface type.

If I understand correctly, it's the proposed implementation and not the proposal itself that would prevent this. In the current language, non-nil interface values always include a reference to the dynamic type, so type assertions or switches can be used to extract the type as long as that type is in the type set. However, if, as proposed, a small code were used instead of a reference to a type, then the type would always have to be one of the finite set of declared types. Is this correct?

Merovius commented 1 year ago

@johnwarden No. The hindrance @ianlancetaylor is talking about is that you need to be able to lexically refer to a type to type-assert on it. That is, to write x.(time.Duration), you first have to be able to write time.Duration and thus import time. That is, you have to statically know a type to type-assert on it. This has nothing to do with the implementation.

I think, on the contrary, the implementation of using "tags" is only possible as long as ~T is forbidden, FWIW. Because otherwise you'd have to carry the type as a pointer, so you can pass such an interface value to reflect and/or do a full type-assertion on it. That is, int64 and time.Duration must always have different representations when stored in an interface { ~int64 }.
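A small sketch of both points in current Go, using any in place of the hypothetical interface { ~int64 }. The type myDuration stands in for something like time.Duration declared in a package the inspecting code does not import, and classify is a name assumed for this example.

```go
package main

import (
	"fmt"
	"reflect"
)

// myDuration stands in for a named type the inspecting code cannot name.
type myDuration int64

// classify only names int64 in its type switch, so a myDuration value does
// not match that case even though its underlying type is int64.
func classify(v any) string {
	switch v.(type) {
	case int64:
		return "int64"
	}
	// Reflection can still recover the underlying kind and the value, which
	// is why the dynamic type has to travel with the interface value.
	if rv := reflect.ValueOf(v); rv.Kind() == reflect.Int64 {
		return fmt.Sprintf("named type with underlying int64, value %d", rv.Int())
	}
	return "something else"
}

func main() {
	fmt.Println(classify(int64(3)))
	fmt.Println(classify(myDuration(3)))
}
```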

Merovius commented 1 year ago

@ncruces Why do you need this proposal for that? i.e. why do people not do it today?

I mean, I guess I see the argument - that if it's good to take something as a parameter, returning it is also a bit good, because it makes this composition easier. But I think for the specific case of Result[T], I don't see a lot of use cases for accepting a Result[T] where that kind of safety is very useful, or offsets the incredible awkwardness of using it. So I'm not worried about people moving away from (T, error), is all I'm saying. It just seems like the worse mechanism with this proposal.

johnwarden commented 1 year ago

@johnwarden No. The hindrance @ianlancetaylor is talking about is that you need to be able to lexically refer to a type to type-assert on it. That is, to write x.(time.Duration), you first have to be able to write time.Duration and thus import time. That is, you have to statically know a type to type-assert on it. This has nothing to do with the implementation.

Yes, I understand you can't actually extract the type of a value in Go except by using reflection. But I am trying to understand this comment from @ianlancetaylor where he refers to extracting the type of an interface value: "The reason to not permit ~T is that the current language would provide no mechanism for extracting the type of such a value". I assume he meant "find out" the type by using a switch or assertion.

I think, on the contrary, the implementation of using "tags" is only possible as long as ~T is forbidden, FWIW. Because otherwise you'd have to carry the type as a pointer, so you can pass such an interface value to reflect and/or do a full type-assertion on it. That is, int64 and time.Duration must always have different representations when stored in an interface { ~int64 }.

By "tags", do you mean using the short (e.g. one byte) code to indicate the type?

If so then it sounds like it is the proposed implementation, not the proposal itself, that would prevent use of ~T in type lists, correct?

Merovius commented 1 year ago

@johnwarden I don't understand. ISTM you are restating your questions, quoting my answers to them.

But I am trying to understand this comment from @ianlancetaylor where he refers to extracting the type of an interface value:

Yes. I'm pretty confident he refers to what I said: If you have func F(v ~int), then F has no way to "unpack" v by writing a type-assertion or type-switch, as the set of types v can have is infinite and unknown to F. So it makes little sense to have it as a value. It couldn't be used.

Or, to be more precise, one thing we could allow is to make x.(~int64) on a time.Duration have type int64, instead of time.Duration. That wouldn't require the function to know the actual type. But that has other problems.

By "tags", do you mean using the short (e.g. one byte) code to indicate the type?

Yes.

If so then it sounds like it is the proposed implementation, not the proposal itself, that would prevent use of ~T in type lists, correct?

No. The issue is the semantic one I explained above, which has nothing to do with the implementation.

The proposed implementation is to use a plain interface, as it is represented now. And that wouldn't stand in the way at all. The proposal also says that in some cases we might choose a different implementation as an optimization and yes, that would stand in the way. But it doesn't matter that it would, because the actual semantic problem is far more important. Anything else would only be an optimization, it wouldn't prevent us from doing it.

johnwarden commented 1 year ago

@johnwarden I don't understand. ISTM you are restating your questions, quoting my answers to them.

Hmm, my apologies. I don't have a lot of experience contributing to this project, and I think there may be some fundamental concept I am missing. It may not be worth you trying to explain it to me and I would not be offended if you don't respond. But I will take one more shot.

But I am trying to understand this comment from @ianlancetaylor where he refers to extracting the type of an interface value:

Yes. I'm pretty confident he refers to what I said: If you have func F(v ~int), then F has no way to "unpack" v by writing a type-assertion or type-switch, as the set of types v can have is infinite and unknown to F. So it makes little sense to have it as a value. It couldn't be used.

I am confused by func F(v ~int) here because that is not correct Go code. Are you suggesting func F(v interface{~int}) would not make sense, because the set of types v can have is unknown to F?

But interfaces always have infinite type sets. So as far as I can see, interface{~int} would be no different. If you wanted to get at the dynamic type, you would need to do a type switch or assertion on a known type.

So for example you could do this:

type IntLike interface {~int64}

func F(x IntLike) {
    switch x.(type) {
    case int64:
      fmt.Println("int64")
    case time.Duration:
      fmt.Println("time.Duration")
    }
}

F(time.Second)
// Output: time.Duration

Or, to be more precise, one thing we could allow is to make x.(~int64) on a time.Duration have type int64, instead of time.Duration. That wouldn't require the function to know the actual type. But that has other problems.

Yeah allowing ~T in type switches seems problematic because ~T defines an infinite set of types. But I don't see why it would be necessary to allow ~T in type switches.

Merovius commented 1 year ago

Are you suggesting func F(v interface{~int}) would not make sense, because the set of types v can have is unknown to F?

Correct.

But interfaces always have infinite type sets. So as far as I can see, interface{~int} would be no different.

The difference is that you can do something useful with an interface without knowing its dynamic type: call its methods. You can't do anything useful with an interface{ ~int } (except pass it on). interface{ ~int } is semantically equivalent to any (well, not completely, but for the most part. You can also do binary operations with integer literals. But even that goes away once you add more terms to the union).
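For reference, this is what a single ~int term permits today when it is used as a constraint: arithmetic with untyped constants and conversion to int. The names IntLike, Celsius, increment, and toInt are assumptions for this sketch, and it says nothing about what an ordinary interface{ ~int } value would allow.

```go
package main

import "fmt"

type IntLike interface{ ~int }

// Celsius is an example named type whose underlying type is int.
type Celsius int

// increment uses a binary operation with an untyped integer constant.
func increment[T IntLike](v T) T {
	return v + 1
}

// toInt relies on every type in the type set being convertible to int.
func toInt[T IntLike](v T) int {
	return int(v)
}

func main() {
	fmt.Println(increment(Celsius(20)), toInt(Celsius(20)))
}
```

Both helpers stop compiling if the constraint is widened to something like ~int | ~string, which is the limitation described above.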

ncruces commented 1 year ago

@Merovius

The current, idiomatic, way of returning an optional T is (T, bool). The way of accepting it is, probably, if any, *T. If you want to plug the two, there's work to do. If the idiom were Option[T] plugging would be easier.

With errors, the idiom is to return (T, error), with no real idiom to accept. Result[T] would change this.

As for use cases, no pressing need. But I imagine stream processing, reactive programming, some functional idioms would probably benefit. Overall, I don't expect the Go idioms to change much if at all. But in those niches/corners they might?

Merovius commented 1 year ago

@ncruces Maybe. If I thought they might, I'd count that as a reason against this proposal. It would make for bad code and I don't like making it too tempting to write bad code.

[edit] Actually:

With errors, the idiom is to return (T, error), with no real idiom to accept. Result[T] would change this.

ISTM if you need an idiom, accepting (T, error) would work far better (as it's already established in existing corpora). And I think that's what current attempts in the fields you mention are doing.[/edit]

johnwarden commented 1 year ago

The difference is that you can do something useful with an interface without knowing its dynamic type: call its methods. You can't do anything useful with an interface{ ~int } (except pass it on). interface{ ~int } is semantically equivalent to any (well, not completely, but for the most part. You can also do binary operations with integer literals. But even that goes away once you add more terms to the union).

Yes, I see what you are saying. interface{ ~int } might not be very useful. One thing you can do with it, however, is ~coerce~ convert it to an int (because one thing the compiler knows about everything in its typeset is that it is ~coercable~ convertible to int). So this should be possible.

type IntLike interface { ~int }
var x IntLike = int(2)
var y = int(x)

I had forgotten that ~T can be shorthand for interface{~T}. So in that case, since interfaces are allowed in type switches, it seems reasonable that ~T should be too. ~T would match anything that implemented the interface (e.g. anything with an underlying type of T), and a value of that type could be ~coerced~ converted into T.

I think it would follow that the meaning of ~T in type switches would be pretty well defined. For example:

type Num = interface { ~int64 | ~float64 }

func F(x Num) {
  switch y := x.(type) {
    case ~int64:
      // here y has type interface{ ~int64 }
      fmt.Println("x is an ~int64:", int64(y))
    case ~float64:
      // here y has type interface{ ~float64 }
      fmt.Println("x is a ~float64:", float64(y))
  }
}

F(2 * time.Second)
// Output: x is an ~int64: 2000000000

Merovius commented 1 year ago

One thing you can do with it, however, is coerce it to an int (because one thing the compiler knows about everything in its typeset is that it is coercable to int).

But that falls apart as soon as you have interface{ ~int | ~string }. So it would only really help if you'd have exactly one type which seems an oddly restrictive special case when talking about unions.

~T would match anything that implemented the interface (e.g. anything with an underlying type of T), and a value of that type could be coerced into T.

That's what I mentioned here, where I also linked to this comment explaining that this has problems as well.

atdiar commented 1 year ago

@johnwarden I tend to agree if we substitute coercion with conversion which might have been what you meant.

AndrewHarrisSPU commented 1 year ago

@ncruces

Overall, I don't expect the Go idioms to change much if at all. But in those niches/corners they might?

I don't think Option[T], Result[T] are likely to do much, the arity of the cases really are covered by multiple returns.

I might speculate that there's more utility when considering sum types with larger arity of variants, when business logic has more than a few variants that are nearly synonyms of zero (e.g. how missing, none, n/a, undefined can be meaningfully different than zero or nil), or more than a few variants that are synonyms of "a specific error occurred and should be dealt with specifically".

johnwarden commented 1 year ago

I tend to agree if we substitute coercion with conversion which might have been what you meant.

@atdiar oh yes "conversion" is what I should have said. Thank you.

@Merovius Sorry for the confusion on my part, I think I do now understand what you are trying to explain, and that it has nothing to do with implementation.

But that falls apart as soon as you have interface{ ~int | ~string }. So it would only really help if you'd have exactly one type which seems an oddly restrictive special case when talking about unions.

Yes, and I see that the issues with this restrictive case in type switches were discussed by Ian in this comment, as you pointed out.

But I think the answers to some of his questions follow naturally from the rules for interfaces in type switches: no special rules would need to be defined. In fact ~T syntax would not even need to be supported in type switches:

func hash64Bit(x interface { ~int64 | ~float64 | ~complex64 }) int64 {
    switch v := x.(type) {
        case interface { ~int64 }:
            return hashInt64(int64(v))
        case interface { ~float64 }:
            return hashFloat64(float64(v))
        case interface { ~complex64 }:
            return hashComplex64(complex64(v))
    }
}

This may have limited usefulness -- this is definitely a contrived example. But is this limited usefulness reason to maintain different rules for interfaces used as type constraints and those used as regular interface types? The exception is often as costly as the rule; having consistency here would have benefits such as making the language easier to learn.

But as Ian also mentioned in this comment:

It's quite possible that these questions can be answered, but it's not just outside the scope of this proposal, it's actually complicated.

So I see that perhaps this discussion is off-topic, since this proposal explicitly excludes the possibility of interface { ~T }.

johnwarden commented 1 year ago

@ncruces @AndrewHarrisSPU

Overall, I don't expect the Go idioms to change much if at all. But in those niches/corners they might?

I don't think Option[T], Result[T] are likely to do much, the arity of the cases really are covered by multiple returns.

Optional values are often included in data structures as well, especially when unmarshaling data from external data sources such as JSON, protobufs, and databases. While *T is common here, another common idiom is something like struct{Valid bool; Value T} (for example sql.NullInt64), which avoids the memory allocation ~and safety issues of pointers~.

If this proposal were implemented, I suspect an Option[T] defined as follows would often be preferred to either of these:

type Option[T] interface{ T }

This would avoid memory allocations if the T value is stored directly ~, and would be safer than a pointer. It would also be safer than a struct{Valid bool; Value T} because the value could not be accessed without first checking that it was valid~. The possible advantage over struct{Valid bool; Value T} would be that if the value was accessed without first checking that it was valid, it would panic, instead of silently returning a zero value.

For example:

func f(optionalV Option[float64]) float64 {
    // return optionalV*2
    // not allowed: * operator not defined on Option[float64]

    if v, ok := optionalV.(float64); ok {
        return v*2
    }
    return 0
}

EDIT: as @Merovius points out there are actually no safety benefits here, since x := optionalV.(float64) can panic.

Merovius commented 1 year ago

This would avoid memory allocations if the T value is stored directly, and would be safer than a pointer.

I don't believe the value T can commonly be stored directly, for the same reason we can't do it in a regular interface. I also don't believe there is anything "safer" about an Option[T] than a *T, unless you add significant extra machinery to the language. In this particular case (your type Option[T] interface{ T }, which to be clear isn't actually valid - you can't use a type parameter as an interface element directly) they are exactly equivalent in terms of safety. You have to spell the check the same - x == nil - and in both cases the code will panic if you forget it.

Under this proposal, Option[T] offers no safety guarantees whatsoever over a *T.

because the value could not be accessed without first checking that it was valid. For example:

Of course it could. You can write x := optionalV.(float64). There is no safety difference between that and spelling it *x. And there is no meaningful difference between remembering to spell it if x, ok := optionalV.(float64); ok and spelling it if x != nil.
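The equivalence can be sketched in current Go, with any standing in for the hypothetical Option[float64]. The helpers viaPointer and viaAssertion are names assumed for illustration; each skips the check and recovers from the resulting panic only so that the outcome can be reported.

```go
package main

import "fmt"

// viaPointer dereferences without a nil check; a nil pointer panics.
func viaPointer(p *float64) (v float64, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return *p * 2, nil
}

// viaAssertion uses the single-value type assertion; a nil interface value
// panics in the same way.
func viaAssertion(o any) (v float64, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return o.(float64) * 2, nil
}

func main() {
	x := 1.5
	fmt.Println(viaPointer(&x))
	fmt.Println(viaPointer(nil))
	fmt.Println(viaAssertion(x))
	fmt.Println(viaAssertion(nil))
}
```

In both forms the compiler accepts the unchecked access, and the failure shows up only at run time.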

DeedleFake commented 1 year ago

Under this proposal, Option[T] offers no safety guarantees whatsoever over a *T.

It does a little:

type Value struct {
  S string
}

type ViaPointer struct {
  Val *Value
}

type ViaOption struct {
  Val Option[Value]
}

func P(v ViaPointer) string {
  // Will panic if Val is nil.
  return fmt.Sprintf("Value: %q", v.Val.S)
}

func O(v ViaOption) string {
  // Compile-time error without explicit check.
  return fmt.Sprintf("Value: %q", v.Val.S)
}

Option[T] also makes the case of an optional pointer nicer. Double pointers feel weird to me.

johnwarden commented 1 year ago

This would avoid memory allocations if the T value is stored directly, and would be safer than a pointer.

I don't believe the value T can commonly be stored directly, for the same reason we can't do it in a regular interface.

In the proposal, Ian mentions that in some cases values could be stored directly, rather than boxed. This is impossible now because the exact type and size of an interface value's dynamic value can't be known at compile time. There may be other reasons, but I would guess interface { int64 } would be one case where the value could be stored directly.

Under this proposal, Option[T] offers no safety guarantees whatsoever over a *T.

because the value could not be accessed without first checking that it was valid. For example:

Of course it could. You can write x := optionalV.(float64). There is no safety difference between that and spelling it *x. And there is no meaningful difference between remembering to spell it if x, ok := optionalV.(float64); ok and spelling it if x != nil.

Oh yes you are absolutely right, I hadn't thought that through.

leaxoy commented 1 year ago

The proposal above illustrates how to add a sum type (aka union), but a tagged union is more powerful and useful (it allows multiple occurrences of a type, using tags to distinguish the different variants).

Real world designs: https://doc.rust-lang.org/book/ch06-01-defining-an-enum.html https://docs.swift.org/swift-book/LanguageGuide/Enumerations.html

DmitriyMV commented 1 year ago

@leaxoy sigma types issue is probably what you are looking for.

jimmyfrasche commented 1 year ago

Unlike most proposals, there are downsides to not accepting this one:

  1. union elements in interfaces can only ever be used in constraints
  2. if, hypothetically, there were a separate mechanism for sum or union types, it would be confusing that there is also this very similar mechanism that's unrelated

I think the second situation is unlikely. The bar would be much higher than any other language change, which is already pretty high.

I dislike the first situation. There are many uses for union types even if they have downsides compared to other more theoretically pure alternatives.

leaxoy commented 1 year ago

How about introducing a new keyword, enum or union, for a type that cannot be nil, just like a struct? Nil in a sum type is a big challenge.

zephyrtronium commented 1 year ago

@leaxoy Introducing a new keyword is not backward-compatible, because any code today using enum or union as variable names will cease to compile. We would also need to decide on what "it cannot be nil" actually means, because every type in Go must have a zero value. #19412 contains a great deal of discussion on this.
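As a concrete illustration of the compatibility concern, the following program is valid today precisely because union and enum are ordinary identifiers. The function and variable are invented for this example.

```go
package main

import "fmt"

// union is an ordinary identifier in current Go, here a function name.
func union(a, b []int) []int {
	seen := make(map[int]bool)
	var out []int
	for _, s := range [][]int{a, b} {
		for _, v := range s {
			if !seen[v] {
				seen[v] = true
				out = append(out, v)
			}
		}
	}
	return out
}

func main() {
	// enum is likewise usable as a variable name.
	enum := []string{"red", "green"}
	fmt.Println(union([]int{1, 2}, []int{2, 3}), enum)
}
```

If either word became a reserved keyword, code like this would no longer compile without being rewritten.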

leaxoy commented 1 year ago

Perhaps this is a trade-off: introducing new keywords breaks some compatibility, but handling nil is also tricky. Nil takes on too many roles in Go.

But after #56986, #57001, and #60078, isn't there now a mature way to introduce new features?

Merovius commented 1 year ago

FWIW this issue is specifically about using the existing syntax, because it seems dubious to have two different syntactical constructs to mean very similar things. Being able to reuse that syntax was, in fact, one of the (minor) arguments for introducing it to begin with.

ydnar commented 1 year ago

The proposal above illustrates how to add a sum type (aka union), but a tagged union is more powerful and useful (it allows multiple occurrences of a type, using tags to distinguish the different variants).

Real world designs: https://doc.rust-lang.org/book/ch06-01-defining-an-enum.html https://docs.swift.org/swift-book/LanguageGuide/Enumerations.html

This proposal can support tagged unions via specialized types:

type All struct{}

type None struct {}

type Some []string

type Filter interface {
    All | None | Some
}

func Select(f Filter) ([]string, error) {
    // ...
}
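For comparison, a rough approximation that runs in current Go seals the interface with an unexported marker method instead of a union. This is a different technique from the one proposed, and the marker method isFilter and the extra items parameter are assumptions added to make the sketch self-contained.

```go
package main

import "fmt"

type All struct{}
type None struct{}
type Some []string

// Filter is sealed by an unexported method rather than by a union.
type Filter interface{ isFilter() }

func (All) isFilter()  {}
func (None) isFilter() {}
func (Some) isFilter() {}

// Select applies the filter to items. The nil case still has to be handled,
// just as it would under the proposal.
func Select(f Filter, items []string) ([]string, error) {
	switch f := f.(type) {
	case All:
		return items, nil
	case None:
		return nil, nil
	case Some:
		var out []string
		for _, it := range items {
			for _, want := range f {
				if it == want {
					out = append(out, it)
					break
				}
			}
		}
		return out, nil
	default:
		return nil, fmt.Errorf("unsupported filter %T", f)
	}
}

func main() {
	items := []string{"a", "b", "c"}
	fmt.Println(Select(All{}, items))
	fmt.Println(Select(Some{"b"}, items))
	fmt.Println(Select(nil, items))
}
```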

Empty struct values could be optimized away:

As an implementation note, we could in some cases use a different implementation for interfaces with an embedded union type. We could use a small code, typically a single byte, to indicate the type stored in the interface, with a zero indicating nil. We could store the values directly, rather than boxed.

Edit: "multiple values of a type:"

type Width uint32
type Height uint32
type Weight uint32

type Dimension interface {
    Width | Height | Weight
}

func f(d Dimension) ...

_ = f(Width(10))
_ = f(Height(20))

ydnar commented 1 year ago

In particular, the zero value of an interface type with an embedded union would be nil, just as for any interface type. So this is a form of sum type in which there is always another possible option, namely nil. Sum types in most languages do not work this way, and this may be a reason to not add this functionality to Go.

@ianlancetaylor would you consider a form that would disallow nil?

Some hypothetical syntax:

type I1 interface! {
    int | float64
}

  1. What would the zero value be? The zero value of the first type in the union?
  2. How would the compiler enforce assignment to variables of type I1?

zephyrtronium commented 1 year ago

@ydnar The question of "what would the zero value be" is exactly the one that needs to be answered for that kind of proposal. Syntax aside, the concept of non-nillable interfaces or sum types has been suggested many times between here, #19412, and other proposals. None of them have answered the zero value question in a way that satisfies even a majority of people (including those that have tried the answer "that of the first type in the union").

DeedleFake commented 1 year ago

Sum types in most languages do not work this way

That's true, but I think most languages also don't have a concept of zero values in the way that Go does. Rust, for example, requires all values of any type to always be explicitly set to something, even if it's a default value.

Random thought that might be terrible: What if Go did allow nil sum interfaces but with an unusual caveat: A variable of a sum interface type can't be set to nil manually. In other words, a declared but unassigned sum interface variable would be nil, but once changed it would be impossible to get it back to being nil, with a runtime check on assignment to make sure that an assignment of a dynamic value doesn't do that, either. Then it'll have a useful zero value, but one with minimal impact.

tinne26 commented 1 year ago

A variable of a sum interface type can't be set to nil manually.

You could still set values to nil with:

type Bits64 interface { int64 | uint64 | float64 }
type Dummy struct { Field64 Bits64 }
func (d *Dummy) SetField64(value Bits64) {
    d.Field64 = value
}
func main() {
    var d1, d2 Dummy
    var value uint64
    d1.SetField64(value) // d1.Field64 stops being nil
    d1.SetField64(d2.Field64) // d1.Field64 is nil again...
}

To prevent this you would have to allow only the concrete types listed in the sum type to be assignable to sum type variables, which is far more restrictive. But yeah, it may be the only real alternative to fully nilable sum types if we want to base them on interfaces.

Merovius commented 1 year ago

@DeedleFake I don't believe that would be practically feasible. x := make([]T, 42) is an "assignment of zero values", so would have to panic - and, less obviously, x = append(x, someNonNilThing) - so we couldn't have slices. _, ok := m[x] as well, so you couldn't check if an element is in a map. x, ok := <-ch would be an assignment of a zero value, if the channel is closed, so you couldn't use that as a select-case. Etc. We would have to exhaustively list exceptions to that rule of "assigning a zero value is a runtime check and causes a panic" to make this work. It's not practical.
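A short sketch in current Go of three of the places where a zero interface value appears without anyone writing nil. An ordinary method-based interface stands in for a sum interface here, and Shape and zeroValueSources are names assumed for illustration.

```go
package main

import "fmt"

// Shape stands in for a sum interface; any interface type behaves the same.
type Shape interface{ Area() float64 }

// zeroValueSources reports, for each construct, whether it produced a nil
// interface value.
func zeroValueSources() (fromMake, fromMap, fromChan bool) {
	s := make([]Shape, 1) // elements start out as zero values

	m := map[string]Shape{}
	v, okMap := m["missing"] // a failed lookup yields the zero value

	ch := make(chan Shape)
	close(ch)
	w, okChan := <-ch // receiving from a closed channel yields the zero value

	return s[0] == nil, v == nil && !okMap, w == nil && !okChan
}

func main() {
	fmt.Println(zeroValueSources())
}
```

Each of these would need an explicit exception under a rule that panics whenever a zero value is assigned.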

And without that runtime check, there is no real benefit, because you still have to code against the possibility of it being nil.

@tinne26 Under @DeedleFake's suggestion, that code would panic, because passing d2.Field64 as an argument is an assignment (from the language POV) so it would panic, as it is nil.

gophun commented 1 year ago

including those that have tried the answer "that of the first type in the union"

I don't remember, what was the problem with it? It's a straightforward rule that can be easily explained. It also appears to be intuitive, as it's the default response for most proposers when posed with the question.

Merovius commented 1 year ago

@gophun Currently, | in an interface is commutative - interface{ a | b } and interface{ b | a } mean the same thing. That would no longer be the case. There's also the question of what would happen with something like interface{ a | b ; b | a }. "first" is sometimes not super straightforward.
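The commutativity can be checked with constraints in current Go: a type parameter constrained by one ordering can be passed where the other ordering is required, because the type sets are identical. The names AB, BA, Both, and the via helpers are assumptions for this sketch.

```go
package main

import "fmt"

type AB interface{ int | string }
type BA interface{ string | int }

// Both intersects the two unions; its type set is again {int, string}, so
// there is no meaningful "first" term.
type Both interface {
	int | string
	string | int
}

func viaBA[T BA](v T) string { return fmt.Sprint(v) }

// viaAB compiles because the type set of AB equals the type set of BA.
func viaAB[T AB](v T) string { return viaBA(v) }

// viaBoth compiles for the same reason.
func viaBoth[T Both](v T) string { return viaAB(v) }

func main() {
	fmt.Println(viaBoth(1), viaBoth("x"))
}
```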

gophun commented 1 year ago

@Merovius Yes, it would be awkward if interface is reused for union types. But if we made them separate (adding a keyword is not completely ruled out) with the "zero value based on first type" rule:

type foo union { a | b }

Then these

interface { a | b }
interface { a | b ; b | a }

could be short form for:

interface { union{ a | b } }
interface { union{ a | b } ; union{ b | a } }

Here the order wouldn't matter.

Merovius commented 1 year ago

@gophun This issue is about using general interfaces for unions. #19412 is about other options - which each have their own set of problems, but that discussion doesn't belong here. And FWIW, adding a union keyword like you suggest has been discussed over there at length.

gophun commented 1 year ago

@Merovius Thank you for the pointer; I'll take it over there. The keyword option was criticized solely on the grounds of being "not backwards compatible," a stance that has been clearly contradicted by the Go project lead.

Merovius commented 1 year ago

The keyword option was criticized solely on the grounds of being "not backwards compatible,"

That is not true. But again, that discussion doesn't belong here.

arvidfm commented 1 year ago

Would this proposal also allow for something like this?

type MyType[T constraints.Integer | constraints.Float | string | []byte] struct {
    Value T
}

func (m *MyType[T]) UnmarshalText(data []byte) (err error) {
    switch v := (&m.Value).(type) {
    case *constraints.Signed:
        err = parseInt(data, v)
    case *constraints.Unsigned:
        err = parseUint(data, v)
    case *constraints.Float:
        err = parseFloat(data, v)
    case *string | *[]byte:
        // ideally we could get the concrete type of v
        // in the context of this case as some type name *U,
        // so we could even do: *v = U(data)
        parseString(data, v)
    }
    return err
}

func parseInt[T constraints.Signed](data []byte, v *T) error {
    i, err := strconv.ParseInt(string(data), 10, 64)
    *v = T(i)
    return err
}

func parseUint[T constraints.Unsigned](data []byte, v *T) error {
    u, err := strconv.ParseUint(string(data), 10, 64)
    *v = T(u)
    return err
}

func parseFloat[T constraints.Float](data []byte, v *T) error {
    f, err := strconv.ParseFloat(string(data), 64)
    *v = T(f)
    return err
}

func parseString[T string | []byte](data []byte, v *T) {
    *v = T(data)
}

Because the below is a pattern I find myself using frequently at the moment, and it would be amazing to be able to replace it with something more compact like the above.

func (m *MyType[T]) UnmarshalText(data []byte) (err error) {
    switch v := any(&m.Value).(type) { // note having to cast to any first...
    case *int:
        err = parseInt(data, v)
    case *int8:
        err = parseInt(data, v)
    case *int16:
        err = parseInt(data, v)
    case *int32:
        err = parseInt(data, v)
    case *int64:
        err = parseInt(data, v)
    case *uint:
        err = parseUint(data, v)
    case *uint8:
        err = parseUint(data, v)
    case *uint16:
        err = parseUint(data, v)
    case *uint32:
        err = parseUint(data, v)
    case *uint64:
        err = parseUint(data, v)
    case *uintptr:
        err = parseUint(data, v)
    case *float32:
        err = parseFloat(data, v)
    case *float64:
        err = parseFloat(data, v)
    case *string:
        *v = string(data)
    case *[]byte:
        *v = data
    }
    return err
}

(Note that case *int, *int8, *int16, *int32, *int64 isn't useful here, since this would coerce the type of v to any, so you couldn't pass it to any of the specialised parser functions.)
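A small sketch of that note in current Go, using a hypothetical `kind` function: in a single-type case the switch variable has the case's type, while in a multi-type case it keeps the type of the switch operand.

```go
package main

import "fmt"

// kind shows how the type of v depends on the number of types in the case.
func kind(p any) string {
	switch v := p.(type) {
	case *int:
		// Single-type case: v has type *int, so it can be dereferenced.
		*v = 1
		return "single: *int"
	case *int8, *int16:
		// Multi-type case: v has the type of the operand (any),
		// so writing *v here would not compile.
		return fmt.Sprintf("multi: %T held in an interface", v)
	}
	return "unmatched"
}

func main() {
	fmt.Println(kind(new(int)))   // prints single: *int
	fmt.Println(kind(new(int16))) // prints multi: *int16 held in an interface
}
```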

zephyrtronium commented 1 year ago

It would not, for a couple reasons:

  1. The type parameter T is set at compile time. &m.Value is a *T where T is one of the elements of constraints.Integer, constraints.Float, string, or []byte, not a sum type. This proposal is about allowing run-time values of interface types that contain union elements; it is mostly orthogonal to type parameters.
  2. constraints.Signed &c. use ~T elements. This proposal does not allow values of interface types when those interfaces contain ~T elements.

I think what you want is #45380.

zephyrtronium commented 1 year ago

Seeing this example, it occurs to me that https://github.com/golang/go/issues/57644#issuecomment-1373901347 actually seems to be wrong. Consider these definitions:

type bytestring interface {
    string | []byte
}

func f[T bytestring]() {}

Type bytestring itself can instantiate f if bytestring satisfies bytestring, which it does if bytestring implements bytestring. Since bytestring is an interface, it implements any interface of which its type set is a subset, which trivially includes itself. Therefore f[bytestring] is a legal instantiation.

So, it seems that we need additional adjustments to the spec to make interfaces with union elements legal. Otherwise every type constraint which includes a union element and no ~T terms gains a non-empty set of members, all of interface type, which will be illegal in almost every case.

zephyrtronium commented 11 months ago

Triple posting aside, discussion on #48522 prompted me to think about what "additional adjustments to the spec" we would actually need for the proposed union types to not break existing code.

My initial thought was "interfaces containing union elements cannot satisfy the interfaces they implement." That would prevent bytestring above from instantiating functions with any constraints, which seems obviously a non-starter.

The minimal condition, in the sense of allowing unions to satisfy the most constraints, would be that they satisfy any constraint of which their operation set is a superset. Being interfaces, the operations they bring are comparison, type assertion, and their methods. Type assertion is mostly irrelevant due to the rules about interfaces in union elements. So, it seems like we could get away with a rule like, "an interface T containing union elements with no ~U terms satisfies constraints that are basic interfaces that T implements, as well as constraints that can be written in the form interface { comparable; E } where E is a basic interface that T implements."

The question then becomes where that rule leaves us with covariance. With this rule, and returning to the definitions above, can we write func g[T bytestring]() { f[T]() }? Since type parameters are interfaces underneath, I think the answer is no. We need more precision to handle type parameters. We end up with a definition of "satisfies" that looks, in total, something like:


A type T satisfies a constraint C if:


I find this definition hard to follow compared to the current one, but I think it does everything we need for this proposal.

mikeschinkel commented 5 months ago

I have been reading this thanks to @Merovius linking it from the [go-nuts] list.

Seems to me the biggest argument against interfaces-as-sum-types is over zero values: Go fundamentally requires zero values and that can't change, there is no consensus on how to arrive at a zero value for these sum types, and others want sum types to have no zero value at all because they see zero values as conflicting with the benefits sum types provide. IOW, a classic catch-22.

If I am understanding this wrong, please let me know.

I think it would be great if this could become a feature of Go so I considered that catch-22 in hopes to resolve it and came up with something I think could work.

The first aspect would be to require that these sum types not be able to be instantiated without providing an explicit value. That would mean some of the following would throw a compiler error:

type Identifier interface {
   int | string 
}
var widgetId Identifier                     // throws compile error
widgetId := Identifier(1)                   // compiles fine
widgetIds := make([]Identifier,3)           // throws compile error 
widgetIds := []Identifier{                  // compiles fine
   Identifier(123), 
   Identifier("happy"), 
   Identifier(456),
}

Unless I miss some way in which a property can get a zero value, the above limitation would be sufficient to ensure that a sum type never had an opportunity to have a zero value (I ignored in my example returning an uninitialized value from a func, but let's assume that is disallowed too.)

If simply disallowing sum types from being created if not initialized is not sufficient (because someone might use CGo or some other edge case to create an uninitialized sum type) then we would need a real zero value. That is where IMO reconsidering the untyped builtin zero (#61372) could fit in. A sum type could have a zero value of just zero and could not otherwise be represented.

So if it comes to pass that a variable or expression whose type is a sum type has a zero value, then using that variable or expression for anything other than assignment of a non-zero value or checking whether it is equal to zero would be a runtime error generating a panic. Since in normal cases it should never have a zero value, a zero value occurring would truly be an exceptional case indicating an error somewhere else in the code, and thus deserving of a panic.

The zero value could be represented internally exactly as an interface containing nil is represented, but for a sum type st then st==nil would always be false and st==zero would only ever be true for an exceptional case that should never happen for normal use cases.

The only real downside I see to this approach would be that you could not pre-create a slice or map with any elements using make().

However, if for our 3rd aspect we allowed extending make() to recognize what I will call an "initializer" then make could initialize sum types to a default value. Consider setting the value of a slice of ten Identifiers to be 0:

ids := make([]Identifier{0},10)

And this sets a slice of 25 Identifiers to be initialized to an empty string:

ids := make([]Identifier{""},25)
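The intended result of such an initializer can be approximated in current Go with a generic helper. This is only a sketch under stated assumptions: `makeFilled` is a hypothetical name, `any` stands in for `Identifier` (which is not usable as an ordinary type today), and the helper still goes through a zero-filled slice internally, so it illustrates the semantics rather than the proposed guarantee.

```go
package main

import "fmt"

// makeFilled returns a slice of n elements, each set to init.
// Note that make still zero-fills the slice before the loop runs.
func makeFilled[T any](n int, init T) []T {
	s := make([]T, n)
	for i := range s {
		s[i] = init
	}
	return s
}

func main() {
	ids := makeFilled[any](10, 0)    // ten elements, each holding int 0
	names := makeFilled[any](25, "") // twenty-five elements, each holding ""
	fmt.Println(len(ids), len(names)) // prints 10 25
}
```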

So there it is. Please feel free to poke holes in this approach when and if you find any.

P.S. We don't really even need zero to make this work; the zero value could still be nil but would be just as constrained as I described for zero. But having zero would be a nice distinction because then sum types would be the only types in Go with a zero value but no other representation, vs. having a nil that behaves differently for sum types than for other nillable types.

ianlancetaylor commented 5 months ago

@mikeschinkel Thanks. The idea of not permitting the type to be instantiated without a value has been suggested several times before in the various discussions of sum types. It has always been rejected. Zero values are built into the language too deeply. For one example--and just for one example, there are other problems--how do you handle a type assertion to the sum type if the type assertion fails? What value do you give to the first result of the type assertion?

mikeschinkel commented 5 months ago

@ianlancetaylor — Thank you for acknowledging.

I see your perspective in how my suggestion also results in a zero values concern.

how do you handle a type assertion to the sum type if the type assertion fails? What value do you give to the first result of the type assertion?

I expect you meant that as a rhetorical question, but since you posed the question I hope you do not mind me at least answering it.

If a type assertion fails in an assignment to a variable of a sum type then the variable would get the value of zero (now it seems zero is required, after all), as the same rules I outlined above would apply. Doing anything with that variable besides assigning it a non-zero value or testing it for equality with zero should panic.

That seems reasonable to me, at least, because a failed type assertion is a failure so accessing the value of that variable is almost certainly a logic error anyway. Right?

That scenario does bring up a question of whether or not a zero-valued sum-type could be passed to a function, and I could go either way on that. Seems less than reasonable to disallow it, but then the panic would occur elsewhere compared to where the panic was caused.

I do respect that you and others may view those constraints as not what Go should be, and I will be accepting of that if it is the final ruling.

However, AFAICT, I still think the logic of my suggestion is valid, unless there is some other scenario that emerges that cannot be resolved in the same way as for failed type assertions. 🤷‍♂️

Merovius commented 5 months ago

@mikeschinkel IMO your suggestion is now back to the point where every sum type has a zero value of nil. It doesn't really matter if it is spelled nil or zero, as far as the contentious questions are concerned, as long as the semantics are the same. Which it seems they mostly are.

That doesn't mean the suggestion isn't viable, it's just that it doesn't differ significantly from what we have been talking about so far.

To me, that means FWIW that there is no need to disallow make etc. either, because if there is any way to create an invalid zero value, usage of a variant needs to be prepared to deal with it. If it needs to be prepared to deal with it, might as well keep more coherency in the language and not treat them specially at the point of creation.

That scenario does bring up a question of whether or not a zero-valued sum-type could be passed to a function, and I could go either way on that.

This has been discussed above as well. It seems hard to impossible to me to disallow it without drastic changes to Go's type system. Whether or not a variable is zero is no longer a static property but a runtime property, and trying to make static assertions about those tends to pretty quickly devolve into solving the halting problem. See also the various suggestions over the years to disallow dereferencing nil-pointers statically.

mikeschinkel commented 5 months ago

@Merovius —

"IMO your suggestion is now back to the point where every sum type has a zero value of nil. It doesn't really matter if it is spelled nil or zero, as far as the contentious questions are concerned, as long as the semantics are the same."

Admittedly there is not much difference, but there is one tangible difference; consistency.

If we allowed that sum types could just be nil then there would be the contentious question of consistency; i.e. that some variables that can contain nil would be handled differently than others. Using zero instead of nil removes that one objection, which is why I suggested it.

But yes, that is the only difference, however it could be the difference between someone objecting to sum types vs. supporting them. What percentage of people who would do each remains to be seen.

"That doesn't mean the suggestion isn't viable, it's just that it doesn't differ significantly from what we have been talking about so far."

Yes, and the difference is that the combination of things, at least on this issue, has not been discussed prior AFAICT.

"To me, that means FWIW that there is no need to disallow make etc."

With the addition of initializers in my suggestion there is no reason to disallow make() either, but I get that is likely orthogonal to your point.

"because if there is any way to create an invalid zero value, usage of a variant needs to be prepared to deal with it. If it needs to be prepared to deal with"

Yes, and that is where we disagree. My suggestion proposes making zero an exceptional case such that the vast majority of code would safely not deal with it because existence of a zero value and subsequent use would in itself be an exceptional case worthy of an immediate panic.

"might as well keep more coherency in the language and not treat them specially at the point of creation."

From a purity standpoint you are probably correct. But my understanding of Go's nature is that they have historically placed emphasis on pragmatism over purity. Otherwise there would have been no append(), copy(), etc.

Respecting the existing nature of the Go language, I am arguing that since it is impossible to find a perfect solution, maybe instead we could be pragmatic and accept a really good one?

The reason I think this approach could work is that there is only one way I have discovered thus far that a sum type variable could have a zero value during non-exceptional coding practices. That one way is itself a test for validity, so the compiler could ensure that it is not misused if we disallow passing them to funcs when they have zero values (your questioning has now convinced me this is the right approach.)

"Whether or not a variable is zero is no longer a static property but a runtime property and trying to make static assertions about those tends to pretty quickly devolve into solving the halting problem."

Unless I am missing something, it would seem easy to determine if a variable is used without ok being checked, for our one known non-exceptional case. That is a highly constrained problem vs. trying to determine if an arbitrary program will run forever. We would have to limit to checking ok and not allowing things like if ok || myStatus()==1 {...}, but that feels like a reasonable limitation (since a simple restructure of that if statement would create a knowable construct) given the value offered by a workable sum type.

So I ask this: rather than discuss in abstract terms, can you or others identify places where the compiler could not easily identify when a sum type variable received a value of zero and thus not be able to disallow it?

mrwonko commented 5 months ago

For one example--and just for one example, there are other problems--how do you handle a type assertion to the sum type if the type assertion fails?

If we had sum types, we could return an Optional sum type. So something like

var typeAsserted optional[MySumType] = nothing
typeAsserted, _ = anything.(MySumType)

(In practice, you would usually not declare the destination separately, I just did it to highlight its type.) I understand this would be somewhat inconsistent with non-sumtype-type-assertions, but it sidesteps the zero-issue.

But if we can generalize it so nothing=zero=nil, maybe we can treat every type assertion as returning an optional, and optionals are implicitly convertible to zero values where available?

Merovius commented 5 months ago

because there is only one way I have discovered thus far that a sum type variable could have a zero value during non-exceptional coding practices

Ian wrote

For one example--and just for one example, there are other problems--how do you handle a type assertion to the sum type if the type assertion fails? What value do you give to the first result of the type assertion?

I feel like that should have made clear that this was just one example. As far as I can tell, to name a few others, you have not yet talked about channel-receives, map-accesses, reflect.New, extra capacity allocated by append, the statement var x T when T is a type-parameter (and any other statically disallowed code for these specific types), named returns (in particular in the presence of panic) or clear on a slice. There might be others.
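A few of the cases in that list, sketched in current Go with an ordinary interface standing in for a sum type. All names (`Node`, `Leaf`, `zeroOf`, `build`, `fromReflect`, `spareCapacity`) are hypothetical, and `clear` is left out only to keep the sketch compatible with Go versions before 1.21.

```go
package main

import (
	"fmt"
	"reflect"
)

// Node is a hypothetical stand-in for a sum type.
type Node interface{ Kind() string }

// Leaf is one concrete implementation of Node.
type Leaf struct{}

func (Leaf) Kind() string { return "leaf" }

// zeroOf declares a variable whose type is a type parameter.
func zeroOf[T any]() T {
	var x T
	return x
}

// build panics before assigning its named result. The deferred recover
// lets the call return normally, with n still holding its zero value.
func build() (n Node) {
	defer func() { _ = recover() }()
	panic("failed before n was set")
}

// fromReflect allocates a Node through reflection and reports whether it is nil.
func fromReflect() bool {
	t := reflect.TypeOf((*Node)(nil)).Elem() // the interface type Node itself
	return reflect.New(t).Elem().IsNil()
}

// spareCapacity reslices into capacity that no element was ever assigned to.
func spareCapacity() bool {
	xs := make([]Node, 0, 4)
	xs = append(xs, Leaf{})
	return xs[:cap(xs)][3] == nil
}

func main() {
	fmt.Println(zeroOf[Node]() == nil) // prints true
	fmt.Println(build() == nil)        // prints true
	fmt.Println(fromReflect())         // prints true
	fmt.Println(spareCapacity())       // prints true
}
```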

I'll also note that the suggestion to disallow uninitialized values came up in this discussion before and most of this list has been posted there as well.

And while I appreciate that it is frustrating to be told that something you see as an easy solution is unworkable, I'd also ask for a little bit of trust that when people like Ian or I say things like "Zero values are built into the language too deeply", it's not just an off-the-cuff remark. We wouldn't say that, if we saw a realistic way to make it work.

In particular, listing instances of where zero values are mentioned in the spec is not meant as a request to special case solutions to them, but as a demonstration of what we mean when we say "zero values are built into the language too deeply".

icholy commented 5 months ago

This might be a bit off-topic, but have zero value semantics like this been discussed?:

func main() {
    var m map[string]string

    assert(m == zero)
    assert(m == nil)

    m = nil

    assert(m != zero)
    assert(m == nil)

    m = zero

    assert(m == zero)
    assert(m == nil)
}

func assert(b bool) {
    if !b {
        panic("assertion failed")
    }
}