The name/syntax is largely irrelevant. Pick would be fine.
Either proposal in this thread fits the set theoretic definition.
The first type being special for the zero value is irrelevant, since type-theoretic sums commute (A + B = B + A), so the order doesn't matter. My proposal maintains that property, but product types also commute in theory and are considered different in practice by most languages (Go included), so it's probably not essential.
@jimmyfrasche
I personally believe that disallowing interfaces as 'pick' members is a very big drawback. First, it would completely defeat one of the great use cases of 'pick' types - having an error be one of the members. Or say you want to deal with a pick type that holds either an io.Reader or a string, if you don't want to force the user to use a StringReader beforehand. But all in all, an interface is just another type, and I believe there shouldn't be type restrictions for 'pick' members. That being the case, if a pick type has 2 interface members, where one is fully enclosed by the other, that should be a compile-time error, as previously mentioned.
What I do like from your counterproposal is the fact that methods can be defined on the pick type. I don't think it should provide the intersection of the members' methods, since I don't think there would be many cases where a method belongs to all members (and you have interfaces for that anyway). And an exhaustive switch + default case is a very good idea.
@rogpeppe @jimmyfrasche Something I don't see in your proposals is why we should do this. There is a clear disadvantage to adding a new kind of type: it's a new concept that everybody who learns Go will have to learn. What is the compensating advantage? In particular, what does the new kind of type give us that we don't get from interface types?
@ianlancetaylor Robert summarized it well here: https://github.com/golang/go/issues/19412#issuecomment-288608089
@ianlancetaylor At the end of the day, it makes code more readable, and that is the prime directive of Go. Consider json.Token: it's currently defined as an interface{}, however the documentation states that it can actually be only one of a specific number of types. If, on the other hand, it were written as
type Token Delim | bool | float64 | Number | string | nil
The user will be able to immediately see all the possibilities, and the tooling would be able to create an exhaustive switch automatically. Furthermore, the compiler will prevent you from sticking an unexpected type in there as well.
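For comparison, here is a sketch of how a json.Token has to be handled today; the describe helper is hypothetical, and the default case is precisely what the compiler cannot currently rule out:

    import (
        "encoding/json"
        "fmt"
    )

    // describe handles every type json.Token is documented to hold, but
    // the compiler can't verify the list, so a default case is needed.
    func describe(tok json.Token) string {
        switch t := tok.(type) {
        case json.Delim:
            return fmt.Sprintf("delimiter %v", t)
        case bool, float64, json.Number, string:
            return fmt.Sprintf("value %v", t)
        case nil:
            return "null"
        default:
            panic(fmt.Sprintf("json.Token holds unexpected type %T", t))
        }
    }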
At the end of the day, it makes code more readable, and that is the prime directive of Go.
More features means one has to know more to understand the code. For a person with only average knowledge of a language, its readability is necessarily inversely proportional to the number of [newly added] features.
@cznic
More features means one has to know more to understand the code.
Not always. If you can substitute "knowing more about the language" for "knowing more about poorly- or inconsistently-documented invariants in the code", that can still be a net win. (That is, global knowledge can displace the need for local knowledge.)
If better compile-time type checking is indeed the only benefit, then we can get a very similar benefit without changing the language by introducing a comment checked by vet. Something like
//vet:types Delim | bool | float64 | Number | string | nil
type Token interface{}
Now, we don't currently have any kind of vet comments so this is not an entirely serious suggestion. But I'm serious about the basic idea: if the only advantage we get is something that we can do entirely with a static analysis tool, is it really worth adding a complex new concept to the language proper?
Many, perhaps all, of the tests done by cmd/vet could be added to the language, in the sense that they could be checked by the compiler rather than by a separate static analysis tool. But for various reasons we find it useful to separate vet from the compiler. Why does this concept fall on the language side rather than the vet side?
@ianlancetaylor re checked comments: https://github.com/BurntSushi/go-sumtype
@ianlancetaylor as far as whether the change is justified, I've been actively ignoring that—or rather pushing it back. Talking about it in the abstract is vague and doesn't help me: it all sounds like "good things are good and bad things are bad" to me. I wanted to get an idea of what the type would actually be—what its limitations are, what implications it has, what the pros are, what the cons are—so I could see how it would fit with the language (or not!) and have an idea of how I would/could use it in programs. I think I have a good idea of what sum types would have to mean in Go now, at least from my perspective. I'm not entirely convinced they're worth it (even if I want 'em real bad), but now I have something solid to analyze, with well-defined properties that I can reason about. I know that's not really an answer, per se, but it's where I'm at with this, at least.
If better compile-time type checking is indeed the only benefit, then we can get a very similar benefit without changing the language by introducing a comment checked by vet.
This is still vulnerable to the need-to-learn-new-things criticism. If I have to learn about those magic vet comments to debug/understand/use code, it's a mental tax, no matter whether we assign it to the Go-language budget or the technically-not-the-Go-language budget. If anything, magic comments are more costly because I didn't know I needed to learn them when I thought I learned the language.
@cznic I disagree. With your current assumption, you cannot be sure that a person would then understand what a channel is, or even what a function is. Yet these things exist in the language. And a new feature does not automatically mean that it would make the language harder. In this case, I would argue that it would in fact make it easier to understand, because it makes it immediately clear to the reader what a type is supposed to be, as opposed to using a black box interface{} type.
@ianlancetaylor I personally think this feature has more to do with making code easier to read and reason about. Compile-time safety is a very nice feature, but not the main one. Not only would it make a type signature immediately more obvious, but its subsequent usage would also be easier to understand, and easier to write. People would no longer need to resort to panics when they receive a type they didn't expect - that is the current behavior even in the standard library - and thus would have an easier time thinking about the usage, without being encumbered by the unknown. And I don't think it is a good idea to rely on comments and other tools (even if they are first-party) for this, because a cleaner syntax is more readable than such a comment. And comments are structureless, and much easier to mess up.
@ianlancetaylor
Why does this concept fall on the language side rather than the vet side?
You could apply that same question to any feature outside the Turing-complete core, and arguably we don't want Go to be a "Turing tarpit". On the other hand, we do have examples of languages that have shoved significant subsets of the actual language off into a generic "extension" syntax. (For example, "attributes" in Rust, C++, and GNU C.)
The main reason to put features in extensions or attributes instead of in a core language is to preserve syntax compatibility, including compatibility with tools that are not aware of the new feature. (Whether "compatibility with tools" actually works in practice depends strongly on what the feature actually does.)
In the context of Go, it seems like the main reason to put features in vet is to implement changes which would not preserve Go 1 compatibility if applied to the language itself. I don't see that as an issue here.
One reason not to put features in vet is if they need to be propagated during compilation. For example, if I write:
switch x := somepkg.SomeFunc().(type) {
…
}
will I get the proper warnings for types that aren't in the sum, across package boundaries? It's not obvious to me that vet can do that deep a transitive analysis, so perhaps that's a reason it would need to go into the core language.
@dr2chase In general, of course, you are correct, but are you correct for this specific example? The code is completely comprehensible without knowing what the magic comment means. The magic comment doesn't change what the code does in any way. The error messages from vet should be clear.
@bcmills
Why does this concept fall on the language side rather than the vet side?
You could apply that same question to any feature outside the Turing-complete core....
I don't agree. If the feature under discussion affects the compiled code, then there is an automatic argument in favor of it. In this case, the feature apparently does not affect the compiled code.
(And, yes, vet can parse the source of imported packages.)
I'm not trying to claim that my argument about vet is conclusive. But every language change starts from a negative position: a simple language is very very desirable, and a significant new feature like this inevitably makes the language more complex. You need strong arguments in favor of a language change. And from my perspective those strong arguments have not yet appeared. After all, we've thought about this issue for a long time, and it is a FAQ (https://golang.org/doc/faq#variant_types).
@ianlancetaylor
In this case, the feature apparently does not affect the compiled code.
I think that depends on the specific details? The "zero-value of the sum is the zero-value of the first type" behavior that @jimmyfrasche mentioned above (https://github.com/golang/go/issues/19412#issuecomment-289319916) certainly would.
@urandom I was writing up a long explanation of why interface and union types didn't mix without explicit type constructors, but then I realized there was a kind of sensible way to do that, so:
Quick and dirty counterproposal to my counterproposal. (Anything not explicitly mentioned is the same as my previous proposal). I'm not sure one proposal is better than the other, but this one allows interfaces and is all around more explicit:
The union has explicit "field names" hereafter called "tag names":
union { //or whatever
    None, invalid struct{} //None is zero value
    Good, Bad int
    Err error //okay because it's explicitly named
}
There is still no embedding. It's always an error to have a type without a tag name.
Union values have a dynamic tag rather than a dynamic type.
Literal value creation: U{v} is only valid if completely unambiguous; otherwise it has to be U{Tag: v}.
Convertibility and assignment compatibility take tag names into account as well.
Assignment to a union is not magic. It always means assigning a compatible union value. To set the stored value, the desired tag name must be explicitly used: v.Good = 1 sets the dynamic tag to Good and the stored value to 1.
Accessing the stored value uses a tag assertion rather than a type assertion:
    g := v.[Tag] //may panic
    g, ok := v.[Tag] //no panic but could return zero-value, false
v.Tag is an error on the rhs since it's ambiguous.
Tag switches are like type switches, written switch v.[type], except that the cases are the tags of the union.
Type assertions hold with respect to the type of the dynamic tag. Type switches work similarly.
Given values a, b of some union type, a == b if their dynamic tags are the same and the stored value is the same.
Checking whether the stored value is some particular value requires a tag assertion.
If a tag name is unexported it can only be set and accessed in the package that defines the union. This means that a tag switch of a union with mixed exported and unexported tags can never be exhaustive outside the defining package without a default case. If all the tags are unexported it's a black box.
Reflection needs to handle the tag names as well.
edit: Clarification for nested unions. Given
type U union {
    A union {
        A1 T1
        A2 T2
    }
    B union {
        B1 T3
        B2 T4
    }
}

var u U
The value of u is the dynamic tag A and the stored value is the anonymous union with dynamic tag A1 and its stored value is the zero value of T1.
u.B.B2 = returnsSomeT4() is all that is necessary to switch u from the zero value, even though it moves from one of the nested unions to the other, as it's all stored in one memory location. But
v := u.[A].[A2] has two chances to panic, since it tag-asserts on two union values and the 2-valued version of tag assertion isn't available without splitting across multiple lines. Nested tag switches would be cleaner in this case.
edit2: Clarification on type asserts.
Given
type U union {
    Exported, unexported int
}

var u U

a type assert like u.(int) is entirely reasonable. Within the defining package, that would always hold. However, if u is outside the defining package, u.(int) would panic when the dynamic tag is unexported, to avoid leaking implementation details. Similarly for assertions to an interface type.
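Pulling these rules together, a small usage sketch (hypothetical syntax from this proposal, not valid Go today; the Result type is made up):

    type Result union { //zero value: (Value, 0)
        Value int
        Err   error
    }

    var r Result        // dynamic tag Value, stored value 0
    r.Err = io.EOF      // sets the dynamic tag to Err
    v, ok := r.[Value]  // ok == false, v == 0
    switch r.[type] {   // tag switch: the cases are tags, not types
    case Value:
        // ...
    case Err:
        // ...
    }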
@ianlancetaylor Here are a few examples of how this feature would help:
At the heart of some packages (go/ast, for example) is one or more large sum types. It's hard to navigate these packages without understanding those types. More confusingly, sometimes a sum type is represented by an interface with methods (e.g. go/ast.Node), other times by the empty interface (e.g. go/ast.Object.Decl).
Compiling the protobuf oneof feature to Go results in an unexported interface type whose only purpose is to make sure assignment to the oneof field is type-safe. That in turn requires generating a type for each branch of the oneof. Type literals for the end product are hard to read and write:
&sppb.Mutation{
    Operation: &sppb.Mutation_Delete_{
        Delete: &sppb.Mutation_Delete{
            Table:  m.table,
            KeySet: keySetProto,
        },
    },
}
Some (though not all) oneofs could be expressed by sum types.
Sometimes a "maybe" type is exactly what one needs. For example, many Google API resource update operations permit a subset of the resource's fields to be changed. One natural way to express this in Go is by a variant of the resource struct with a "maybe" type for each field. For instance, the Google Cloud Storage ObjectAttrs resource looks like
type ObjectAttrs struct {
    ContentType string
    ...
}
To support partial updates, the package also defines
type ObjectAttrsToUpdate struct {
    ContentType optional.String
    ...
}
Where optional.String looks like this (godoc):

// String is either a string or nil.
type String interface{}
This is both hard to explain and type-unsafe, but it turns out to be convenient in practice, because an ObjectAttrsToUpdate literal looks exactly like an ObjectAttrs literal, while encoding presence. I wish we could have written
type ObjectAttrsToUpdate struct {
    ContentType string | nil
    ...
}
Many functions return (T, error) with xor semantics (T is meaningful iff error is nil). Writing the return type as T | error would clarify the semantics, increase safety, and provide more opportunities for composition. Even if we can't (for compatibility reasons) or don't want to change a function's return value, the sum type is still useful for carrying around that value, like writing it to a channel.
A go vet annotation would admittedly help many of these cases, but not the ones where an anonymous type makes sense. I think if we had sum types we'd see a lot of chan *Response | error. That type is short enough to write out multiple times.
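For comparison, the workaround commonly used today is a product type with a documented invariant (the names here are made up):

    // respOrErr stands in for *Response | error: exactly one field is
    // meaningful, but nothing enforces that.
    type respOrErr struct {
        resp *Response // valid iff err == nil
        err  error
    }

    var results = make(chan respOrErr)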
@ianlancetaylor this probably isn't a great start, but here's everything you can do with unions that you can already do in Go1, because I figured it was only fair to acknowledge and summarize those arguments:
(Using my latest proposal with tags for the syntax/semantics below. Also assuming the emitted code is basically like the C code I posted much earlier in the thread.)
Sum types overlap with iota, pointers, and interfaces.
These two types are roughly equivalent:
type Stoplight union {
    Green, Yellow, Red struct{}
}

func (s Stoplight) String() string {
    switch s.[type] {
    case Green:
        return "green" //etc
    }
}
and
type Stoplight int

const (
    Green Stoplight = iota
    Yellow
    Red
)

func (s Stoplight) String() string {
    switch s {
    case Green:
        return "green" //etc
    }
}
The compiler would likely emit exactly the same code for both.
In the union version, the int is turned into a hidden implementation detail. With the iota version you can ask what Yellow/Red is, or set a Stoplight value to -42, but not with the union version—those are all compile-time errors, and invariants that can be taken into account during optimization. Similarly, you can write a (value) switch that fails to account for Yellow lights, but with a tag switch you'd need a default case to make that explicit.
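For instance, everything in this made-up function compiles with the iota version and would be a compile-time error with the union version:

    func abuse() {
        var s Stoplight = -42 // no such light
        s = Yellow * Red      // arithmetic on colors is meaningless but legal
        _ = s
    }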
Of course, there are things you can do with iota that you can't do with union types.
These two types are roughly equivalent
type MaybeInt64 union {
    None struct{}
    Int64 int64
}

and

type MaybeInt64 *int64
The pointer version is more compact. The union version would need an extra bit (which in turn would likely be word sized) to store the dynamic tag, so the size of the value would likely be the same as https://golang.org/pkg/database/sql/#NullInt64
The union version more clearly documents the intent.
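That tagged layout is essentially what database/sql spells out by hand; a sketch of the equivalent struct (cf. sql.NullInt64):

    type maybeInt64 struct {
        value int64
        some  bool // the "dynamic tag"; padding rounds it up to a word
    }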
Of course, there are things you can do with pointers that you can't do with union types.
These two types are roughly equivalent
type AB union {
    A A
    B B
}

and

type AB interface {
    secret()
}

func (A) secret() {}
func (B) secret() {}
The union version can't be circumvented with embedding. A and B don't need methods in common—they could in fact be primitive types or have entirely disjoint method sets, like the json.Token example @urandom posted.
It's really easy to see what you can put in an AB union versus an AB interface: the definition is the documentation (I have had to read the go/ast source multiple times to figure out what something is).
The AB union can never be nil and can be given methods outside the intersection of its constituents (this could be simulated by embedding the interface in a struct but then construction becomes more delicate and error prone).
Of course, there are things you can do with interfaces that you can't do with union types.
Maybe that overlap is too much overlap.
In each case the primary benefit of the union versions is indeed stricter compile time checking. What you can't do is more important than what you can. For the compiler that translates into stronger invariants it can use to optimize the code. For the programmer that translates into another thing you can let the compiler worry about—it'll just tell you if you're wrong. In the interface version, at the very least, there are important documentation benefits.
Clunky versions of the iota and pointer examples can be constructed using the "interface with an unexported method" strategy. For that matter, though, structs could be simulated with a map[string]interface{}, and (nonempty) interfaces with func types and method values. No one would, because it's harder and less safe.
All of those features add something to the language but their absence could be worked around (painfully, and under protest).
So I'm assuming the bar isn't to demonstrate a program that cannot even be approximated in Go, but rather one that is much more easily and cleanly written in Go with unions than without. That is what remains to be shown.
@jimmyfrasche
I see no reason why the union type should have named fields. Names are only useful if you want to distinguish between different fields of the same type. However, a union must never have multiple fields of the same type, as that is quite meaningless. Thus, having names is just redundant, and leads to confusion and more typing.
In essence, your union type should look something like:

union {
    struct{}
    int
    error
}
The types themselves will provide the unique identifiers that can be used to assign to a union, quite similar to the way embedded types in structs are used as identifiers. However, in order for explicit assignments to work, one cannot create a union by specifying an unnamed type as a member, since the syntax would not allow such an expression, e.g. v.struct{} = struct{}{}. Thus, types like raw structs, unions and funcs have to be named beforehand in order to be part of a union, and become assignable. With this in mind, a nested union will not be anything special, as the inner union will just be another member type.
Now, I'm not sure which syntax would be better.
[union|sum|pick|oneof] {
    type1
    package1.type2
    ....
}
The above seems more go-like, but is a bit verbose for such a type.
On the other hand, type1 | package1.type2 may not look like your usual Go type; however, it gains the benefit of using the '|' symbol, which is predominantly recognised as an OR. And it reduces verbosity without being cryptic.
@urandom if you don't have "tag names" but allow interfaces, the sums collapse into an interface{} with extra checks. They stop being sum types, since you can put one thing in but get it out multiple ways. The tag names let them be sum types and hold interfaces without ambiguity.
The tag names repair a lot more than just the interface{} problem, though. They make the type much less magical and let everything be gloriously explicit without having to invent a bunch of types just to differentiate. You can have explicit assignment and type literals, as you point out.
That you can give one type more than one tag is a feature. Consider a type to measure how many successes or failures have happened in a row (1 success cancels out N failures and vice versa)
type Counter union {
    Successes, Failures uint
}
without the tag names you'd need
type (
    Successes uint
    Failures uint
    Counter Successes | Failures
)

and assignment would look like c = Successes(1) instead of c.Successes = 1. You don't gain much.
Another example is a type that represents local or remote failure. With tag names this is easy to model:
type Failure union {
    Local, Remote error
}
The provenance of the error can be specified with its tag name, regardless of what the actual error is. Without tag names you'd need type Local struct { error } and the same for Remote, even if you allow interfaces directly in the sum.
The tag names are sort of creating special neither alias nor named types locally in the union. Having multiple "tags" with identical types isn't unique to my proposal: it's what every functional language (that I know of) does.
The ability to create unexported tags for exported types and vice versa is an interesting twist, too.
Also having separate tag and type assertions allows for some interesting code, like being able to promote a shared method to the union with a one-line wrapper.
It seems like it solves more problems than it causes and makes everything fit together much nicer. I honestly wasn't so sure when I wrote it up, but I'm becoming increasingly convinced that it's the only way to resolve all the issues with integrating sums into Go.
To expand on that somewhat, the motivating example for me was io.Reader | io.ReadCloser, from @rogpeppe. Allowing interfaces without tags, this is the same type as io.Reader. You can put a ReadCloser in and pull it out as a Reader. You lose the "A | B means A or B" property of sum types.
If you need to be specific about sometimes handling an io.ReadCloser as an io.Reader, you need to create wrapper structs as @bcmills pointed out, type Reader struct { io.Reader } etc., and have the type be Reader | ReadCloser.
Even if you limit sums to interfaces with disjoint method sets, you still have this problem because one type can implement more than one of those interfaces. You lose the explicitness of sum types: they're not "A or B": they're "A or B or sometimes whichever you feel like".
Worse, if those types are from other packages they can suddenly behave differently after an update even if you were very careful to construct your program so that A is never treated the same as B.
Originally I explored disallowing interfaces to solve the problem. No one was happy with that! But it also didn't get rid of problems like a = b meaning different things depending on the types of a and b, which I'm not comfortable with. There also had to be a lot of rules about what type is chosen in the pick when type assignability comes into play. It's a lot of magic.
You add tags and that all goes away.
With union { R io.Reader | RC io.ReadCloser } you can explicitly say you want this ReadCloser to be considered as a Reader, if that's what makes sense. No wrapper types needed. It's implicit in the definition. Regardless of the type of the tag, it's either one tag or the other.
The downside is that, if you get an io.Reader from somewhere else, say a chan receive or func call, and it might be an io.ReadCloser and you need to assign it to the proper tag you have to type assert on io.ReadCloser and test. But that makes the intent of the program much clearer—exactly what you mean is in the code.
Also because the tag assertions are different from type assertions, if you really don't care and just want an io.Reader regardless, you can use a type assertion to pull that out, regardless of tag.
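Concretely, with u of the union type above (same hypothetical syntax):

    rc, ok := u.[RC]   // tag assertion: ok is true only when the tag is RC
    r := u.(io.Reader) // type assertion: succeeds for either tag, since
                       // both stored types implement io.Reader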
This is a best-effort transliteration of a toy example into Go without unions/sums/etc. It's probably not the best example, but it's the one I used to see what this would look like. It shows the semantics in a more operational fashion, which will likely be easier to understand than some terse bullet points in a proposal. There is quite a bit of boilerplate in the transliteration, so I've generally only written the first instance of several methods, with a note about the repetition.
In Go with union proposal:
type fail union { //zero value: (Local, nil)
    Local, Remote error
}

func (f fail) Error() string {
    //Could panic if local/remote nil, but assuming
    //it will be constructed purposefully
    return f.(error).Error()
}

type U union { //zero value: (A, "")
    A, B, C string
    D, E int
    F fail
}

//in a different package

func create() pkg.U {
    return pkg.U{D: 7}
}

func process(u pkg.U) {
    switch u := u.[type] {
    case A:
        handleA(u) //undefined here, just doing something with unboxed value
    case B:
        handleB(u)
    case C:
        handleC(u)
    case D:
        handleD(u)
    case E:
        handleE(u)
    case F:
        switch u := u.[type] {
        case Local:
            log.Fatal(u)
        case Remote:
            log.Printf("remote error %s", u)
            retry()
        }
    }
}
Transliterated to current Go:
(notes are included about the differences between the transliteration and the above)
const ( //simulates tags, namespaced so other packages can see them without overlap
    Fail_Local = iota
    Fail_Remote
)

//since there are only two tags with a single type this can
//be represented precisely and safely
//the error method on the full version of fail can be
//put more succinctly with type embedding in this case
type fail struct { //zero value (Fail_Local, nil) :)
    remote bool
    error
}
// e, ok := f.[Local]
func (f *fail) TagAssertLocal2() (error, bool) { //same for TagAssertRemote2
    if f.remote {
        return nil, false
    }
    return f.error, true
}

// e := f.[Local]
func (f *fail) TagAssertLocal() error { //same for TagAssertRemote
    if f.remote {
        panic("invalid tag assert")
    }
    return f.error
}
// f.Local = err
func (f *fail) SetLocal(err error) { //same for SetRemote
    f.remote = false
    f.error = err
}

// simulate tag switch
func (f *fail) TagSwitch() int {
    if f.remote {
        return Fail_Remote
    }
    return Fail_Local
}

// f.(someType) needs to be written as f.TypeAssert().(someType)
func (f *fail) TypeAssert() interface{} {
    return f.error
}
const (
    U_A = iota
    U_B
    // ...
    U_F
)

type U struct { //zero value (U_A, "", 0, fail{}) :(
    kind int //more than two types, need an int
    s string //these would all occupy the same space
    i int
    f fail
}

//s, ok := u.[A]
func (u *U) TagAssertA2() (string, bool) { //similar for B, etc.
    if u.kind == U_A {
        return u.s, true
    }
    return "", false
}

//s := u.[A]
func (u *U) TagAssertA() string { //similar for B, etc.
    if u.kind != U_A {
        panic("invalid tag assert")
    }
    return u.s
}

// u.A = s
func (u *U) SetA(s string) { //similar for B, etc.
    //if there were any pointers or reference types
    //in the union, they'd have to be nil'd out here,
    //since the space isn't shared
    u.kind = U_A
    u.s = s
}

// special case of u.F.Local = err
func (u *U) SetF_Local(err error) { //same for SetF_Remote
    u.kind = U_F
    u.f.SetLocal(err)
}

func (u *U) TagSwitch() int {
    return u.kind
}

func (u *U) TypeAssert() interface{} {
    switch u.kind {
    case U_A, U_B, U_C:
        return u.s
    case U_D, U_E:
        return u.i
    }
    return u.f
}
//in a different package

func create() pkg.U {
    var u pkg.U
    u.SetD(7)
    return u
}

func process(u pkg.U) {
    switch u.TagSwitch() {
    case U_A:
        handleA(u.TagAssertA())
    case U_B:
        handleB(u.TagAssertB())
    case U_C:
        handleC(u.TagAssertC())
    case U_D:
        handleD(u.TagAssertD())
    case U_E:
        handleE(u.TagAssertE())
    case U_F:
        switch u := u.TagAssertF(); u.TagSwitch() {
        case Fail_Local:
            log.Fatal(u.TagAssertLocal())
        case Fail_Remote:
            log.Printf("remote error %s", u.TagAssertRemote())
        }
    }
}
@jimmyfrasche
Since the union contains tags that may have the same type, wouldn't the following syntax be better suited:
func process(u pkg.U) {
    switch v := u {
    case A:
        handleA(v) //undefined here, just doing something with unboxed value
    case B:
        handleB(v)
    case C:
        handleC(v)
    case D:
        handleD(v)
    case E:
        handleE(v)
    case F:
        switch w := v {
        case Local:
            log.Fatal(w)
        case Remote:
            log.Printf("remote error %s", w)
            retry()
        }
    }
}
The way I see it, when used with a switch, a union is quite similar to types such as int, or string. The main difference being that there are only finite 'values' that can be assigned to it, as opposed to the former types, and the switch itself is exhaustive. Thus, in this case I don't really see the need for a special syntax, reducing the mental work of the developer.
Also, under this proposal, would such code be valid:
type Foo union {
    // Completely different types, no ambiguity
    A string
    B int
}

func Bar(f Foo) {
    switch v := f {
    ....
    }
}

....

func main() {
    // No need for Bar(Foo{A: "hello world"})
    Bar("hello world")
    Bar(1)
}
@urandom I chose a syntax to reflect the semantics using analogies to existing Go syntax whenever possible.
With interface types you can do
var i someInterface = someValue //where someValue implements someInterface.
var j someInterface = i //this assignment is different from the last one.

That's fine and unambiguous, since it doesn't matter what the type of someValue is as long as the contract is satisfied.
When you introduce tags† on unions it can sometimes be ambiguous. Magic assignment would only be valid in certain cases. Special casing it only gets you around having to be explicit sometimes.
I don't see a point in being able to sometimes skip a step, especially when a code change can easily invalidate that special case and then you have to go back and update all the code anyway. To use your Foo/Bar example: if C int is added to Foo, then Bar(1) has to change but not Bar("hello world"). It complicates everything to save a few keystrokes in situations that may not be that common, and makes the concepts harder to understand because sometimes they look like this and sometimes they look like that—just consult this handy flowchart to see which applies to you!
† I wish I had a better name for those. There are already struct tags. I would have called them labels but Go has those, too. Calling them fields seems both more appropriate and the most confusing. If anyone wants to bikeshed this one could really use a fresh coat.
In a sense, tagged unions are more similar to a struct than an interface. They're a special kind of struct that can only have one field set at a time. Seen in that light, your Foo/Bar example would be like saying this:
type Foo struct {
    A string
    B int
}

func Bar(f Foo) {...}

func main() {
    Bar("hello world") //same as Bar(Foo{A: "hello world", B: 0})
    Bar(1) //same as Bar(Foo{A: "", B: 1})
}
While it is unambiguous in this case, I don't think it's a good idea.
Also, in the proposal Bar(Foo{1}) is allowed when it is unambiguous, if you really want to save keystrokes. You can also have pointers to unions, so the composite literal syntax is still necessary for &Foo{"hello world"}.
That said, unions do have a similarity to interfaces in that they have a dynamic tag of which "field" is currently set.
The switch v := u.[type] {... nicely mirrors the switch v := i.(type) {... for interfaces, while still allowing type switches and assertions directly on union values. Maybe it should be u.[union] to make it easier to spot, but either way the syntax isn't that heavy and it's clear what's meant. You could make the same argument that the .(type) is unnecessary, but when you see that you always know exactly what's happening, and that fully justifies it, in my opinion.
That was my reasoning behind these choices.
@jimmyfrasche
The switch syntax seems a bit counter-intuitive to me, even after your explanations. With an interface, switch v := i.(type) {... switches through possible types, as listed by the switch cases, and indicated by .(type). However, with a union, a switch is not switching through possible types, but values. Each case represents a different possible value, where values may in fact share the same type. This is more similar to string and int switches, where cases also list values, and their syntax is a simple switch v := u {... . From that, it seems more natural to me that switching through the values of a union would be switch v := u {..., since the cases are similar to, but more restrictive than, the cases for ints and strings.
@urandom that's a very good point about the syntax. The truth is it's a holdover from my previous proposal without labels so it was the type then. I just blindly copied it over without thinking. Thanks for pointing it out.
switch u {... would work, but the problem with switch v := u {... is that it looks too much like switch v := f(); v {... (which would make error reporting more difficult—not clear which was intended). If the union keyword were renamed to pick, as has been suggested by @as, then the tag switch could be written as switch u.[pick] {... or switch v := u.[pick] {..., which keeps the symmetry with a type switch but loses the confusion and looks pretty nice.
Even if the implementation is switching on an int, there's still an implicit destructuring of the pick into dynamic tag and stored value, which I think should be explicit, regardless of grammatical rules.
you know, just calling the tags fields and having it be field assert and field switch makes a great deal of sense.
edit: that would make using reflect with picks awkward, though
[Sorry for delayed response - I was away on vacation]
@ianlancetaylor wrote:
Something I don't see in your proposals is why we should do this. There is a clear disadvantage to adding a new kind of type: it's a new concept that everybody who learns Go will have to learn. What is the compensating advantage? In particular, what does the new kind of type give us that we don't get from interface types?
There are two main advantages that I see. The first is a language advantage; the second is a performance advantage.
- when processing messages, particularly when read from a concurrent process, it is very useful to be able to know the complete set of messages that can be received, because each message can come with associated protocol requirements. For a given protocol, the number of possible message types might be very small, but when we use an open-ended interface to represent the messages, that invariant is not clear. Often people will use a different channel for each message type to avoid this, but that comes with its own costs.
- there are times when there is a small number of known possible message types, none of which contain pointers. If we use an open-ended interface to represent them, we need to incur an allocation to make interface values. Using a type that restricts the possible message types means that can be avoided, relieving GC pressure and increasing cache locality (see the sketch below).
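A sketch of that allocation cost with today's interfaces (the message types here are made up):

    type Ping struct{ Seq int }
    type Stop struct{}

    // Open-ended: every send may heap-allocate to box the message
    // into the interface value.
    var msgs = make(chan interface{})

    func send() { msgs <- Ping{Seq: 1} }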
A particular pain for me that sum types could solve is godoc. Take ast.Spec, for example: https://golang.org/pkg/go/ast/#Spec
Many packages manually list the possible underlying types of a named interface type, so that a user can quickly get an idea without having to look at the code or rely on name suffixes or prefixes.
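The manual convention looks roughly like this (paraphrasing the go/ast source):

    // A Spec node represents a single (non-parenthesized) import,
    // constant, type, or variable declaration.
    // The Spec type stands for any of *ImportSpec, *ValueSpec, and *TypeSpec.
    type Spec interface {
        Node
        specNode()
    }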
If the language already knows all the possible values, this could be automated in godoc, much like enum types with iotas. They could also actually link to the types, as opposed to being just plaintext.
Edit: another example: https://github.com/mvdan/sh/commit/ebbfda50dfe167bee741460a4491ffec1006bdef
@mvdan that's an excellent, practical point for improving the story in Go1 without any language changes. Can you file a separate issue for that and reference this one?
Sorry, are you referring to just links to other names within the godoc page, but still listing them manually?
Sorry, should have been clearer.
I meant a feature request for automatically handling types that implement interfaces defined in the current package in godoc.
(I believe there's a feature request somewhere for linking manually listed names, but I don't have the time to hunt it down at the moment).
I don't wish to take over this (already very long) thread, so I've created a separate issue - see above.
@Merovius I'm replying to https://github.com/golang/go/issues/19814#issuecomment-298833986 in this issue since the AST stuff applies more to sum types than enums. Apologies for pulling you into a different issue.
First, I'd like to reiterate that I'm not sure if sum types belong in Go. I've yet to convince myself that they definitely do not belong. I'm working under the assumption that they do in order to explore the idea and see whether they fit. I'm willing to be convinced either way, though.
Second, you mentioned gradual code repair in your comment. Adding a new term to a sum type is by definition a breaking change, on par with adding a new method to an interface or removing a field from a struct. But this is the correct and desired behavior.
Let's consider the example of an AST, implemented with a Node interface, that adds a new kind of node. Let's say the AST is defined in an external project and you're importing that in a package in your project, which walks the AST.
There are a number of cases:
With interface-based AST only case 2.1.1 works correctly. This is coincidence as much as anything. Gradual code repair doesn't work. The AST has to bump its version and your code needs to bump its version.
An exhaustiveness linter would help but since the linter can't examine all interface types it needs to be told in some manner that a particular interface needs to be checked. That either means an in source comment or some sort of config file in your repo. If it's an in source comment, since by definition the AST is defined in a separate project you are at the mercy of that project to tag the interface for exhaustiveness checking. This only works at scale if there is a single exhaustiveness linter that the entire community agrees upon and always uses.
With a sum-based AST, you still need to use versioning. The only difference in this case is that the exhaustiveness linter is built into the compiler.
Neither helps with 2.2, but what could?
There's a simpler, AST-adjacent case where sum types would be useful: tokens. Say you're writing a lexer for a simple calculator. There are tokens like * that don't have any values associated with them, tokens like Var that have a string representing the name, and tokens like Val that hold a float64.
You could implement this with interfaces but it would be tiresome. You'd probably do something like this, though:
package token

type Type int

const (
    Times Type = iota
    // ...
    Var
    Val
)

type Value struct {
    Type
    Name string // only valid if Type == Var
    Number float64 // only valid if Type == Val
}
An exhaustiveness linter on iota-based enums could ensure an illegal Type is never used, but it wouldn't work too well against someone assigning to Name when Type == Times, or using Number when Type == Var. As the number and kind of tokens grow, it only gets worse. Really, the best you could do here is add a method, Valid() error, that checks all the constraints, and a bunch of documentation explaining when you can do what.
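A sketch of that validation method under the Value definition above (the error texts are made up; assumes the errors package is imported):

    // Valid checks the constraints that the field comments document
    // but the type system can't express.
    func (v Value) Valid() error {
        switch v.Type {
        case Var:
            if v.Number != 0 {
                return errors.New("token: Number set on a Var token")
            }
        case Val:
            if v.Name != "" {
                return errors.New("token: Name set on a Val token")
            }
        default:
            if v.Name != "" || v.Number != 0 {
                return errors.New("token: value set on a simple token")
            }
        }
        return nil
    }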
A sum type easily encodes all of those constraints and the definition would be all the documentation needed. Adding a new kind of token would be a breaking change but everything I said about AST's still applies here.
I think more tooling is necessary. I'm just not convinced it's sufficient.
@jimmyfrasche
Second, you mentioned gradual code repair in your comment. Adding a new term to a sum type is by definition a breaking change, on par with adding a new method to an interface or removing a field from a struct.
No, it is not on par. You can do both of those changes in a gradual repair model (for interfaces: 1. Add the new method to all implementations, 2. Add the method to the interface. For struct fields: 1. Remove all usages of the field, 2. Remove the field). Adding a case to a sum type cannot work in a gradual repair model; if you add it to the lib first, it would break all users, as they don't check exhaustively anymore, but you can't add it to the users first, because the new case doesn't exist yet. Same goes for removal.
It's not about whether or not it's a breaking change, it's about whether it's a breaking change that can be orchestrated with minimal interruption.
But this is the correct and desired behavior.
Exactly. Sum types, by their very definition and every reason people want them, are fundamentally incompatible with the idea of gradual code repair.
With interface-based AST only case 2.1.1 works correctly.
No, it also works correctly in case 1.2 (failing at runtime for unrecognized grammar is perfectly fine. I probably wouldn't want to panic, though, but just return an error) and also in a whole lot of cases of 2.1. The rest is a fundamental issue with upgrading software; if you add a new feature to a library, users of your lib need to change code to make use of it. It doesn't mean your software is incorrect until it does, though.
The AST has to bump its version and your code needs to bump its version.
I do not see how this follows from what you are saying, at all. To me, saying "this new grammar won't work with all tools yet, but it is available to the compiler" is fine. Just as "if you run this tool on this new grammar, it will fail at runtime" is fine. At its very worst, this only adds another step to the gradual repair process: a) Add the new node to the AST package and parser. b) Fix tools using the AST package to take advantage of the new node. c) Update code to use the new node. Yes, the new node will only become usable, after a) and b) are done; but in every step of this process, without any breakages, everything will still compile and work correctly.
I'm not saying you'll be automatically fine in a world of gradual code repair and no exhaustive compiler checks. It will still require careful planning and execution, you will still probably break unmaintained reverse dependencies and there might still be changes you might not be able to do at all (though I can't think of any). But at least a) there is a gradual upgrade path and b) the decision of whether this should break your tool at runtime, or not, is up to the author of the tool. They can decide what to do in a case that is unknown.
An exhaustiveness linter would help but since the linter can't examine all interface types it needs to be told in some manner that a particular interface needs to be checked.
Why? I would argue that it's fine for switchlint™ to complain about any type-switch without a default-case; after all, you'd expect the code to work with any interface definition, so not having code in place to work with unknown implementations is likely a problem anyway. Yes, there are exceptions to this rule, but exceptions can already be manually ignored.
I'd probably be more on board with enforcing "every type-switch should require a default case, even if it's empty" in the compiler, than with actual sum types. It would both enable and force people to make the decision of what their code should do when faced with an unknown choice.
You could implement this with interfaces but it would be tiresome.
shrug it's a one-time effort in a case that very rarely comes up. Seems fine to me.
And FWIW, I'm currently only arguing against the exhaustive checking notion of sum types. I don't yet have any strong opinions about the added convenience of saying "any of these structurally defined types".
@Merovius I'm going to have to think further on your excellent points about gradual code repair. In the meantime:
I'm currently only arguing against the exhaustive checking notion of sum types.
You can explicitly opt out of exhaustiveness checks with a default case (well, effectively: the default makes it exhaustive by adding a case that covers "anything else, whatever that may be"). You still have a choice, but you must make it explicitly.
I would argue that it's fine for switchlint™ to complain about any type-switch without a default-case; after all, you'd expect the code to work with any interface definition, so not having code in place to work with unknown implementations is likely a problem anyway. Yes, there are exceptions to this rule, but exceptions can already be manually ignored.
That's an interesting idea. While it would hit sum types simulated with interface and enums simulated with const/iota, it doesn't tell you that you missed a known case, just that you didn't handle the unknown case. Regardless, it seems noisy. Consider:
switch {
case n < 0:
case n == 0:
case n > 0:
}
That's exhaustive if n is integral (for floats it's missing n != n), but without encoding a lot of information about types it's probably easier to just flag that as missing a default. For something like:
switch {
case p[0](a, b):
case p[1](a, b):
//...
case p[N](a, b):
}
even if the p[i] form an equivalence relation on the types of a and b, it's not going to be able to prove that, so it must flag the switch as missing a default case. That means a way to silence it: a manifest, an annotation in the source, a wrapper script to egrep -v out the whitelisted hits, or an unnecessary default on the switch which falsely implies that the p[i] are not exhaustive.
At any rate that would be trivial linter to implement if the "always complain about no default in all circumstances" route is taken. It would be interesting to do so and run it on go-corpus and see how noisy and/or useful it is in practice.
Alternate token implementations:
//Type defined as before
type SimpleToken struct { Type }
type StringToken struct { Type; Value string }
type NumberToken struct { Type; Value float64 }

type Interface interface {
    //some method common to all these types, maybe just token() Interface
}
That gets rid of the possibility of defining an illegal token state where something has both a string and a number value, but doesn't disallow creating a StringToken with a type that should be a SimpleToken, or vice versa.
To do that with interfaces you need to define one type per token (type Plus struct{}, type Mul struct{}, etc.), and most of the definitions are exactly the same except for the type name. One-time effort or not, that's a lot of work (though well suited for code generation in this case).
I suppose you could have a "hierarchy" of token interfaces to partition the kinds of tokens based on the allowable values: (Assuming in this example that there is more than one kind of token that can contain a number or string, etc.)
type SimpleToken int //implements token.Interface

const (
    Plus SimpleToken = iota
    // ...
)

type NumericToken interface {
    Interface
    Value() float64
    nt() NumericToken
}

type IntToken struct { /* ... */ } //implements NumericToken; likewise a FloatToken

type StringToken interface { // for Var and Func and Const, etc.
    Interface
    Value() string
    st() StringToken
}
Regardless, it means each token requires a pointer dereference to access its value, unlike the struct or sum type, which only require pointers when strings are involved. So, with appropriate linters and improvements to godoc, the big win for sum types in this case is minimizing allocations while disallowing illegal states, and the amount of typing (in the keyboard sense) saved, which doesn't seem unimportant.
You can explicitly opt out of exhaustiveness checks with a default case (well, effectively: the default makes it exhaustive by adding a case that covers "anything else, whatever that may be"). You still have a choice, but you must make it explicitly.
So, it seems like either way, both of us will have the choice to opt in or out of exhaustive checking :)
it doesn't tell you that you missed a known case, just that you didn't handle the unknown case.
Effectively, I believe, the compiler already does a whole-program analysis to determine what concrete types are used in what interfaces? I would at least expect it to generate the function tables used in interfaces at compile time, at least for non-interface type assertions (i.e. type assertions that are not asserting to an interface type, but to a concrete type). But, honestly, this is argued from first principles; I have no idea about the actual implementation.
In any case, it should be pretty easy to a) list every concrete type defined in a whole program and b) for any type-switch, filter them on whether they implement that interface. If you use something like this, you'd end up with a reliable list. I think.
I'm not 100% convinced that a tool can be written that is as reliable as actually explicitly stating the options, but I'm convinced that you could cover 90% of the cases and you could definitely write a tool that does this outside of the compiler, given the correct annotations (i.e. make sum-types a pragma-like comment, not an actual type). Not a great solution, admittedly.
Regardless, it seems noisy. Consider:
I think this is being unfair. The cases you are mentioning have nothing at all to do with sum types. If I were to write such a tool, I'd restrict it to type-switches and switches with an expression, as those seem to be the way sum types would be handled too.
Alternate token implementations:
Why not a marker method? You don't need a type field; you get that for free from the interface representation. If you are concerned about repeating the marker method over and over again: define an unexported struct{}, give it that marker method, and embed it in each implementation, for zero extra cost and less typing per option than your method.
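A sketch of that embedding trick (names are illustrative, following the token() idea above):

    // The marker method is written once, on a zero-size unexported type...
    type marker struct{}

    func (marker) token() {}

    type Interface interface{ token() }

    // ...and each variant picks it up by embedding.
    type SimpleToken struct {
        marker
        Type
    }

    type StringToken struct {
        marker
        Type
        Value string
    }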
Regardless, it means each token requires a pointer dereference to access its value
Yes. It's a real cost, but I don't think it outweighs basically any other argument.
I think this is being unfair.
That's true.
I wrote a quick-and-dirty version and ran it on the stdlib. Checking every switch statement had 1956 hits; restricting it to skip the switch { form reduced that count to 1677. I haven't inspected any of those locations to see if the result is meaningful.
https://github.com/jimmyfrasche/switchlint
There's certainly a great deal of room for improvement. It's not terribly sophisticated. Pull requests welcome.
(I'll reply to the rest later)
edit: wrong markup format
I think this is a (quite biased) summary of everything so far (and narcissistically assuming my second proposal):

Pros: stricter compile-time checking, clearer documentation of intent, and stronger invariants for the compiler to optimize with, as argued above.

Cons: a significant new concept that overlaps heavily with existing features, and additions to a sum are breaking changes.

Alternatives: the Go 1 simulations discussed above—iota enums, pointers, and interfaces with unexported methods—each a hand-written 1 + 1 + ⋯ + 1.

Regardless
For gradual repair, and that is a big one, I think the only option is for external packages to opt out of the exhaustiveness checks. This does imply that it must be legal to have an "unnecessary" default case only concerned with future proofing even though you otherwise match everything else. I believe that that's implicitly true now, and if not easy enough to specify.
There could be an announcement from a package maintainer that "hey we're going to be adding a new member to this sum type in the next version, make sure you can handle it" and then a switchlint tool could find any cases that need to be opted out.
Not as straightforward as other cases, but still quite doable.
When writing a program that uses an externally defined sum type you could comment out the default to make sure you didn't miss any known cases and then uncomment it before committing. Or there could be a tool to let you know that the default is "unnecessary" which tells you that you got everything known and are future-proofed against the unknown.
Let's say we want to opt-in to exhaustiveness checking with a linter when using interface types simulating sum types, regardless of the package they're defined in.
@Merovius your betterSumType() BetterSumType trick is very cool, but it means that switches have to happen in the defining package (or you expose something like
func CallBeforeSwitches(b BetterSumType) (BetterSumType, bool) {
    if b == nil {
        return nil, false
    }
    b = b.betterSumType()
    if b == nil {
        return nil, false
    }
    return b, true
}
and also lint that that's called every time).
What are the criteria necessary to check that all switches in a program are exhaustive?
It can't be the empty interface, because then anything's game. So it needs at least one method.
If the interface has no unexported methods, any type could implement it, so exhaustiveness would depend on all the packages up the callgraph of each switch. It's possible to import a package, implement its interface, and then send that value to one of the package's functions; so a switch in that function wouldn't be able to be exhaustive without creating an import cycle. So it needs at least one unexported method. (This subsumes the previous criterion.)
Embedding would mess up the property we're looking for, so we need to ensure that none of the importers of the package ever embed the interface or any of the types that implement it at any point. A really fancy linter may be able to tell that sometimes embedding is okay if we never call a certain function that creates an embedded value or none of the embedded interfaces ever "escape" the API boundary of the package.
To be thorough, we either need to check that the zero value of the interface is never passed around, or enforce that an exhaustive switch includes case nil as well. (The latter is easier, but the former is preferred, since including nil turns a "type A or type B or type C" sum into a "nil or type A or type B or type C" sum.)
Let's say we have a linter, with all those abilities, even the optional ones, that can verify these semantics for any tree of imports and any given interface within that tree.
Now let's say we have project with a dependency D. We want to make sure an interface defined in one of the packages of D is exhaustive in our project. Let's say it does.
Now, we need to add a new dependency to our project D′. If D′ imports the package in D that defined the interface type in question but does not use this linter, it can easily destroy the invariants that need to hold for us to use exhaustive switches.
For that matter, let's say D just passed the linter coincidentally not because the maintainer runs it. An upgrade to D could just as easily destroy the invariants as D′.
Even if the linter can say "right now this is 100% exhaustive 👍" that can change without us doing anything.
An exhaustiveness checker for "iota enums" seems easier.
For all type t u where u is integral and t is used as a const with either individually specified values or iota, such that the zero value for u is included among these constants.

Notes: 1 << iota may be treated as a powerset, I believe at least most of the time, but would probably require extra conditions, especially surrounding bitwise complement. For the time being, such types will not be considered.

For some shorthand, let's call min(t) the constant such that, for any other constant C, min(t) <= C, and, similarly, let's call max(t) the constant such that, for any other constant C, C <= max(t).

To ensure t is used exhaustively we need to ensure that:
- values of type t are always the named constants (or 0 in certain idiomatic positions, like function invocation)
- there is never a value of type t, v, outside of min(t) <= v <= max(t)
- values of type t are never used in arithmetic operations +, /, etc. A possible exception could be when the result is clamped between min(t) and max(t) immediately afterward, but that could be hard to detect in general, so it may require an annotation in comments and should probably be restricted to the package that defines t
- every switch on a value of type t either covers all the named constants of t or has a default case (a sketch of violations follows below)

This still requires verifying all the packages in the import tree and can be invalidated as easily, although it is less likely to be invalidated in idiomatic code.
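To make the criteria concrete, each line in this made-up function is something the checker would have to flag:

    type Weekday int

    const (
        Sunday Weekday = iota
        // ...
        Saturday
    )

    func misuse() {
        var d Weekday = 42 // outside min(Weekday) <= v <= max(Weekday)
        d = Sunday + 3     // arithmetic on values of type Weekday
        switch d {         // misses known constants and has no default
        case Sunday:
        }
    }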
My understanding is that this, similar to type aliases, wouldn't be a breaking change, so why hold it up for Go 2?
Type aliases don't introduce a new keyword, which is a definite breaking change. There also seems to be a moratorium on even minor language changes and this would be a major change. Even just retrofitting all marshal/unmarshal routines to handle reflected sum values would be a huge ordeal.
Type aliases fixed an issue for which there was no workaround. Sum types provide a benefit in type safety, but not having them isn't a showstopper.
Just one (minor) point in favour of something like @rogpeppe's original proposal. In package http, there's the interface type Handler and a function type that implements it, HandlerFunc. Right now, in order to pass a function to http.Handle, you explicitly have to convert it to a HandlerFunc. If http.Handle instead accepted an argument of type HandlerFunc | Handler, it could accept any function/closure assignable to HandlerFunc directly. The union effectively serves as a type hint telling the compiler how values with unnamed types can be converted to the interface type. Since HandlerFunc implements Handler, the union type would behave exactly like Handler otherwise.
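For reference, the explicit conversion required today:

    http.Handle("/hello", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintln(w, "hello")
    }))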
@griesemer in response to your comment in the enum thread, https://github.com/golang/go/issues/19814#issuecomment-322752526, I think my proposal earlier in this thread https://github.com/golang/go/issues/19412#issuecomment-289588569 addresses the question of how sum types ("swift style enums") would have to work in Go. As much as I would like them, I don't know if they would be a necessary addition to Go, but I do think if they were added they'd have to look/operate a lot like that.
That post's not complete and there's clarifications throughout this thread, before and after, but I don't mind reiterating those points or summarizing since this thread is quite long.
If you have a sum type simultated by an interface with a type tag and absolutely cannot have it be circumvented by embedding, this is the best defense I've come up with: https://play.golang.org/p/FqdKfFojp-
Another possible approach is this: https://play.golang.org/p/p2tFm984S8
This is a proposal for sum types, also known as discriminated unions. Sum types in Go should essentially act like interfaces, except that they can be matched with a switch statement: the compiler checks that all variants are matched, and inside the arms of the switch statement the value can be used as if it is of the variant that was matched.