golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.83k stars 17.65k forks source link

proposal: Go 2: generic native types (GNTs) #32863

Closed rcoreilly closed 4 years ago

rcoreilly commented 5 years ago

This proposal is to add generic native types (GNTs) to Go: generic versions of each of the native builtin types (e.g., map = generic map, slice = generic slice, number = generic number, float = generic float, signed = generic signed integer, unsigned = generic unsigned integer, etc).

Edit: Can also use generic types to specify container types, e.g., []number is a slice with number elements, map[string]number, etc.

GNTs support a substantial proportion of generic programming functionality, without adding any new syntax to the language: generic code looks identical to regular Go code, except for the use of these generic type names, and a few extra functions that access the type parameters of container objects such as maps and slices (e.g., elem(x) is the type of the elment in a slice or map; key(m) is the type of the key of a map; field(s, fieldName) is the type of the given named field in a struct; return(f) is the type of return value from function, and type(x) is the type of a generic var, e.g., for use in return signature).

The generic struct is the most challenging case, because of its unlimited number of type parameters. The proposed strategy is to adopt the interface concept, but specialized for the struct type, as a struct interface -- see https://github.com/golang/go/issues/23796 for discussion of a related proposal, where most of the objections were about "polluting" the existing interface concept with struct-specific functionality -- this problem is avoided by having a special type of interface only for structs, which provides all the benefits from that proposal, as well as supporting the definition of a generic struct type.

Edit: It might be clearer to use prototype instead of struct interface due to the differences between this type and the existing interface -- they do share some properties but also differ significantly.

Here are some examples of some of the main generics use-cases under this proposal:

func Keys(m map) []key(m) {
    s := make([]key(m), len(m))
    idx := 0
    for k := range m {
        s[idx] = k
        idx++
    }
    return s
}
// note: all args in same list (x,y) must be of same concrete type
func Min(x, y number) type(x) {
  if x < y {
     return x
  }
  return y
}
// A struct interface is like an interface, specialized for struct -- any struct
// with the specified field names and types gets the generic methods,
// instantiated for the concrete type
type Vec2 struct interface {
    X, Y number // or float
}

Edit: or prototype version:

// A prototype provides an abstract implementation based on a struct,
// using generically-typed fields, and methods operating on those fields.
// Any concrete struct with the specified field names and types matches the prototype
// (similar to how interfaces are "magically" matched, except based on fields,
// not methods).
// The concrete struct then gets the generic methods from the prototype,
// instantiated for the concrete field types
type Vec2 prototype {
    X, Y number // or float
}
func (v Vec2) Add(u Vec2) Vec2 {
    return Vec2{v.X + u.X, v.Y + u.Y}
}

// Vec2f32 instantiates the generic Vec2 struct interface [prototype] for the float32 type
type Vec2f32 struct {
    X, Y float32
}

func myfunc() {
    v1 := Vec2f32{1,2}
    v2 := Vec2f32{3,4}
    sum := v1.Add(v2) // this is automatically instantiated from prototype
}
package graph

// Edge defines a connection between two nodes, as a struct interface.
// Any type with fields named From and To that are pointers to a 
// concrete struct that implements the Node struct interface will satisfy
// the Edge interface.
type Edge struct interface {
    From, To *Node 
}

// Node defines a node in a graph, which has Edges connecting it to other
// nodes.  Any concrete struct with a slice of structs satisfying the 
// Edge struct interface, named Edges, will satisfy this interface.
type Node struct interface {
    Edges []Edge // concrete slice of a struct interface type..
}

// Unlike the draft proposal, methods can be defined with struct interface args
// just like any interface type can be passed around currently.  A boxing penalty
// is likely but perhaps not inevitable.
func (n *Node) ShortestPathTo(to *Node) []Edge {
    for _, e := range n.Edges {
        if e.To == to {
            return []Edge{e}
        }
        se := e.To.ShortestPathTo(to) // all just std interface virtualization method calls..
        if len(se) > 0 {
            return se // actually would need to sort by path length and return shortest, but whatever..
        }
    }
    return nil
}

In summary, GNTs leverage the "contracts" associated with the existing native types in the language, instead of introducing an entirely new syntax to specify novel such contracts. This makes programming with these types immediately intuitive and natural for any Go programmer, and the code as shown in the above examples inherits the simplicity and elegance of the existing Go syntax, as compared to other generics proposals that require additional type parameters etc.

Edit: The key difference between GNTs and the draft proposal (and other similar approaches using separate type parameters) is that these other approaches split the type information into two separate parts, while GNTs keep the type specification unitary and in one place (where it normally is for non-generic code). For example, in the draft proposal:

func Print2(type T1, T2)(s1 []T1, s2 []T2) { ... }

You have to go back and forth between the type args and the regular args to understand what the types of s1 and s2 are -- the information is split across these two places. In contrast, under the GNT proposal:

func Print2(s1 slice, s2 slice) { ... }

the type is fully specified (albeit still generic) in one place.

To use a colorful metaphor, the draft design is like a horcrux that splits key information painfully across multiple places, whereas GNT's preserve the "natural" unity of type specification: you don't have to go searching across multiple hiding locations to find the type :) More scientifically, reducing cognitive load by not having to integrate across these separate pieces of information is a well-established principle: https://en.wikipedia.org/wiki/Split_attention_effect

The major limitation is that generic functionality is limited to these existing native types. You cannot define Min / Max functions that work across all native numbers and on non-native numbers such as big.Int. The bet here is that GNTs solve 80% or more of the use-cases in the cleanest, simplest way -- i.e., the defining feature of Go.

Several of the required keywords are repurposed from existing ones, and thus would have no compatibility impact:

These new ones would create potential conflicts:

See GNTs gist for a full discussion of the proposal -- would be happy to write a design doc according to standard template if requested.

rcoreilly commented 5 years ago

The compiler should know that it is dealing with a concrete instantiation of a generic type, so I would think that it could provide a suitably specific error message.

Also, I would guess that the majority of generic code would use the prototype (struct interface) approach, in which case there is a clear specific point at which the generic type is instantiated into a concrete one, so all error messages etc would be localized there, and clearly traceable to that point in the code.

rcoreilly commented 5 years ago

Just checking back in on this -- has this ever been discussed among the core team? Is it DOA or might there still be a chance? I still think it represents an opportunity to do something truly Go-level innovative, simple, and elegant (with commensurate limits), but if a more conventional, general-purpose approach is favored, then that probably makes sense too. Anyway, would be great to hear any more specific comments if available.

ianlancetaylor commented 5 years ago

I just re-read this issue, and I have to say that I do not find it to be simple.

Adding new categories like number, signed, unsigned, float means adding a new set of names and concepts that every Go programmer must learn. You suggest that this is simpler than listing types in a contract. I don't agree. Listing types in a contract, as suggested in the current generics design draft, adds only one new concept to the language: the ability to list types. In practice, people will then rely on the standard library for things like contracts.SignedInteger. I would prefer to put new concepts in the standard library than in the language proper.

That said, defining new compile-time-type-safe data structures remains my major concern. As far as I can tell, the idea here is that a struct interface automatically adds methods to types that follow a certain pattern, but only in the case where a certain package is imported. Nothing else in Go works like that. It seems to mean that a change in one package can silently break a distant package, as it can cause methods to be added to, or removed from, an existing type. That does not seem simple and it does not seem to me to be particularly Go-like.

An 80% solution should introduce a minimum of new concepts to the language. For that matter, the same is true for a 100% solution. I don't see that this proposal meets that goal.

rcoreilly commented 5 years ago

Simple is all relative I guess. The definition I was using is that no new syntax is added, and the resulting code looks exactly like current Go code. I think people would find the generic code to be highly readable and "simple" to understand, per the examples above. There have been a number of objections to the draft design about the new syntax with all the extra parens etc being a major barrier. E.g., Dave Cheney's blog: https://dave.cheney.net/tag/generics I personally find C++ template code nearly unreadable because of all the extra parameters, and there is solid cognitive science research that says that people have a very limited ability to process multiple things at the same time, so adding additional type parameters just pushes the cognitive limits in a serious way (regardless if you use parens or <> or whatever). I really think this is a big issue and is the single most important advantage to this design.

Per some discussion above, the proposal is that struct interface (or prototype) only adds methods for types that directly include the package that defines the prototype. This prevents any silent indirect damage. I don't think it is too difficult to understand this principle, even if it is not currently true of other Go code.

As for the objection about adding the new type categories, that is certainly valid. something has to be added to get the new functionality. I personally don't think that these would be difficult to understand, and given that these type categories are likely to be widely used under any proposal, people will end up having to learn these new categories one way or another. So in some ways just biting the bullet and adding simple, short, clear keywords to the language itself might be better than putting it off into a longer package name like contracts.SignedInteger. That is a lot to type and read compared to signed. Also, it is not like these categories are ever going to change -- there are just a few obvious such categories given the properties of numbers and the Go type system, so enshrining them in the language doesn't seem particularly risky in that respect.

Anyway, lots of judgment calls and again reasonable people will differ. But overall my feeling is that this represents the kind of simplicity argument that has driven Go decisions in the past, and has made the language so special -- the number one consideration has been to keep the code clean and easy to read and understand, and I see this as a way to achieve that goal in the realm of generics, whereas the draft proposal will introduce a lot of extra cognitive load, and is overall very similar to the way things have been done in other languages. Here's a chance to do something different!

ianlancetaylor commented 5 years ago

Adding struct interface is new syntax.

When you point out that there are no type arguments at the call site, that leads me to wonder how for

func AddTo(s []number, a number) {
    for i := range s {
        s[i] += a
    }
}

we can express the fact that a must have the same type as the elements of s. That is, when a function has different arguments of type number, sometimes those have to be the same type, sometimes they don't. How do we express that?

rcoreilly commented 5 years ago

For many cases, there is a convenient solution:

// note: all args in same list (x,y) must be of same concrete type
func Min(x, y number) type(x) {
  if x < y {
     return x
  }
  return y
}

If you don't want to enforce that constraint, then use separate type specifiers:x number, y number. This doesn't work for your example, but there are two potential solutions.

  1. You could use the type(x) expression to specify the slice type:
func AddTo(s []type(a), a number) {
    for i := range s {
        s[i] += a
    }
}

This is more readable I think than requiring explicit type args for everything, even though it is just a bit more complex and more indirect in terms of mental load than just writing number.

  1. Thus, as a "pragmatic" special case, we could simply state that any container element type that matches that of another non-container type, implies the same concrete type. This would likely be correct for the vast majority of uses -- how often would you want to not have that be the case?

Anyway, there is certainly room for debate about option 2 but hopefully option 1 is reasonably satisfactory in any case.

rcoreilly commented 5 years ago

and would you call prototype new syntax, or just another new generic type name? I kinda like prototype better than struct interface at this point, given that the behavior is sufficiently different from interface, and it is also shorter to type.

ianlancetaylor commented 5 years ago

If prototype is a new keyword then it is new syntax by definition.

func AddTo(s []type(a), a number) {

The scoping here troubles me. Currently other than at package scope Go names are always scoped so that references must follow the definition. That changes here.

rcoreilly commented 5 years ago

I forgot about the elem() expression, which allows it to be written in the right order and with the proper semantics that the type is really coming from the container:

func AddTo(s []number, a elem(s)) {

In fact, it would probably be much more likely for this kind of code to appear as a method on a type rather than a standalone function like that.

type NumSlice []number

func (s *NumSlice) AddTo(a elem(s)) {

This would presumably be a fairly familiar pattern so seeing that elem(s) would become expected and easily processed. Likewise for key(m) etc.

As to whether the backward-order version should still be allowed, even if not encouraged, it is probably the case that the parser's multiple passes would be able to handle the forward reference within that same argument scope, so that would have to be a judgment / design call.

rcoreilly commented 5 years ago

and when I was referring to new syntax above, I was talking about something that would change the parse tree in a fundamental way, like adding a whole new set of type parameters to functions, as compared to just adding a new keyword that can be plugged in where e.g., previously you would have had float64 and now you have number -- that seems like a qualitatively different type of new syntax. I guess prototype is a bit more than number but it still looks mostly just like struct from the parsing perspective.

rcoreilly commented 4 years ago

I'm adding this to the overall intro, to highlight the key difference between this proposal and the draft proposal and other standard approaches to generics using separate type arguments: The draft proposal et al split the type information into two separate parts, while this proposal keeps the type specification unitary and in one place (where it normally is for non-generic code). For example, in the draft proposal:

func Print2(type T1, T2)(s1 []T1, s2 []T2) { ... }

You have to go back and forth between the type args and the regular args to understand what the types of s1 and s2 are -- the information is split across these two places. In contrast, under the GNT proposal:

func Print2(s1 slice, s2 slice) { ... }

the type is fully specified (albeit still generic) in one place.

To use a colorful metaphor, the draft design is like a horcrux that splits key information painfully across multiple places, whereas GNT's preserve the "natural" unity of type specification: you don't have to go searching across multiple hiding locations to find the type :)

rcoreilly commented 4 years ago

link added to first issue description (intro): https://en.wikipedia.org/wiki/Split_attention_effect

rcoreilly commented 4 years ago

Closing this in favor of the much simpler #39669 which is essentially a syntactic variant of the current draft proposal.