golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

proposal: spec: tuples as sugar for structs #63221

Open jimmyfrasche opened 1 year ago

jimmyfrasche commented 1 year ago

Updates:

This proposal adds basic tuples to Go with one piece of sugar and two builtins.

The sugar is for struct(T0, T1, …, Tn) to be shorthand for a struct type with n+1 fields where the ith field has type Ti and is named fmt.Sprintf("F%d", i). For example, struct(int, string) is a more compact way to write struct { F0 int; F1 string }.

This gives us a tuple type notation and answers all questions about how tuples behave: exactly like the desugared struct they are equivalent to. Naming the fields F0 and so on both provides accessors for the individual elements and states that all tuple fields are exported.

The variadic pack builtin returns an anonymous tuple, with the appropriate types and values from its arguments. In particular, by ordinary function call rules, this allows conversion of a function with multiple returns into a tuple. So pack(1, "x") is equivalent to struct(int, string){1, "x"} and, given func f() (int, error), the statement t := pack(f()) produces the same value for t as the below:

n, err := f()
t := struct(int, error){n, err}

The unpack builtin takes any struct value and returns all of its fields in the order of their definition, skipping _ fields and unexported fields from a different package. (This has to be somewhat more generally defined as tuples aren't a separate concept in the language under this proposal.) This is always the inverse of pack. Example:

// in goroutine 1
c <- pack(cmd_repeat, n)

// in goroutine 2
cmd, payload := unpack(<-c)
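A rough equivalent in current Go, assuming the channel carries the desugared struct. The names command, cmdRepeat, message, send, and receive are made up for this sketch.

```go
package main

import "fmt"

// command is a hypothetical command code type.
type command int

const cmdRepeat command = 1

// message is what struct(command, int) would desugar to.
type message = struct {
	F0 command
	F1 int
}

// send plays the role of c <- pack(cmd_repeat, n).
func send(c chan<- message, n int) {
	c <- message{cmdRepeat, n}
}

// receive plays the role of cmd, payload := unpack(<-c).
func receive(c <-chan message) (command, int) {
	m := <-c
	return m.F0, m.F1
}

func main() {
	c := make(chan message)
	go send(c, 3)
	cmd, payload := receive(c)
	fmt.Println(cmd, payload)
}
```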

The struct() sugar lets us write pairs, triples, and so on for values of mixed types without having to worry about names. The pack and unpack builtins make it easier to produce and consume these values.

No changes are needed to the public API of reflect or go/types to handle tuples as they're just structs, though helpers to determine if a given struct is "tuple-y" may be useful. go/ast would need a flag in StructType noting when a struct used the tuple syntax, but as long as the implicit field names are explicitly added by the parser, the only place this flag would be needed is for converting an AST back to a string.

The only potential danger here is unpack. If it's used on a non-tuple struct type from a different package, it would be a breaking change for that package to add an additional exported field. Go 1 compat should be updated to say that this is acceptable, just as it says that adding a field that breaks an unkeyed struct literal is acceptable. Additionally, a go vet check should be added that limits unpack to structs with exclusively "F0", "F1", …, "Fn" field names. This can be relaxed at a later time.

This is a polished version of an earlier comment of mine: https://github.com/golang/go/issues/33080#issuecomment-612543798. In the years since, I've written many one-off types that could have just been tuples and experimented with generics+code generation to fill in the gap. There have been multiple calls for tuples in random threads here and there and a few proposals:

jimmyfrasche commented 11 months ago

@dsnet a reflect.Pack/Unpack may be useful when using reflect specifically with tuples even if they're not strictly necessary

DeedleFake commented 11 months ago

Random thought: Make the field names something visible, but awful and non-idiomatic, like Tuple_Field_0, etc., and then add a go vet warning with a comment about pack() and unpack() for direct accesses of fields with that name pattern.

jimmyfrasche commented 11 months ago

Arguably Fn already satisfies that.

I don't think it would be unreasonable to access the fields, for example, to sort a slice of tuples so I think effectively outlawing it overshoots somewhat.
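As an illustration of that kind of direct field access, here is a small sketch that sorts a slice of desugared tuples by their second element. The names entry and sortByCount are invented for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is what struct(string, int) would desugar to.
type entry = struct {
	F0 string
	F1 int
}

// sortByCount orders entries by their second element, reading F1 directly
// instead of unpacking every value.
func sortByCount(es []entry) {
	sort.Slice(es, func(i, j int) bool { return es[i].F1 < es[j].F1 })
}

func main() {
	es := []entry{{"b", 2}, {"a", 1}, {"c", 3}}
	sortByCount(es)
	fmt.Println(es)
}
```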

DeedleFake commented 11 months ago

Hmmm... What if structs allowed numeric field names, without syntactic support for it? You wouldn't be able to define a numeric field name in a normal struct, but when defining a tuple the resulting struct type definition would get numeric fields instead of named ones. Then, allow structs to be indexed into, i.e. v[1], but require the index to be a constant integer. Internally it would still be a struct. reflect would also see it as a struct, and reflect could be allowed to access struct fields with numeric names normally like any other struct field. pack() and unpack() could then be made to only work with these special numeric fields, removing the question of how to use unpack() with unexported fields from other packages.

In other words,

type Tuple struct(int, string)

would effectively be equivalent to

type Tuple struct {
  0 int
  1 string
}

except that the second would be syntactically illegal because you can't start an identifier with a number. pack() would work as proposed originally, but unpack() would, conceptually, unpack all of the fields, starting with the one named 0 and incrementing until it runs out of fields, ignoring all others. It shouldn't be possible for there to be others, but it would give it a clean rule that way. reflect deals with struct field names as strings anyway, so code could simply handle them manually through reflect as if they were any other exported field.

v[N] would be syntactic sugar for v.N. N would have to be a constant, but unlike a normal field access it would be allowed to be a named constant. Other than that it would behave exactly the same in all respects as v.N, complete with compile-time checks for field existence.

jimmyfrasche commented 11 months ago

I could imagine tools and libraries doing the wrong thing by failing to special case numeric field names but not well enough to come up with a specific example.

In that vein I idly wonder if it would be possible to decree that the fields follow an unexported scheme like f%d, but that all uses of the struct() syntax are treated as if they're declared in the same package so that their identity isn't bound to their origin. I don't think that's a great idea, as the F%d scheme is much simpler than the alternatives.

dsnet commented 11 months ago

I could imagine tools and libraries doing the wrong thing by failing to special case numeric field names but not well enough to come up with a specific example.

I suspect some reflection libraries may only look at exported fields by checking whether the first rune of a name is unicode.IsUpper since the Go spec calls out uppercase identifiers as exported. There is now a reflect.StructField.IsExported method, but that was added more recently (Go 1.17). (I'm assuming fields of a tuple should be exported, but that's debatable.)

jimmyfrasche commented 11 months ago

One argument for numeric fields is that they could be unexported but made exported later. Though yes, I'd prefer they be exported for the rare case where you do want to just take a quick peek in the crate without having to take the whole thing apart.

The argument against doesn't just include reflect but go/ast and co. There is now an assumption that the fields are all legal identifiers so violating that could cause some fun problems.

griesemer commented 11 months ago

A numeric field would still be stored as a string. Only go/parser would have to be adjusted if one were to allow actually writing such fields. If t is a tuple, the 1st field could be accessed as t.1. Again, only the parser has to be adjusted.

jimmyfrasche commented 11 months ago

Only the parser has to be adjusted for the standard library to work. My concern, perhaps unfounded, is tools that are compiled against the new version but whose logic hasn't been updated doing something wrong, either because they expect the field name to be a valid Go identifier or, as @dsnet points out, because they don't realize that "2" is exported since they use an ad hoc check instead of one of the various IsExported predicates. If that's the only change that needs to be made to such code it's probably not that bad.

jimmyfrasche commented 11 months ago

Since it uses the standard selector syntax, generated code that accesses fields would keep working as long as the name is left unaltered. There could be issues if the generator assumes that it can write out the field name in a new struct definition or create a variable using the field name as a prefix. Other than that, the only bug I can think of it causing is incorrectly categorizing the field as unexported.

jimmyfrasche commented 11 months ago

Assuming numeric field names, presumably given

type tup struct(int, int)
var s struct {
  tup
  a, b int
}

both s.1 and s.a are legal
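The same promotion already works today under the F%d naming scheme, which gives a feel for how s.1 would behave. In this sketch tup and outer are illustrative names, and s.F1 stands in for the proposed s.1.

```go
package main

import "fmt"

// tup is what struct(int, int) would desugar to under the F%d scheme.
type tup struct {
	F0, F1 int
}

// outer embeds tup alongside ordinary named fields.
type outer struct {
	tup
	a, b int
}

func main() {
	var s outer
	s.F1 = 7 // promoted from the embedded tup
	s.a = 1  // ordinary field
	fmt.Println(s.F1, s.tup.F1, s.a, s.b)
}
```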

dsnet commented 11 months ago

Would s.0777, s.0b10101, s.0xbad_face, or s.5_543 be legal? All of them are integer literals per the spec, but I'd be horrified to ever see that in source code. If no, it unfortunately means we need to declare a subset of the integer literal grammar for this.

jimmyfrasche commented 11 months ago

it's a natural grammar

griesemer commented 11 months ago

@dsnet I'd just allow decimal integers. A leading zero is only permitted if the field name is 0. So s.0, s.1, s.42 but certainly not s.0o8 etc. These are names, not numbers, and we'd get to decide the syntax. The point would be to distinguish them from regular identifiers.

jimmyfrasche commented 11 months ago

There are two proposals for field access in generics:

  1. (implicit) any field shared by the type set is allowed
  2. (explicit) require a new kind of F T constraint meaning types may only be in the type set if they have a field F of type T

The F%d scheme works with either. The t.1 syntax works with the implicit but not the explicit proposal unless you are allowed to declare numeric fields in an interface but not a struct.

jimmyfrasche commented 11 months ago

Assuming numeric field names, presumably given

type tup struct(int, int)
var s struct {
  tup
  a, b int
}

both s.1 and s.a are legal

Would unpack(s) be legal and return the same thing as unpack(s.tup)?

jimmyfrasche commented 11 months ago

Library changes for numeric struct fields:

add a predicate to go/token to check if a string matches ^(0|[1-9][0-9]*)$

change token.IsExported to check if the first rune is uppercase or the above predicate is satisfied (ast.IsExported just calls token.IsExported)

in go/ast, note that Ident may contain a numeric string when used in a Field in a FieldList used in a StructType parsed from a tuple or in a SelectorExpr.Sel; possibly add an IsTuple method to StructType

Make a note in go/types.Tuple that there are tuples now but these are unrelated and change the unexported isExported predicate to match the token version; possibly add an IsTuple method to Struct

go/parser, go/printer, and go/format do not need any visible changes

The reflect IsExported predicates do not need to be changed and would work as-is. Possibly add an IsTuple predicate and Pack/Unpack helpers.

Pros:

No "accidental tuples" so encoding/json could always output a tuple struct as a list without having to worry about backwards compat.

Safe to define unpack to be tuple-only without any contortions.

Undeniably stylish.

Cons:

Possible to cause issues in tools that use go/ast and libraries that use reflect that do their own checks for exportedness or require that field names are identifiers.

More complicated.

Conclusion:

I am starting to lean toward this. It's costlier than the simple F%d scheme but it confers benefit in kind. It may require changes throughout the ecosystem but they'd be extremely niche and most likely very simple to change and could be changed in a way that continues to work with older versions of Go.

Good idea, @DeedleFake!

dsnet commented 11 months ago

^[1-9][0-9]*$ implies that we are 1-indexed instead of 0-indexed, right? Personally, it feels like tuple fields should be 0-indexed to be consistent with how slices and arrays work.

jimmyfrasche commented 11 months ago

oops, yes: corrected to: ^(0|[1-9][0-9]*)$
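The corrected pattern can be checked against the literals raised earlier in the thread. This is a sketch only: isTupleFieldName and isExportedName are hypothetical names, and isExportedName merely illustrates the suggested change to token.IsExported rather than reproducing any existing API.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode"
	"unicode/utf8"
)

// tupleFieldName matches decimal names with no leading zero, except "0" itself.
var tupleFieldName = regexp.MustCompile(`^(0|[1-9][0-9]*)$`)

// isTupleFieldName reports whether s would be a legal numeric field name.
func isTupleFieldName(s string) bool {
	return tupleFieldName.MatchString(s)
}

// isExportedName treats a name as exported if it starts with an uppercase
// letter or is a numeric tuple field name.
func isExportedName(name string) bool {
	r, _ := utf8.DecodeRuneInString(name)
	return unicode.IsUpper(r) || isTupleFieldName(name)
}

func main() {
	for _, s := range []string{"0", "1", "42", "0777", "0b10101", "0xbad_face", "5_543"} {
		fmt.Println(s, isTupleFieldName(s), isExportedName(s))
	}
}
```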

dsnet commented 11 months ago

possibly add an IsTuple method to Struct

If tuple T is embedded in struct S, what does reflect.TypeFor[S]().IsTuple report? I assume the answer is false? It does feel a bit odd that fields are forwarded to the parent, but not the tuple-ness of the child. Something about this just feels off.

jimmyfrasche commented 11 months ago

Why? Embedding isn't inheritance so there's no IS-A relationship. Metonymy aside, I wouldn't say that my car is an engine though it contains one last I checked.

jimmyfrasche commented 11 months ago

A more interesting embedding case is two tuples of different length:

type T1 struct(int)
type T2 struct(int, int)
var s struct { T1; T2; f int }

There's an s.1 but no s.0 because that selector is ambiguous.

If unpack is defined to return only numeric fields unpack(s) would only return s.1.

[warning: this section is not something I'm recommending, just saying this as part of an argument] There's nothing technically blocking allowing users to add numeric fields to struct. A major reason to use them over an identifier scheme is that it avoids incorrectly opting in existing structs. If user defined numeric fields are added at the same time as or later than tuples, there is no such concern. That would allow

var n = struct {
  4 int
  2 int
}{-1, 1}

If unpack is defined to return only numeric fields unpack(n) could be defined to return 1, -1 or -1, 1.

Given those two things it seems important to

jimmyfrasche commented 11 months ago

#64613 proposes s... for what is essentially unpack(s) in this proposal. It also extends composite literal rules to allow S{f()}, which covers pack(f()) though it requires having a struct type to pack into. This generalizes some of the mechanisms here but does not cover all of them. Notably, it doesn't allow working with unspecified fields. It is simple to adapt this proposal:

  1. just use ... instead of unpack
  2. keep struct(T0, T1, ..., Tn) for tuple types
  3. either keep pack without unpack or use struct(v0, v1, ..., vN) for constructing tuple values

DeedleFake commented 11 months ago

I generally agree, but I think that I prefer unpack() over .... It reads much nicer in a few places and removes all question of operator precedence with, for example, something like <-c....

jimmyfrasche commented 7 months ago

#66651 proposes variadic generics that, along with generic type aliases (which are coming soon, see #46477), would allow this proposal to be written as ordinary code:

package tuple

type Of[T... any] = struct { F T }

func Pack[T... any](v T) Of[T] {
  return Of[T]{v}
}

func Unpack[T... any](t Of[T]) T {
  return t.F
}