golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.85k stars 17.65k forks source link

proposal: remove notion of core types from the language spec. #70128

Open griesemer opened 2 days ago

griesemer commented 2 days ago

This issue replaces the investigative issue #63940 with a concrete proposal. To go straight to the proposal text, skip the Background and Motivation section.

Background

The Go 1.18 release introduced generics and with that a number of new concepts, including type parameters and type constraints. A type constraint acts as the "type" of a type parameter by describing the type parameter's properties, similarly to how a struct type describes the properties of a variable of that struct type.

The language requires that any concrete type that instantiates a specific type parameter satisfies the type parameter's constraint. Because of this requirement one can be certain that a value whose type is a type parameter possesses all of that type parameter's properties, no matter what actual, concrete type is used to instantiate the type parameter.

In Go, type constraints are described through a mix of method and type requirements which together define a type set: a type set comprises all the types which satisfy all the requirements. Specifically, Go 1.18 uses a generalized form of interfaces for this purpose. An interface enumerates a set of methods and types, and the type set described by such an interface consists of all the types that implement those methods and that are included in the enumerated types.

For instance, the interface

type Constraint interface {
    ~[]byte | ~string
    Hash() uint64
}

consists of all the (possibly named) []byte and string types that also implement the Hash method.

Given these descriptions of type sets, which in turn describe the properties of type parameters, it is possible to write down the rules that govern operations on operands of type parameter type.

For instance, the rules for index expressions state that (among other things) for an operand of type parameter type P:

  • The index expression a[x] must be valid for values of all types in P's type set.
  • The element types of all types in P's type set must be identical. In this context, the element type of a string type is byte.

These rules enable the following code (playground):

func at[P Constraint](x P, i int) byte {
    return x[i]
}

The indexing operation x[i] is permitted because the type of x is P, and P's type constraint (type set) contains []byte and string types for which indexing with i is valid.

Motivation

The rules for index expressions have specific clauses for when the type of an operand is a type parameter. Similarly, the rules for unary and binary operations also have such clauses. For instance, in the section on Arithmetic operators, the spec says:

If the operand type is a type parameter, the operator must apply to each type in that type set.

This rule allows for the operator + to add two operands of (identical) type parameter type, as long as + is valid for any type in the respective type parameter's constraint.

This type set-based and individualized approach permits the most flexible application of operations on operands of type parameter type, and is in line with what the original generic proposal (Type Parameters Proposal) intended: an operation involving operands of generic type (i.e., whose type is a type parameter) should be valid if it is valid for any type in the respective type set(s).

Because of time constraints and the subtlety involved in devising appropriate rules for each language feature that may interact with generic operands, this approach was not chosen for many language features. For instance, for Send statements, the spec requires that

The channel expression's core type must be a channel, the channel direction must permit send operations, and the type of the value to be sent must be assignable to the channel's element type.

This rule relies on the notion of a core type. Core types offer a short cut for the spec: if a type is not a type parameter, its core type is just its underlying type. But for a type parameter, a core type exists if and only if all types in the type parameter's type set (that is, the type set described by the type parameter's constraint interface) have the same underlying type; that single type is the core type of the type parameter. For instance, interface{ ~[]int } has a core type ([]int), but the Constraint interface from above does not. (In reality, the definition of core types is subtly more complicated, see below.)

The notion of a core type is a generalization of the notion of an underlying type. Because of that, pre-generics most spec rules that relied on underlying types now rely on core types, with a few important exceptions like the ones mentioned earlier. If the rules for index expressions were relying on core types, the at example above would not be valid code. Because the rules for Slice expressions do rely on core types, slicing an operand of type P constrained by Constraint is not permitted, even though it could be valid and might be useful.

When it comes to channel operations and certain built-in calls (append, copy) the simplistic definition of core types is insufficient. The actual rules have adjustments that allow for differing channel directions and type sets containing both []byte and string types. These adjustments make the definition of core types rather complicated, and are only present to work around unacceptable restrictions that would be imposed by the language otherwise.

Proposal

Summary: Remove the notion of core types from the language specification in favor of dedicated prose in each section that previously relied on core types.

For example, rather than using a rule based on core types in the section on slice expressions, the proposal is to use appropriate prose similar to what is used for index expressions, which does not rely on core types (and which is more flexible as a result).

The proposed approach is as follows:

The proposed changes to the spec can be reviewed in CL 621919 and are considered part of this proposal. (The exact prose is up for discussion and expected to be fine-tuned.)

[!NOTE] CL 621919 still contains references to core types in the section on type inference and unification. We plan to rewrite those sections as needed and remove those references as well. Since these parts of the spec are highly technical and detailed, we are less concerned about their exact prose: these sections are unlikely to be consulted by non-experts in the first place. To get started, we may simply replicate the notion of core types "in line" until we understand better what changes to type inference preserve backward-compatibility.

Discussion

Removing the notion of core types from the language specification has multiple benefits:

The changes are designed to be 100% backward-compatible.

Implementation

The relevant implementation changes primarily affect the compiler's type checker. The proposed changes to for-range statements currently include an implementation restriction for the range-over-func case; loosening or removing that restriction may require compiler back-end changes for an efficient implementation (currently not planned).

The relevant type checker changes have been prototyped and can be found in a stack of CLs ending in CL 618376.

Impact

Because the changes are designed to be 100% backward-compatible, implementing this proposal is expected to be unnoticeable for existing Go code.

Some code that currently is not permitted will become valid with this change. For instance, slice expressions, composite literals, and for-range statements will accept generic operands and types with less restricted type sets.

Analysis tools may need to be updated. We believe that this can be done incrementally, on a (language) feature-by-feature basis.

Tentative time line

This time line assumes that this proposal is uncontroversial and accepted fairly quickly.

Future directions

The use of core types in the spec implied a somewhat rigid framework within which rules for language features were considered.

For instance, proposal #48522 is about permitting access to a struct field that is present in all structs in a type set (example). This is currently not permitted and the proposal was closed in favor of #63940, the precursor issue for this proposal.

If we accept this proposal, we will follow up with a proposal for more flexible field access, along the lines of #48522.

Type inference and type unification also rely on core types. Removing this dependency may enable type inference in some cases (such as #69153) where it currently fails.

gabyhelp commented 2 days ago

Related Issues and Documentation

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

gopherbot commented 2 days ago

Change https://go.dev/cl/621919 mentions this issue: spec: remove notion of core types

gopherbot commented 2 days ago

Change https://go.dev/cl/604116 mentions this issue: go/types, types2: remove need for core type in append calls

gopherbot commented 2 days ago

Change https://go.dev/cl/602696 mentions this issue: go/types, types2: remove need for core type in send statements

gopherbot commented 2 days ago

Change https://go.dev/cl/613636 mentions this issue: go/types, types2: remove need for core type in for-range statements

gopherbot commented 2 days ago

Change https://go.dev/cl/602695 mentions this issue: go/types, types2: remove need for core type in receive operations

gopherbot commented 2 days ago

Change https://go.dev/cl/604278 mentions this issue: go/types, types2: remove need for coreString function

gopherbot commented 2 days ago

Change https://go.dev/cl/604277 mentions this issue: go/types, types2: remove need for core type in copy calls

gopherbot commented 2 days ago

Change https://go.dev/cl/603403 mentions this issue: go/types, types2: remove need for core type in function calls

gopherbot commented 2 days ago

Change https://go.dev/cl/603516 mentions this issue: go/types, types2: remove need for core type in make calls

gopherbot commented 2 days ago

Change https://go.dev/cl/618376 mentions this issue: go/types, types2: remove need for core type in composite literals

gopherbot commented 2 days ago

Change https://go.dev/cl/603402 mentions this issue: go/types, types2: remove need for core type in slice expressions

gopherbot commented 2 days ago

Change https://go.dev/cl/604279 mentions this issue: go/types, types2: remove need for core type in unsafe.Slice/SliceData calls

willfaught commented 2 days ago

Sounds good in concept. Can you give us a rough idea of one such additional paragraph added in place of reference to core types?

Merovius commented 2 days ago

@willfaught The proposal text links to CL 621919 as part of the proposal, which includes those paragraphs.

Merovius commented 2 days ago

@griesemer I'm not sure whether it is more appropriate to comment on the CL, or here. I'll do it here, for now, because it seems the more broadly accessible forum. I have a concern about the prose for range expressions:

In this case, the range expression must be valid for each underlying type U in the type parameter's type set. For each such U, the corresponding range expression produces 0, 1, or 2 iteration values, depending on the type. The actual number of iteration values produced by the range clause is the largest number (0, 1, or 2) of iteration values (counted from left to right) that have identical types across the range expressions over all the types U.

AIUI this would make this (currently invalid) code valid:

func main() {
    s := `♥`
    fmt.Println(F(s))
    fmt.Println(F([]byte(s)))
}

type C interface { ~string | ~[]byte }

func F[S C](s S) []int {
    var out []int
    for i := range s {
        out = append(out, i)
    }
    return out
}

It would print [0] and [0, 1, 2]. I'm concerned about this difference in meaning. That is, the proposed change is concerned with the type of the produced values, but even if the type is the same, the semantics might be quite different.

Of course, this isn't just an issue for []byte and string - the same would apply if we added chan int and map[int]T to the type sets. But I'd argue that it is relatively natural to try to write generic functions that generalize over ~[]byte and ~string specifically. In fact, it's probably the main reason why we'd want to allow range over type parameter typed values in the first place. But any such function might have a relatively silent footgun that is only exposed if you test it specifically with non-ASCII strings.

Personally, I think I might prefer not expanding allowed ranges at all, instead changing the restrictions to "all types in the type set must have the same underlying type". At least for now. I'll note that this would still allow to write those generic functions generalizing over ~[]byte and ~string - you'd just have to write them using a 3-clause for-loop and len/indexing, which is unambiguous.

zigo101 commented 2 days ago

I don't think this is a problem in reality, even if don't consider the example is some artificial. The code should behave differently on different kinds of type arguments. It would be strange if it is not.

Let's just remove the restrictions, otherwise, the use scenarios of Go custom generics are too few.

zephyrtronium commented 2 days ago

Somewhat of a nitpick. The CL contains statements like, "Given an expression f of function type F, ... if the type of f is a type parameter, ...." My impression is that if F is a type parameter, then it is an interface type, so it cannot be a function type. See also #57310.

Would it be reasonable to instead use phrasing like, "Given an expression f, if every type in the type set of f has underlying type F which is a function type, ..."? Would this also be an opportunity to resolve #57310?

griesemer commented 1 day ago

@Merovius It would certainly be fine to keep for-range as is, but I believe requiring that "all types in the type set must have the same underlying type" would not be backward-compatible: because of how core types are defined, this code is currently valid:

func _[P chan int | <-chan int](ch P) {
    for x := range ch {
        fmt.Println(x)
    }
}

but chan int and <-chan int don't have the same underlying type.

Perhaps we could just say that all types must be of the same kind (so only channels, only slices, only funcs, and so forth). Or we could perhaps disallow mixing byte slices and strings because they behave rather differently with range.

Please feel free to comment on the respective CL, that way we can keep the comments and adjustments local. Thanks.

griesemer commented 1 day ago

@zephyrtronium Thanks for the precise observation. I agree that the prose is not quite consistent. To be fine-tuned. Not sure about tying this to resolving https://github.com/golang/go/issues/57310.