golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.68k stars 17.49k forks source link

proposal: Go 2: permit types to say they may only be created by containing package #43123

Closed timruffles closed 3 years ago

timruffles commented 3 years ago

It would be useful to permit types to say they may only be initialized by their defining package, for two reasons:

Currently, it's easy for create values of either of these that violate their invariants, especially via zero-value initialization, and that allows bugs to creep in. Existing ways to attempt to prevent invalid values being created are limited and unidiomatic.

This would be a backwards-compatible change - only programs that opted a type into this new behaviour would be affected.

This proposal calls this feature 'package created types', or PCTs.

Answers to language-changes.md questions. * Would you consider yourself a novice, intermediate, or experienced Go programmer? * Experienced - 18 months writing Go at Github, 12 months at Plaid * What other languages do you have experience with? * C, JS, TypeScript, Python, Ruby, Erlang, Clojure, Objective C * Would this change make Go easier or harder to learn, and why? * Slightly harder, it's one more feature. However the compiler error messages would be able to propose a clear fix: use an exported function or value of the package that exported this type to initialize it. Unmarshalling error messages at runtime would be similar * Has this idea, or one like it, been proposed before?/If so, how does this proposal differ? - https://github.com/golang/go/issues/28939 - contrast: that proposal requires to changing the behaviour of existing initializations. This proposal would not change the behaviour of existing code. - https://github.com/golang/go/issues/32076 - contrast: that proposal attempts to make some function calls implicit, this proposal doesn't * Who does this proposal help, and why? * Any codebase that wishes to ensure invariants are fulfilled on the type level, which provides useful guarantees when reasoning through how a piece of the code behaves * This is especially useful for medium to large teams, where ensuring, for instance, every time a value of a certain type is initialized a constructor is used, becomes extremely hard * Is this change backward compatible? * Yes - no existing code would be affected. * What is the cost of this proposal? (Every language change has a cost). * How many tools (such as vet, gopls, gofmt, goimports, etc.) would be affected? * I believe this would only be a change to the compiler. gofmt would depend on how indicating types was implemented - an additional keyword for a type would require an extra, simple, formatting rule * What is the compile time cost? * This would be an additional check for each initialisation at compile time, and map or slice type at type-checking time (ensuring they don't have values that are PCTs, or reference a PCT) * What is the run time cost? * Standard library unmarshalling code needs an additional check for types that don't implement their own `Unmarshal` function (to error on package created types that lack one) * Is the goal of this change a performance improvement? * No * Does this affect error handling? * No * Is this about generics? * No Answered by the proposal: * What is the proposed change? * Please describe as precisely as possible the change to the language. * What would change in the language spec? * Please also describe the change informally, as in a class teaching Go. * Show example code before and after the change. * Can you describe a possible implementation? * Orthogonality: how does this change interact or overlap with existing features?

Motivation

Let’s consider an example of a data type that represents a "natural interval" defined as an interval with the invariants From > 0 and To > From.

package interval

type Natural struct {
  From int
  To int
}

Currently, a function accepting an interval.Natural can’t guarantee it is valid according to the invariants above. Invalid or zero interval values can be created outside the package and passed in a number of ways:

package otherpkg

import "interval"

func withInterval(intval interval.Natural) {
  /* some code that wants to rely on interval.Natural's invariants being respected by values passed as intval  */
}

// directly initialized with invalid values
withInterval(interval.Natural{To: -7})

// the following examples are all ways to get a zero value of the type

var s interval.Natural
withInterval(s) // zero value

// referencing an uninitialized - zero - struct field
withInterval(struct{s interval.Natural}{}.s) 

nat := new(interval.Natural)
withInterval(*nat) // access the zero-value pointed at by nat

var ms map[string]interval.Natural
withInterval(ms["anything"]) // zero-value

intervals := make([]interval.Natural, 0, 10)
withInterval(intervals[0]) // zero-value

ch := make(chan interval.T)
close(ch)
withInterval(<-ch) // receive zero value

The author of interval could define a validating constructor, which would only return valid natural ranges. However, there is no way to require it's always used, and thus it doesn't provide any guarantees.

package interval

func New(from, to int) (Natural, error) { /* validation rules */ }

If it was possible to know that every interval.Natural value had been created by the interval package, the withInterval method would be able to rely on the interval package having been able to enforce its invariants at creation time (if the underlying type is immutable, e.g an integer, the guarantees are stronger still).

Enums

Enums are generally new types over an underling primitive, e.g an int or string:

package interval

type Kind int

const (
   OrdinalInterval = iota
   DayInterval
   // ... etc
)

Again, this doesn't provide safety if other packages embed, or explicitly create, enum values:

package otherpkg

import "interval"

var invalidOne = interval.Kind(47)

var invalidTwo = struct{
    kind interval.Kind
}{
    kind: 47,
}.kind

Therefore, code working with enums can't be sure they are valid.

Desired behaviour

In this example, interval.Natural and interval.Kind are types whose values must be initialized by its package:

package interval

package type Interval struct {}
package type Kind int

the examples below assume the following code:

package interval

// since Interval is a PCT, it should export a construct if it wishes other packages
// to be able to create Interval values
func New(from, to int) (Interval, error) { /* ... */ }

// Kind is an enum, so exports a range of values. As it's a PCT, these are the only
// values of Kind that any other package can use, or expect.
const (
   OrdinalInterval = iota
   DayInterval
   // ... etc
)

Initialization #

PCTs cannot be initialized outside their defining package, which restricts explicit and implicit initialization:

package otherpkg

import "interval"

var one interval.Natural // ERROR: 'interval.Natural' must be created by interval package
two := interval.Natural{} // ERROR: 'interval.Natural' must be created by interval package
three := interval.Natural{5, 10} // ERROR: 'interval.Natural' must be created by interval package

four  := new(interval.Natural) // ERROR: 'interval.Natual' must be created by interval package

five := interval.Kind(47) // ERROR: 'interval.Kind' must be created by interval package

new can only be passed a package-created-type in its defining package.

The examples above in which attempts were made to initialize non-zero values of Interval can be achieved via the exported constructor. The attempts to initialize zero-values of PCTs outside their defining package are impossible, an explicit goal of this proposal:

package otherpkg

import "interval"

// one, two and four are now impossible, it's explicitly a goal that we can't create zero interval.Natural values 
// outside of the interval package

three, err := interval.New(5,10)

s2, err := interval.New(10, 20)
if err != nil {
    // handle
}

// with no exported function returning an interval.Kind, the otherpkg must use one of interval's exported values
five := interval.DayInterval

Structs #

Types that have a PCT as a field must explicitly initialized those fields. Thus any struct that contains a package created type will become liable to similar restrictions on its initialization:

package otherpkg
import "interval"

type someType struct {
  First interval.Natural
  Second interval.Natural
}

var s someType{} // ERROR: someType's First field is a package-created type, interval.Natural, which must be initialized by the interval package

Maps, slices and channels #

Package created types must use pointers to be valid map values, or slice/channel element, i.e []*interval.Natural is okay, but []interval.Natural is not. This ensures the semantics of maps, slices and channels are unaffected by this proposal, while achieving its goal of restricting value initialisation to the defining package.

package otherpkg
import "interval"

// see below for details on map/slice/channels
var intervalMap map[string]interval.Natural // ERROR: interval.Natural is a package-created type, and cannot be a map value (use map[string]*interval.Natural)

intervalSlice := make([]interval.Natural, 0, 10) // ERROR: interval.Natural is a package-created type, and cannot be a slice element (use []*interval.Natural)

ch := make(chan interval.T) // ERROR interval.Natural is a package-created type, and cannot be a channel element (use chan(*interval.Natural))

To fix the errors the author would need to do use pointers to the PCT as the element type:

package otherpkg
import "interval"

var intervalMap map[string]*interval.Natural // pointer to type means zero-value is nil, not a zero interval.Natural
var intervalSlice []*interval.Natural
var intervalCh chan *interval.Natural

Operators #

Where a package-created-type's underlying type is a type on which operators are valid (e.g, an int, rather than a struct), it will be a compiler error to attempt an operation with a PCT on the left-hand side (which determines the type of the expression evaluation):

package otherpkg
import "interval"

invalidKind := interval.DayInterval + interval.HourInterval // ERROR interval.Kind is a package-created type, and cannot be created outside of the interval package
unitless := 42 + interval.DayInterval // ok, as it evaluates to an int not an interval.Kind 

Mechanics

Indicating a type is package created

This proposal has no opinion on the best syntax to indicate a given type in a package should have 'package created type' behaviour. A indicative proposal would be prefixing indicating a type was package create by preceeding it with the package keyword:

package interval

// indicates Interval is package created, with the behaviour above
package type Interval struct {/* ... */}
package type Kind int

Unmarshalling

Unmarshalling packages could be made package created types (PCT) aware. They’d use reflection to check whether a given component of a unmarshalling target was a PCT, and return an error if that type did not export an appropriate unmarshalling function.

Reflect

Reflect would be extended with:

func (v Value) IsPackageType() bool

Reflect's New(typ Type) function, and others, would be updated to panic on invalid creation of zero values (similar to panicing on accessing unexported struct fields), or of zero values of types that reference PCTs (this would need to be a recursive check, e.g A has a field of B that has a field of C that's a PCT, means you can't New() a A).

Compiler implementation

I don't know enough (anything) about Go's compiler to have any opinions on how this could be implemented. I assume it would involve updating the code that compiled/type-checked initializations.

Issues with alternatives #

Unexported types

Unexported types cannot be stored in structures, maps, or slices in other packages:

type wantsToStoreInterval {
  interval interval.natural // error: Unexported type 'natural' usage
}

var _ map[string]interval.natural // error: Unexported type 'natural' usage

For unexported structs with exported methods there is a workaround - an appropriate interface can be defined on the consumer side and used to store the values. However, there is nothing to stop other packages defining methods with the same names, and again breaking the invariants.

Further, interface wrapper and accessors are not idiomatic. In Go, data and enums are usually concrete and access is not mediated via getters, unlike Java etc.

// defining package must export methods to allow unexported `natural` to be stored within interface values in other packages
// which define GetFrom and GetTo
func (n natural) GetFrom() int {
    return n.From
}
func (n natural) GetTo() int {
    return n.From
}

If the defining package of the unexported type doesn't define methods, the consuming package would have to wrap the values in an interface{}:

type wantsToStoreInterval {
  interval interface{}
}

func (w wantsToStoreInterval) doSomething() {
   // note: will panic if any other type of value is stored in w.interval, or it's not initialized
  fmt.Println(w.interval.(interval.natural).From)
}

This is far from idiomatic, and imposes runtime cost, and risk of panics.

Interfaces

Interfaces with unexported methods prevent other packages from defining their own implementation. This can get closer to guaranteeing a value was created by its package.

Again, however, this has meant data and enum members have to be wrapped in an interface value. That isn’t idiomatic in Go - data is typically ‘raw’, without indirection. It also incorrectly suggests to readers it is an abstract data type, which is a more common use for interface-wrapped data in Go. And we've now had to define a nearly identical struct and interface.

package interval

type NaturalInterval interface {
    GetFrom() int
    GetTo() int

    private() // prevents otherpkg from fully implementing Interval
}

func NewNatural(from, to int) (NaturalInterval, error) {
    // ...
}

Discipline: always use package's constructor

Practically today, many codebases would define NewNatural(from, to int) (interval.Natural, error) and trust that NewNatural is always used, and that zero or partially initialized interval.Natural values are never created.

However, this isn't a solution, it's accepting the status quo. It's equivalent to arguing that Go's memory safety isn't valuable as you can "just always remember to check your bounds". In a large codebase, with many engineers, the probability of such rules being violated somewhere approaches 1.

Linting

Linting can implement some of the above (though not changes to stdlib marshaling/unmarshaling), but it'd be faster and more widely supported if it was a compiler feature.

Relation to other proposals

ianlancetaylor commented 3 years ago

For language change proposals, please fill out the template at https://go.googlesource.com/proposal/+/refs/heads/master/go2-language-changes.md .

When you are done, please reply to the issue with @gopherbot please remove label WaitingForInfo.

Thanks!

timruffles commented 3 years ago

@gopherbot please remove label WaitingForInfo

ianlancetaylor commented 3 years ago

@timruffles You haven't filled out the template.

timruffles commented 3 years ago

@ianlancetaylor yes I have, I should have said explicitly. It's near the top of the proposal:

Screenshot 2020-12-22 at 18 28 50
ianlancetaylor commented 3 years ago

Ah, sorry, I did miss that.

deanveloper commented 3 years ago

I really like this proposal and think that it solves the "use my constructor" problem in a very good manner. However, here is some criticism:

Zero values

Zero values for PCTs should always be valid. Forcing the zero value to be valid for a PCT resolves a lot of the "strange" parts about this proposal (ie map/slice/new/etc not working). This also helps keep maintainers consistent with the idea that every type's zero value should be useful.

Syntax

The example given introduces a new keyword and (in my opinion) doesn't work well.

type Interval struct {/* ... */}

packageCreated Interval;

The reason that I don't think it works well is that anything to do with a type should be defined alongside the type itself, for instance something like type Interval package struct { ... } (also, using package avoids creating a new keyword). The reason for this is that, at scale as more types are added to a file, it may get increasingly difficult to tell what types are PCTs and what types aren't.

Unmarshalling

I think you were a bit strict with your sentiment that PCTs should never be unmarshalled. PCTs which specify their own unmarshallers (ie by implementing json.Unmarshaller) should be unmarshalled.

Operators

Probably good to specify that many operators are invalid on PCTs (ie +, *, or any other operators which result in values of the same type)

griesemer commented 3 years ago

Here's a restatement of the problem you're trying to address: How can we define a user-defined type and ensure that any value of such a type satisfies certain user-defined invariants (outside the package defining the type)?

As you have already shown, existing Go mechanisms do not fully address this problem. @deanveloper pointed out additional issues.

Here's another one: Once I have a variable x holding such a type for which I want to ensure invariants, can one make a copy of it? That is, can I say x2 := x? Generally, we can't allow this because an invariant of the type of T might be that there's an exact count of x's (e.g., exactly one). This is perhaps the most annoying aspect of this problem; C++ has copy constructors to address this (and it gets complicated really fast).

Given the restrictions you already mention for channel and map types, why aren't there any restrictions for the use of "PCT"s in other composite types? What about the zero value of PCTs?

timruffles commented 3 years ago

@deanveloper thanks for taking the time to read the proposal!

Operators - thanks for pointing out they should be clarified.

I've taken your syntax suggesting on board, but reversed the order to package type. I imagine that shouldn't be too hard to parse.

Unmarshaling - the proposal doesn't suggest PCTs are unmarshalable, it suggests that unmarshaling code can be made PCT aware by using:

reflection to check whether a given component of a unmarshalling target was a PCT, and return an error if that type did not export an appropriate unmarshalling function.

Zero-values: your suggestion seems impossible to me. This proposal is about language level guarantees. The language can't know if your application invariants can be supported by Go's zero-value semantics.

timruffles commented 3 years ago

@griesemer thanks for reviewing!

Here's a restatement of the problem: How can we define a user-defined type and ensure that any value of such a type satisfies certain user-defined invariants (outside the package defining the type)?

Agreed 👍

Here's another one: Once I have a variable x holding such a type for which I want to ensure invariants, can one make a copy of it?

I think it's fine to allow copies. You're right that explicit at-most-N-values-of-type constraints can't be enforced, but I think keeping the proposal simple is preferable.

Given the restrictions you already mention for channel and map types, why aren't there any restrictions for the use of "PCT"s in other composite types?

I had examples of struct and channel behaviour, but I can see my examples were ginormous and hard to parse. I've split them up to be clearer - channels, structs.

What about the zero value of PCTs?

Not sure I understand you here, but it's an explicit goal of this proposal to make it impossible to create zero-values of a PCT outside of its defining package. That's behind the restrictions on initialization, and on maps, slices and channel element types; structs; and operators (thanks @deanveloper ). Without these restriction I can't see any way to achieve the desired benefits.

deanveloper commented 3 years ago

I've taken your syntax suggesting on board, but reversed the order to package type. I imagine that shouldn't be too hard to parse.

It's not about parsing, it's about readability and consistency. The first word of a top-level declaration is always the thing being declared. package being the first word means we are defining a package, func means defining a func, etc. So when defining a type, it needs to start with type. type MyType package UnderlyingType was the best that I could come up with while keeping type as the first word.

Zero-values: your suggestion seems impossible to me. This proposal is about language level guarantees. The language can't know if your application invariants can be supported by Go's zero-value semantics.

What I am suggesting is that the language forces the zero-value of a type to always be valid. The package maintainer should make sure that the zero value of the type is always useful. The zero value is predictable, so the maintainer knows what needs to be done in order to assure that it still always works.

Forcing zero-values of PCTs to always be valid makes many parts of this proposal more appealing. A type that can't be used in maps, lists, struct members, etc. isn't appealing to me, but all of these issues are solved if the zero value is forced to be valid.

ianlancetaylor commented 3 years ago

As discussed above, this proposal does not support zero values, and does not provide any special mechanism for copying a value (or avoiding copying a value). Without those features this functionality is not a good fit for the language and would be quite difficult to use in practice. Therefore, this is a likely decline. Leaving open for four weeks for final comments.

timruffles commented 3 years ago

difficult to use in practice

All that's required is to remember:

I interact with multiple codebases where an attempt is made to follow those rules already, for a subset of types.

this proposal does not support zero values

This is symmetrical with the statement, 'this proposal enforces use of constructors outside the package'. Zero values are fine within the package.

timruffles commented 3 years ago

special mechanism for copying a value (or avoiding copying a value)

I'll have a think about this. But to me if copying is a problem, mutability would surely be a more pressing concern (i.e there's nothing to stop someone zeroing out all the fields of any type, or making them invalid)? Given that this proposal already accepts that mutability is a fact of Go, I think it's a reasonable omission. I can't see that restrictions on copying are something anyone is crying out for; indeed, it only seems to have been mentioned in feedback to this proposal.

deanveloper commented 3 years ago

This is symmetrical with the statement, 'this proposal enforces use of constructors outside the package'. Zero values are fine within the package.

Correct - but the underlying issue of both of these statements is "i can't use the values in slices, map values, structs, or channels"

deanveloper commented 3 years ago

Another issue relating to zero values:

package a

type Foo package struct{}

// =====

package b

import "a"
import "io"

func bar(arg io.Reader) {
    if foo, ok := s.(a.Foo); ok {

    }
    // what is `foo` equal to?
}
timruffles commented 3 years ago

@deanveloper give the semantics of type assertions, it's fine to have it as a zero-value of the the PCT. Sure, it's a way to create zero values of the PCT outside the package but I don't think it conflicts with its goals in practice. Generally you don't use the value of a type assertion when it is not successful - I've not seen it done once.

Again, you certainly can use PCTs in maps etc, you simply use a pointer to them rather than directly referencing them.

timruffles commented 3 years ago

@ianlancetaylor just in case there is an ambiguity, this proposal does not require special behaviour for zero values of PCTs. So it certainly 'supports' zero values, and has carefully considered how to avoid changing their semantics. It is simply a way to opt in to compiler checking to ensure you don't create zero-values outside of the defining package (besides type assertions which I've addressed above).

Another way of looking at it is that you can already write a program that acts as if you had marked a subset of its types as package created in Go right now. This proposal just makes it easier by having the compiler check you're doing so.

akavel commented 3 years ago

One practical concern I'm having with regards to this, is that I suspect most codebases would immediately start using this feature by default for most types, as it's easier and may feel "safer" to "seal" a type by default in such a way than to try and think about zero-values. Which I believe would result in a major de facto change in semantics/look-and-feel of "typical" Go code. I'm not sure this would be necessarily a bad change, more that currently it's hard for me to imagine how such a language would handle on a daily basis, assuming most types were "sealed" by default.

ianlancetaylor commented 3 years ago

@timruffles The thing is, as @deanveloper has been saying, zero values are really deeply baked into Go. If you can't create a zero value of a type, there are very basic things you can't do, like append to a slice of that type. So what advantage are we getting this proposal? It's already possible to have an unexported type, or a struct with unexported fields. This proposal is letting us name that type, but then we can't do basic operations with it. How is the gain of being able to name the type worth the pain of not being able to actually use it in a normal way?

timruffles commented 3 years ago

@ianlancetaylor the benefit is the same as Go's compiler rejecting "2" + 3 or int32(3) + float64(3.7): it makes it easier to write correct programs.

The proposal clearly explains how the types can be used with slices etc. Namely, you use a pointer to the type to ensure no zero-values are created:

intvs := []*interval.Natural{}
intvs = append(intvs, interval.NewNatural(1,2))

Again, this is something you could do with the language currently, this proposal just enforces the pattern (much like you can simply attempt to avoid mixed type arithmetic in JS: it still ends up happening in large programs).

It's already possible to have an unexported type, or a struct with unexported fields.

The proposal already covers why I don't think they're acceptable alternatives.

ianlancetaylor commented 3 years ago

The fact that there is a workaround to avoid generating zero values doesn't change the fact that ordinary Go usage requires that it be possible to create zero values. In effect types that use this approach can't be used in the same way as ordinary Go types. That is a heavy cost. The benefit we get here does not outweigh that cost.

griesemer commented 3 years ago

No change in consensus.