golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.33k stars 17.7k forks source link

proposal: spec: add integer types with explicit overflow behavior, and remove unchecked operations on built-in integers #30209

Open bcmills opened 5 years ago

bcmills commented 5 years ago

This proposal is intended to accomplish the same goals as #19624, but in a way that complies with the migration strategy outlined in https://github.com/golang/proposal/blob/master/design/28221-go2-transitions.md#language-redefinitions.

See #19624 for background.

Summary This proposal would:

  1. Add three new packages: checked, wrapped, and unbounded, containing integer types with distinguished semantics to be defined in the language spec.
  2. Add the , ok form (as described in #19624) to explicitly check for overflow in expressions of any bounded numeric type.
  3. Remove the ability to perform unchecked arithmetic operations on the predeclared integer types.

New packages

The three new packages each provide a set of integer types:

// Package checked defines integer types whose arithmetic operations and conversions
// panic on overflow.
// Bitwise operations and logical shifts will not trigger a panic.
package checked

type Uint8 <builtin>
type Uint16 <builtin>
[…]

type Int8 <builtin>
[…]

type Byte = Uint8
type Rune <builtin>

type Int <builtin>
type Uint <builtin>
type Uintptr <builtin>
// Package wrapped defines integer types whose arithmetic operations and conversions wrap
// using two's-complement arithmetic.
package wrapped

type Uint8 <builtin>
[…]

type Int8 <builtin>
[…]

type Byte = Uint8
type Rune <builtin>

type Int <builtin>
type Uint <builtin>
type Uintptr <builtin>
// Package unbounded defines an arbitrary-precision integer type with unbounded range.
// Unbounded types do not support bitwise XOR, complement, or clear operations.
package unbounded

type Int <builtin>

Defined types

Defined types that have checked, wrapped, or unbounded types as their underlying type have the same operations and behavior as the underlying type.

Defined types that have builtin integer types as their underlying type have the same behavior as the underlying type, except that they are not mutually-assignable with checked or wrapped types.

Checked assignment

A checked assignment uses the form x, ok = <expression> or x, ok := <expression>, where <expression> can comprise any number of arithmetic, bitwise, logical, and/or conversion operations yielding a checked, wrapped, unbounded, user-defined, or builtin integer type. The operations in a checked assignment do not panic even if they involve checked or builtin integer types. The ok result, which may be assigned to any boolean type, indicates whether any arithmetic operation or conversion within the expression overflowed. The x result has the type of <expression> and a value computed using two's-complement wrapping.

Bitwise operations and logical shifts do not overflow, and therefore do not set the ok result. (Recall that signed shifts are defined to be arithmetic, while unsigned shifts are defined to be logical.)

Unchecked arithmetic operations on builtin integer types are a compile-time error.

var x int32 = 1<<31 - 1
y := x + 1  // compile-time error: `x + 1` is not checked for overflow
var x int32 = 1<<31 - 1
y, ok := x + 1  // y = -2147483648; ok = false
var x checked.Int32 = 1<<31 - 1
y := x + 1  // run-time panic: `x + 1` overflows
var x checked.Int32 = 1<<31 - 1
y, ok := x + 1  // y = -2147483648; ok = false
var x wrapped.Int32 = 1<<31 - 1
y := x + 1  // y = -2147483648
var x wrapped.Int32 = 1<<31 - 1
y, ok := x + 1  // y = -2147483648; ok = false
var x int32 = 1<<30
y, ok := x<<1  // y = -2147483648; ok = false
// signed shift is arithmetic, and shifting into the sign bit overflows
var x uint32 = 1<<31
y, ok := x<<1  // y = 0; ok = true
// unsigned shift is logical, and by definition cannot overflow

Conversion

Any integer type can be explicitly converted to a checked, wrapped, or unbounded integer type. A checked, wrapped, or unbounded integer type can be converted to a builtin type only if either the conversion is checked or the destination type can represent all possible values of the source type.

var x checked.Int32 = 1<<31 - 1
var y = int32(x)  // y = 2147483647
var x checked.Uint32 = 1<<31 - 1
var y = int32(x)  // compile-time error: conversion from checked.Uint32 may overflow int32
var x checked.Uint32 = 1<<31
y, _ = int32(x)  // y = -2147483648

An unchecked conversion to a checked type panics if the value cannot be represented in the destination type.

var x int64 = 1<<31
y := checked.Int32(x)  // run-time panic: `x` overflows checked.Int32
var x int64 = 1<<31
y, ok := checked.Int32(x)  // y = -2147483648; ok = false

A conversion to a wrapped type wraps if the value cannot be represented in the destination type, even if the operand is of a larger checked type.

var x checked.Int64 = 1<<31
y := wrapped.Int32(x)  // y = -2147483648

A conversion to unbounded.Int from any integer type always succeeds. The conversion is applied after the operand is fully evaluated.

var x int64 = 1<<31
y := unbounded.Int(x)  // y = 2147483648
var x checked.Int32 = 1<<31-1
y := unbounded.Int(x+1)  // run-time panic: `x+1` overflows checked.Int32
var x wrapped.Int32 = 1<<31-1
y := unbounded.Int(x+1)  // y = -2147483648
var x checked.Int32 = 1<<31-1
y, ok := unbounded.Int(x+1)  // y = -2147483648; ok = false

Assignability

Each sized type in the checked and wrapped package is mutually assignable with the corresponding builtin sized type, but not with the type in the opposing package, nor with defined types of any underlying integer type.

var x wrapped.Int32 = 1<<31 - 1
var y int32
[…]
y = x + 1  // y = -2147483648
var x checked.Int32 = 1<<31-1
var y wrapped.Int32
[…]
y = x + 1  // compile-time error: checked.Int32 is not assignable to wrapped.Int32
type MyInt32 wrapped.Int32
var x wrapped.Int32 = 1<<31 - 1
var y MyInt32
[…]
y = x + 1  // compile-time error: wrapped.Int32 is not assignable to MyInt32
type MyInt32 int32
var x wrapped.Int32 = 1<<31 - 1
var y MyInt32
[…]
y = x + 1  // compile-time error: wrapped.Int32 is not assignable to MyInt32
var f func() int32
var g func(int32)
var x checked.Int32
x = f()
g(x + 1)  // ok: x+1 is checked, then passed to g as an int32

This allows functions to perform operations on checked or wrapped types, but to expose and use the corresponding builtin types at API boundaries (with less syntactic overhead for all involved).

Arithmetic operators

The type of an arithmetic expression depends on its operands. If both operands are of checked, wrapped, unbounded, or defined types, then they must have exactly the same type. If one operand is a checked, wrapped, or unbounded integer or a defined type with one of those types as its underlying type, and the other is either a builtin integer or untyped constant assignable to the first, then the result of the operation is the checked, wrapped, unbounded, or defined type.

If both operands are of a builtin integer type or a defined type with a builtin integer as its underlying type, the expression must be a checked assignment, bitwise operator, or logical shift, and its result is the same type as the operands.

Constants

An untyped integer constant can be assigned to any checked, wrapped, or unbounded integer type that can represent its value.

Unfortunately, in order to comply with https://github.com/golang/proposal/blob/master/design/28221-go2-transitions.md#language-changes, the inferred type of a variable initialized from an untyped constant must remain the built-in int.

bcmills commented 5 years ago

Note that part (1) would be trivial if we had operator overloading, and part (3) could really just be a vet check.

The assignability rule could perhaps be eliminated from the proposal without causing too much damage. That would make the code much more verbose, but not fundamentally alter its structure.

Similarly, the conversion check might be implementable using generic constructor functions, and can be implemented (inefficiently and without type-safety) using reflection.

IMO, the interesting part of this proposal is part (2), the aggregation of overflow checks to the whole expression. That would otherwise require constructing an overflow-tracking object, using its methods in place of the existing arithmetic expressions, and remembering to check its result at the end — a lot of syntax, and much more error-prone than a regular old , ok.

ericlagergren commented 5 years ago

How come the unbounded package only has a signed integer? An unsigned integer would be beneficial for some things I have in mind.

bcmills commented 5 years ago

@ericlagergren An unbounded signed integer type can represent all of the unsigned values too, and is closed under subtraction.

In contrast, an unsigned type carries the risk of overflow: what would the behavior be for unbounded.Uint(0) - unbounded.Uint(1)? The whole point of the unbounded package is that its behavior on overflow is “not possible by definition”.

josharian commented 5 years ago

I’ll admit I haven’t read this too carefully, but I hope you’ll permit me some questions anyway:

bcmills commented 5 years ago
  • Why have wrapped? How does it differ from what we have now?

The wrapped types have the same semantics as the builtin types today.

The wrapped package exists so that we can remove the arithmetic operators from the built-in types. Otherwise, it's much too easy to forget to convert a value to checked and end up with (undiagnosed) wrapping behavior where you really meant for the operations to never overflow.

bcmills commented 5 years ago
  • In the same vein, what do you think about adding saturating? (I think this was discussed elsewhere as well.)

I think that would be fine. I don't know of many cases where it's actually useful, but it doesn't seem actively harmful either.

bcmills commented 5 years ago
  • I see the value in package checked as a namespace, but ISTM we could dispense with unbounded in favor of a new predeclared type, like integer or infinint (joking).

Indeed. I don't feel strongly about that either way — I'm happy to go with whatever makes the proposal more palatable to the folks making the final decision. 🙂

tv42 commented 5 years ago

I'm sorry I haven't read everything carefully but how exactly is this backwards compatible?

Unchecked arithmetic operations on builtin integer types are a compile-time error.

var x int32 = 1<<31 - 1
y := x + 1  // compile-time error: `x + 1` is not checked for overflow

EDIT: I guess it's not intended to be? Is there any idea of how much existing code this would break?

ianlancetaylor commented 5 years ago

Unchecked arithmetic operations on builtin integer types are a compile-time error.

If I'm reading this correctly essentially every existing Go program would become invalid, because they all have arithmetic operations on builtin integer types. That seems like a very heavy lift.

networkimprov commented 5 years ago

@ianlancetaylor do you consider backwards-incompatible features which are trivially accommodated by go fix to be problematic? (Not sure whether this particular concept is easily go-fixable...)

ianlancetaylor commented 5 years ago

Running go fix can certainly help, but it's not a panacea. We also have to consider all existing books, documentation, tutorials, etc.

dpinela commented 5 years ago

I'd prefer to see plain int become arbitrary-precision (#19623). Usually when I'm writing code that uses integers I don't want to choose whether I want it them to wrap or panic on overflow. I'd rather just get the (mathematically) correct result by default without having to type unbounded.Int everywhere to get it, and use the fixed-size types - either the existing ones or the ones in this proposal - if I need to optimise.

Even in the latter case, I feel like this proposal would make bounded arithmetic code overly verbose.

As an aside, why disallow XOR and complement on unbounded ints? Assuming they're signed, aren't both operations well-defined and guaranteed to yield a finite result?

bcmills commented 5 years ago

If I'm reading this correctly essentially every existing Go program would become invalid, because they all have arithmetic operations on builtin integer types.

Probably every program, but not every package: in particular, packages that use range loops rather than indices for iteration would be mostly unaffected. (The point of the breaking change is to prompt code owners to make an explicit decision about the overflow behavior they intend.)

Also note that it would be possible to adopt parts (1) and (2) of this proposal — the new packages and checked assignment — without the breaking change of part (3). That would provide a much smaller benefit, but at a much smaller corresponding cost.

bcmills commented 5 years ago

I'd prefer to see plain int become arbitrary-precision (#19623).

Note that that proposal does not comply with the migration strategy outlined in https://github.com/golang/proposal/blob/master/design/28221-go2-transitions.md#language-redefinitions.

bcmills commented 5 years ago

As an aside, why disallow XOR and complement on unbounded ints? Assuming they're signed, aren't both operations well-defined and guaranteed to yield a finite result?

Interesting question. Mostly I think it's weird to apply two's-complement arithmetic when the sign bit is infinitely far away, but I suppose it is still well-defined. Perhaps they would be ok after all.

ericlagergren commented 5 years ago

It’s basically a big.Int, and that has XOR.

beoran commented 5 years ago

I am aginst part 3 of the proposal, since that would take us too far. But I am in favor of part 1 and 2, particularly for checked types. I don't care too much about wrapping types. Now, if we combine part 1 and 2 with my ranged types proposal here: https://gist.github.com/beoran/83526ce0c1ff2971a9119d103822533a, then we could write:

// Package checked defines integer types whose arithmetic operations and conversions
// panic on overflow.
package checked

type Uint8 = range uint8[0:255]
type Uint16 = range uint16[0:65535]
type Int8 = range int8[-127:127]
type Int16 = range int16[-32767..32768]

So, with my proposal, that will include the result, ok := form, the checked part of this proposal becomes a package that is user-implementable. And, arguably, this approach is more useful because you can also have checked types with a narrower range of values.

bcmills commented 5 years ago

@beoran I like the generalization to ranged types, but note that there is one subtle mismatch between this proposal and yours. Your proposal requires:

For all ranged types, whenever possible the compiler checks at compile time that any assignments to a variable of a ranged type respects either the bounds or is one of the enumerated values, and emits a compile error if the value of a variable or constant can be proven to be not in range.

This proposal explicitly does not: the overflow check in this proposal is always dynamic, because that's much easier to implement portably, and it is possible that the part of the code in which the overflow occurs may not actually be reachable (for example, because it is on a code path that is only reachable on an architecture with a different-sized uintptr).

beoran commented 5 years ago

I see, ease of implementation is also an important element. I am willing to change my proposal to allow the compiler to make the check always dynamic for an assignment to a variable in a function.

For an assignment to a constant, the range check should always be performed at compile time.

I not sure how the range check should be done for package level variables. Sure, a panic is possible but might be confusing.

beoran commented 5 years ago

I amended the proposal. Now I propose the range check is on compile type for constants, and run time for variables, although the compiler may still do the range checking at compile time (implementation defined). I also detailed the spec of len(range), worked out other details more. I also disallowed complex underlying types for range types and for range boundaries since for complex64 and complex128, the meaning of "range" becomes quite complex, indeed.

ianlancetaylor commented 5 years ago

I would argue that the comma-ok idea suggested here, part 2 of the proposal, should not be done initially. I argue this on two points. First, I think that few people will actually want that feature. Second, for those people who really do want it, they can already write it by using a deferred function that calls recover. So I think it's appropriate to simply postpone that part of the proposal indefinitely, until we have a clearer understanding of how often would want to use it.

bcmills commented 5 years ago

I would be ok with omitting the , ok form for now.

I think its major use-cases are for down-casting checked types (for which an explicit range check isn't a lot of extra code) and for omitting double-conversions when writing isolated arithmetic expressions in mostly-non-arithmetic (or mostly-checked) blocks of code, but the alternatives using explicit conversions for those use-cases don't seem terrible.

bcmills commented 5 years ago

33342 found a neat overflow in the runtime package:

https://github.com/golang/go/blob/5b15510d96b00662327fbd3eb860d767834dfadc/src/runtime/malloc.go#L1078

beoran commented 5 years ago

This is why I feel ranged types are an important feature of Ada and other Wirth-like languages: they help writing correct programs without overflow. As it stands now, in Go, like other C inspired languages, it is still too easy to accidentally cause overflow. Go would do well to adopt range types to prevent this.

mentalisttraceur commented 5 years ago

My immediate impression is that having logical shifts not effect the ok in checked assignments is... well I don't mean that personally, but I had a very visceral "this will negatively impact people" reaction that made me really want to call it "asinine".

It is inconsistent with the semantics/"feel" of every other operation on integers that can cause logical result bits outside the size of the integer type, in a non-obvious way that I guarantee people will accidentally forget.

(From all my experience carefully paying attention to how human cognition interacts with code, I predict we'd see non-negligible amounts of situations where people accidentally intuitively rely on qux, ok := foo << bar to set ok to false if foo has significant ones shifted out for unsigned integers too. I would bet a lot of money on it if I could be guaranteed that we would know about it whenever it happens.)

Also, if we wanted to do a logical shift while checking that we shifted bits out, we'd have to do our own before-doing-the-operation checks. But if instead logical shift did effect the ok in checked assignment, and we wanted to ignore if we shifted bits out, the worst case is we'd just have to pull shift subexpressions out into their own assignments. So the worst case code impact seems way worse with logical shift not effecting the ok of checked assignment.

Maybe that doesn't matter because most code just doesn't do the kind of bit twiddling that would make this matter, and it looks from the other comments that the checked assignment part of the proposal is maybe already on the chopping block anyway, but I wanted to voice that.

tv42 commented 4 years ago

// Package unbounded defines an arbitrary-precision integer type with unbounded range.
// Unbounded types do not support bitwise XOR, complement, or clear operations.

XOR was discussed earlier (and since math/big.Int has, I think this should have it too).

Why no clear? Once again, math/big.Int has this: https://golang.org/pkg/math/big/#Int.SetBit

ericlagergren commented 3 years ago

I would argue that the comma-ok idea suggested here, part 2 of the proposal, should not be done initially. I argue this on two points.

I’d like to push back on this. I think the comma-ok idea is very useful, even if just by itself.

First, I think that few people will actually want that feature.

I think more folks would use checked arithmetic were it more approachable. There isn’t any easy way to do it right now.

You can use math/bits, but it returns the results in the opposite order you’d expect for checked multiplication: (hi, lo) vs (x, ok). And it doesn’t make your intentions clear.

You can use a third-party package, but now you’re writing x, ok := checked.Add(a, b) which gets very tiresome if you have more than just two integers, like a + b c / d. (math/bits suffers from this as well.) Worse, it makes the code more difficult to read, and one of Go’s strengths is its readability. I’ve written a lot* of math/big code and even I frequently lose my place trying to follow code with more than a little arithmetic.

Second, for those people who really do want it, they can already write it by using a deferred function that calls recover.

I think using defer and recover to handle overflow is too clever and too subtle.

It also makes the code more difficult to follow since defer blocks are almost always far away from the code causing the panic. It’ll be difficult to handle some overflows in one way, and some in others.

Additionally, using defer and recover has a much more significant performance impact than comma-ok. Open-coded defers are pretty cheap, but they’re not free. However, comma-ok is free: the only cost is checking the overflow flag, which you have to do anyway with the defer recover strategy.

catenacyber commented 1 year ago

This looks interesting.

Would there be a way, with Go 1, to build code which makes integer overflows panic ? Or should I just try to parse AST and add checks before each arithmetic operations ?

nirui commented 11 months ago

Can this be implemented as a simple bound checks? It could be more versatile since it should allow users to specify their custom bound range.

It don't even have to be new types (though I like Beoran's suggestion), just few functions should do it, and no need to change how the language works.

Simplified idea:

// Define all supported integer types
type Integer interface { ~int | ~uint | ~int8... }

// Integer operations, will set `ok` to `false` if `result` is out of specified bound (via `min` and `max`)
func BoundedAdd[I Integer](a I, b I, min I, max I) (result I, ok bool)
func BoundedMinus[I Integer](a I, b I, min I, max I) (result I, ok bool)
func BoundedMultiply[I Integer](a I, b I, min I, max I) (result I, ok bool)
func BoundedDivide[I Integer](a I, b I, min I, max I) (result I, ok bool)
func...

// Handy function, panics if `ok` == false
func Must(result I, ok bool) (result I)

This might also allow us to mix different Integer types, and operate on them safely:

func Cast[I Integer, O Integer](in I) (out O, ok bool)

// BoundedAdd how takes two different integer types `I` and `O`, and output the result as a `O` type value
func BoundedAdd[I Integer, O Integer](base O, add I, min O, max O) (result O, ok bool)

EDIT: It's basically the same as the one proposed in 60858 and 24853. Should have checked before posting, sorry.

findleyr commented 3 weeks ago

While this is an interesting idea, we can't realistically make such a large breaking change to the behavior of int. A similar argument is made regarding arbitrary precision ints in https://github.com/golang/go/issues/19623#issuecomment-2460950122. By contrast, issue #30613 proposes a potential path forward for checked ints that does not change the behavior of existing programs.

Therefore, this is a likely decline. Leaving open for four weeks for final comments. — @findleyr for the language proposal review group