golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

proposal: spec: change int to be arbitrary precision #19623

Open robpike opened 7 years ago

robpike commented 7 years ago

An idea that has been kicking around for years, but never written down:

The current definition of int (and correspondingly uint) is that it is either 32 or 64 bits. This causes a variety of problems that are small but annoying and add up:

I propose that for Go 2 we make a profound change to the language and have int and uint be arbitrary precision. It can be done efficiently - many other languages have done so - and with the new compiler it should be possible to avoid the overhead completely in many cases. (The usual solution is to represent an integer as a word with one bit reserved; for instance if clear, the word points to a big.Int or equivalent, while if set the bit is just cleared or shifted out.)

The advantages are many:

Most important, I think it makes Go a lot more interesting. No language in its domain has this feature, and the advantages of security and simplicity it would bring are significant.
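
A rough, library-level sketch of the "one reserved bit" representation described above (the aint type, its fields, and its methods are invented purely for illustration; an actual implementation would tag the machine word itself inside the compiler and runtime):

package main

import (
    "fmt"
    "math/big"
)

// aint sketches the small/big split at the library level: if big is nil the
// value lives entirely in small; otherwise small is ignored and the value
// lives in the heap-allocated big.Int.
type aint struct {
    small int64
    big   *big.Int
}

func (a aint) toBig() *big.Int {
    if a.big != nil {
        return a.big
    }
    return big.NewInt(a.small)
}

func (a aint) add(b aint) aint {
    if a.big == nil && b.big == nil {
        s := a.small + b.small
        // Detect wraparound of the small representation and spill to big.Int.
        if (b.small > 0 && s < a.small) || (b.small < 0 && s > a.small) {
            return aint{big: new(big.Int).Add(big.NewInt(a.small), big.NewInt(b.small))}
        }
        return aint{small: s}
    }
    return aint{big: new(big.Int).Add(a.toBig(), b.toBig())}
}

func (a aint) String() string { return a.toBig().String() }

func main() {
    x := aint{small: 1 << 62}
    fmt.Println(x.add(x).add(x).add(x)) // 18446744073709551616: spilled past int64
}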

Merovius commented 7 years ago

Could you please discuss any range-expression changes for channels in a separate issue/thread? It's at most tangentially related to this proposal and thus adds noise for people who are subscribed because they are interested in the discussion of arbitrary precision ints.

zichong commented 7 years ago

Same as @faiface, I also fall back to Python when dealing with big numbers. Once I had a mind to extend the "math/big" package with an Eval function.

func Eval(s string) (*Float, error)

Not good, but at least clearer:

// numbers here can be big numbers
n, err := Eval("(1 * 2 * 3 + 4 * 5) + 6")

If this can be done efficiently at the language level, why not?
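
A minimal sketch of what such an Eval could look like outside math/big, built on go/parser (everything here is hypothetical: it handles only numeric literals, parentheses, unary minus, and + - * /, at a fixed 256-bit precision):

package main

import (
    "fmt"
    "go/ast"
    "go/parser"
    "go/token"
    "math/big"
)

// Eval parses a Go arithmetic expression and evaluates it with math/big.
func Eval(s string) (*big.Float, error) {
    expr, err := parser.ParseExpr(s)
    if err != nil {
        return nil, err
    }
    return eval(expr)
}

func eval(e ast.Expr) (*big.Float, error) {
    switch e := e.(type) {
    case *ast.BasicLit:
        if e.Kind != token.INT && e.Kind != token.FLOAT {
            return nil, fmt.Errorf("unsupported literal %q", e.Value)
        }
        f, _, err := big.ParseFloat(e.Value, 10, 256, big.ToNearestEven)
        return f, err
    case *ast.ParenExpr:
        return eval(e.X)
    case *ast.UnaryExpr:
        x, err := eval(e.X)
        if err != nil {
            return nil, err
        }
        if e.Op != token.SUB {
            return nil, fmt.Errorf("unsupported unary operator %s", e.Op)
        }
        return new(big.Float).Neg(x), nil
    case *ast.BinaryExpr:
        x, err := eval(e.X)
        if err != nil {
            return nil, err
        }
        y, err := eval(e.Y)
        if err != nil {
            return nil, err
        }
        z := new(big.Float)
        switch e.Op {
        case token.ADD:
            return z.Add(x, y), nil
        case token.SUB:
            return z.Sub(x, y), nil
        case token.MUL:
            return z.Mul(x, y), nil
        case token.QUO:
            return z.Quo(x, y), nil
        }
        return nil, fmt.Errorf("unsupported operator %s", e.Op)
    default:
        return nil, fmt.Errorf("unsupported expression %T", e)
    }
}

func main() {
    n, err := Eval("(1 * 2 * 3 + 4 * 5) + 6")
    fmt.Println(n, err) // 32 <nil>
}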

mahdix commented 7 years ago

Why does this new int need to be immutable?

wanton7 commented 7 years ago

I really like this suggestion, but if this is added I think native ints need to be added as well. Maybe something like nint and nuint or intn and uintn.

ghost commented 7 years ago

I really like this suggestion, but if this is added I think native ints need to be added as well. Maybe something like nint and nuint or intn and uintn.

If it was up to me I would name the arbitrary-precision integer types as:

pierrre commented 7 years ago

Does it make sense to have an "unsigned" integer if it's arbitrary precision?

tgulacsi commented 7 years ago

No. What is a uint(0)-1?

bcmills commented 7 years ago

@pierrre, @tgulacsi, we've been over this already (https://github.com/golang/go/issues/19623#issuecomment-288229046).

bcmills commented 7 years ago

@mahdix

Why does this new int need to be immutable?

Because all of the other numeric types are "immutable", in the sense that they do not expose changes through aliased values. For all of the existing numeric types,

x := y
x++

does not change the value of y. It would be surprising for that to be any different for arbitrary-precision ints.

ghost commented 7 years ago

I don't know what kind of software other people are writing, but it seems to me that in most software and most use cases, integer values never exceed 64 bits (or 32 bits). That is: integer values never exceed the bit width of the 64-bit (or 32-bit) address space.

The need for arbitrary precision integers is relatively small, especially in a performance-sensitive systems language compiled to machine code like Go, but it is nice if the language provides them.

robpike commented 7 years ago

It's not about need, it's about quality of language.

nathany commented 7 years ago

Presumably arbitrary precision constants would map nicely to ints/uints in Go 2.0 with this proposal. That would certainly make the language more friendly, and do away with this caveat:

There is a caveat to constants of unusual size. Though the Go compiler utilizes the big package for untyped numeric constants, constants and big.Int values aren’t interchangeable. Listing 8.2 displayed a big.Int containing 24 quintillion, but you can’t display the distance constant due to an overflow error.

fmt.Println("Andromeda Galaxy is", distance, "km away.") // *

* constant 24000000000000000000 overflows int

- from Lesson 8, Get Programming with Go

Being able to pass a numeric constant or literal to a function and not worry about overflows or conversions would be very nice indeed.
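
For context, a small sketch of how that example reads today (the constant follows the book's distance example; the explicit big.Int detour is exactly what this proposal would make unnecessary):

package main

import (
    "fmt"
    "math/big"
)

// The untyped constant itself is exact; it only overflows when forced into int.
const distance = 24_000_000_000_000_000_000

func main() {
    // fmt.Println(distance) // does not compile: constant 24000000000000000000 overflows int

    // Today's workaround: route the value through math/big by hand.
    d, _ := new(big.Int).SetString("24000000000000000000", 10)
    fmt.Println("Andromeda Galaxy is", d, "km away.")
}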

nathany commented 7 years ago

The int and uint types were intended to be of the "natural" size for a given platform, typically the register size. If we have true integers we lose that naturally sized type - yet we make use of it in a few places where we want integer algorithms to work well for a given platform - @griesemer, https://github.com/golang/go/issues/19623#issuecomment-287903210

If needed, @wanton7's suggestion sounds good to me:

I really like this suggestion, but if this is added I think native ints need to be added as well. Maybe something like nint and nuint or intn and uintn. - @wanton7, https://github.com/golang/go/issues/19623#issuecomment-306189100

intn for "natural" or "native" and int32, int64, intn

overflow when constants like math.MaxUint64 are automatically promoted to int type - @robpike

Would it be useful/possible to have a math.MaxIntn constant determined at compile time?

themue commented 7 years ago

Simply leave int as it is, but add a new type natural for explicit use.

omeid commented 7 years ago

Why not introduce a new type instead?

wanton7 commented 7 years ago

@omeid Because to have more robust software in general (no overflows), you need to have it as the default int type, so that everyone will use it. It's like vaccination: everyone needs to get on board to eradicate the disease.

smasher164 commented 7 years ago

Since the majority of libraries would still use int instead of mpint, integrating two libraries, one of which uses int and the other mpint, would become cumbersome.

@faiface Would #19412 address this concern? An addition to the type-system that lets a library accept both int and mpint could allow:

type integral union {
    int
    mpint
}
func acceptsEither(v integral)

omeid commented 7 years ago

@wanton7 Calling something with inconsistent, or generally slow, performance "robust" is a bold claim that requires bold justification.

wanton7 commented 7 years ago

@omeid I'm not a native English speaker, so "robust" to me means more secure and stable software. I did write about overflowing, so I'm not sure how you could have misunderstood me. Also, I didn't write about performance at all and don't know what it has to do with software robustness; maybe you can explain that to me?

tillulen commented 7 years ago

Update: This is a response to a comment that has been deleted.

No. There is no strict definition that is universally accepted, but in software, ‘robust’ generally means ‘correct in the face of failure or adverse conditions’, ‘hard to break’ [WikiWikiWeb], ‘capable of performing without failure under a wide range of conditions’ [Merriam-Webster], ‘able to cope with errors during execution and cope with erroneous input’ [Wikipedia], ‘able to recover from errors; unlikely to fail, reliable’ [OED].

I don’t think there is any connotation of efficiency or high performance. In fact, it is common to assume that the safety checks required to make a system robust may incur a performance penalty.

MichaelTJones commented 7 years ago

I don't like the specifics of the proposal -- let's do more -- but I like its motivation, which is important in several respects:

Some of my software does extensive computation using extended precision real and complex numbers (via math/big). Much of my software does extensive computation using extended precision integers (my own assembler). Some of my software does unlimited precision integer computation using math/big. Based on my own experience, here is a take on the proposal that I think would be better:

  1. Add int128 as a supported type. (super simple, handy CPU instructions with carry/borrow, fixed size, easily passed on stack, etc.)

In fact...consider int{number of bits = 8*2**bytes} (as in int8, int32, int128, int256) to have the following meaning: on machines where that's a natural hardware size, the natural code is generated, but on machines where that is not natural, then slower but still fast multiprecision code is generated. (Probably add/sub inline and mul/div via functions)

  2. Add float128 as a supported type. This is not so simple in terms of how to implement.

You can do a fast but imperfect job with doubled precision. I have libraries for this, and several exist. It is about 4x slower for 128 bits using the normal FP hardware and paired doubles; the precision is perfect and all IEEE modes are right, but the exponent range is less than IEEE 128-bit FP.

Better would be the slower but IEEE-perfect big.Float at the 128-bit precision size. This is notably slower, but still fast, and harder to fit in. Maybe the answer is a special, less general internal implementation of big.Float that is a fixed structure amenable to passing as a value, works as lvalue or rvalue, etc.

And by extension, consider float{number of bits = 8*2**bytes} (as in float32, float64, float128, float256, ...) to have the following meaning: on machines where that's a natural hardware size, the natural code is generated, but on machines where it is not, a slower call to big.Float is made, though perhaps reimplemented for this purpose in a fixed-container-of-bytes form.

  3. Instead of reinterpreting 'int', I propose this: int stays as the natural hardware int size, and new types 'integer', 'floating' (or 'real'), and 'rational' are added. integer means arbitrary precision integer, floating (or real) means arbitrary precision floating point, and rational means the ratio of arbitrary precision integers.

This is not a fast path. The use of unsized integer, floating, and rational variables is a warning that what is happening may not be super fast. However, it is understood that in exchange they will be super general. Such an integer is what I think Rob wanted for generality... it is the typed-variable counterpart of the way numbers are parsed in Go 1.

There are subtleties to this. They are a challenge but surmountable:

First, type conversions might "fail", so it is necessary to consider how x := int32(integer(1<<800)) should work. The same goes for converting floating, or float128 and above, to float64, since the exponent range may overflow -- should they go to zeroes and infinity? Should it fail? Should some form of type switch be used so that you can say "convert if applicable, else do this other thing"?

Second, what do naked numbers mean? Is x := 1<<70 an int128? An integer? A failure? Any of these can work; 'integer' seems natural, but that's not a proposal, just an example of a question that needs answers as part of the design.

Summary:

This modification to the proposal at issue yields a wonderful programming environment: it allows code with integer, rational, and floating variables to be written naturally; it supports fast-path, hardware-friendly coding with float128 and int128 (and larger sizes, handy for crypto and other purposes); and it changes no meaning for the existing type int.

gautamcgoel commented 7 years ago

OCaml internally uses 63-bit ints, with the remaining bit indicating whether the 64-bit word should be interpreted as an immediate int or as a pointer to some boxed value. This is similar to some of the suggestions regarding how we should represent arbitrary precision ints in Go. Of course, such a choice has performance implications. We can learn from the experience gained with OCaml; for example, see:

https://blog.janestreet.com/what-is-gained-and-lost-with-63-bit-integers/

These benchmarks show that we can expect a slowdown of at least a factor of 2-3 on regular 64-bit arithmetic. Personally, I think it's worth it.

bakul commented 7 years ago

  1. Should the encoding (2 or 3 bits or something else) to accommodate bigints be left up to the implementation? This is what Scheme/Lisp implementations do.
  2. To avoid surprises it may be worth adding types integer and uinteger for arbitrary precision that any integral type can be converted to "without ceremony". Note that the current {,u}int{,8,16,32,64} essentially do modular arithmetic (even though normally we tend to ignore this fact), so they are fundamentally different from arbitrary precision ints.
  3. Why have an arbitrary precision unsigned integer type at all? Converting /to/ arb. prec. requires no extra work. Converting /from/ arb. prec. requires that such a number can be /narrowed/ to the lesser type, which can include sign checking (see the sketch after this list).
  4. Might as well add rational types (e.g. 355/113)! Point being, rather than just look at arb. prec. int and possibly do piecemeal evolution, why not take another look at all the numeric types (the "numeric tower") and learn/steal from what CL and Scheme have done?
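
A small sketch of the checked narrowing mentioned in point 3, using today's big.Int API (the helper is made up for illustration, not proposed behavior):

package main

import (
    "fmt"
    "math/big"
)

// narrowToUint64 narrows an arbitrary precision value to uint64,
// reporting failure for negative or too-large inputs.
func narrowToUint64(n *big.Int) (uint64, bool) {
    if n.Sign() < 0 || !n.IsUint64() {
        return 0, false
    }
    return n.Uint64(), true
}

func main() {
    fmt.Println(narrowToUint64(big.NewInt(42)))                      // 42 true
    fmt.Println(narrowToUint64(big.NewInt(-1)))                      // 0 false
    fmt.Println(narrowToUint64(new(big.Int).Lsh(big.NewInt(1), 80))) // 0 false
}
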
griesemer commented 7 years ago

@bakul Regarding your point 1: Different implementations may choose different internal representations, and implementations may evolve. If we were to do anything here, the encoding absolutely should be left to the implementation.

christophberger commented 7 years ago

@bakul

Why have arbitrary precision unsigned integer type at all?

See bullet point 3 of the initial comment: Current int types can silently overflow. This is a common source of severe bugs. There are two options for avoiding silent (u)int overflow:

(1) make ints and uints error out on overflows (difficult, creates new problems, e.g. the need for testing for errors after every addition or multiplication), or

(2) make them arbitrary precision (u)ints (easier & more intuitive for the user).

(Algorithms that require overflow as part of their logic should use the sized variants anyway, from (u)int8 to (u)int64.)
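
To make the silent-overflow point concrete, this is today's behavior (nothing proposed here, just the status quo that option (2) would change for int, while the sized types keep wrapping):

package main

import (
    "fmt"
    "math"
)

func main() {
    var x int64 = math.MaxInt64
    x++            // wraps silently: no error, no panic
    fmt.Println(x) // -9223372036854775808

    var u uint8 = 255
    u++            // the sized variants would keep this wrapping behavior
    fmt.Println(u) // 0
}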

bcmills commented 7 years ago

@christophberger

make ints and uints error out on overflows … e.g. the need for testing for errors after every addition or multiplication

My current proposal in #19624 would allow you to check for overflow on an entire “expression consisting of arithmetic operators and / or conversions between integer types” at once. You really only need one check for overflow anywhere in the expression, in much the same way that you can do arbitrarily many floating-point operations with only one NaN check at the end.

It's true that arbitrary-precision arithmetic lets you combine checks even more broadly (across function calls, for example), but that flexibility also increases the run-time overhead when values exceed the expected range.
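
For reference, the floating-point pattern being used as an analogy looks like this (illustrative only; it is not the #19624 API):

package main

import (
    "fmt"
    "math"
)

func main() {
    // Run a whole chain of operations, then check once at the end,
    // because NaN propagates through every step.
    x := math.Log(-1) // NaN
    y := math.Sqrt(x)*2 + 1
    if math.IsNaN(y) {
        fmt.Println("something went wrong somewhere in the chain")
    }
}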

christophberger commented 7 years ago

@bcmills

Agreed, a behavior analogous to NaN (that is, allowing the overflow check to be postponed until the end of a long calculation) simplifies handling overflow errors.

I also agree on the possible runtime overhead, but when calculations do exceed the range (which should occur pretty rarely in many common scenarios), I guess that most users would prefer observing a slowdown rather than errors.

bakul commented 7 years ago

@christophberger

To clarify, my point was that an arb. prec. int would be a superset of an arb. prec. uint, so why have two types? Implementation-wise I can see a difference (particularly if you use a tagged word). But I am coming at this from a different angle: adding bignums just to deal with overflow seems a big change for a little gain. Introducing bignums should be based on their overall usefulness. And from that perspective the rest of the numeric tower should also be considered (e.g. rationals and bigfloats). Finally, from a user perspective an unsigned bignum doesn't bring anything more useful (except one more place where you run into potential type mismatches).

christophberger commented 7 years ago

@bakul

Thanks for clarifying, I missed the "unsigned" in your 3rd item somehow. Indeed, as per this comment, unsigned arbitrary precision integers are redundant and also fail when calculating uint(0)-1 (as the only sane result for this is a runtime panic).

Regarding signed arbitrary precision integers, I think the gain is more than just "little" - the initial comment already lists three advantages (the first one of the four-bullet list does not count).

In the end, it depends on whether this can be implemented with acceptable efforts and without complicating the runtime (the GC in particular) too much.

nathany commented 7 years ago

I don’t see this proposal and #19624 as mutually exclusive. Arbitrary precision integers would interop well with constants and avoid overflows as an easy to use numeric type. Meanwhile int64 and friends give more control over memory layout... and could be extended to give control over overflow behaviour (wrap, saturate, panic, error, etc.).

magical commented 6 years ago

Possibly relevant: V8 just added an experimental BigInt type with first-class language support: https://v8project.blogspot.com/2018/05/bigint.html

The announcement doesn't go too deep into implementation details, so it isn't clear whether they are using a tagged representation or not.

robaho commented 6 years ago

Don't you need to do this for float/double as well then, analogous to BigInteger and BigDecimal in Java? Why does this need to be part of the language? We could just copy those classes from the JDK, though sparse structs could be implemented to make them even more efficient.

Merovius commented 6 years ago

@robaho IMO math.Log(x) is a natural operation to want to do for floating point numbers but not one that can be done with arbitrary precision. I.e. you won't get around the need to have a fixed precision "natural" floating point type.

lpar commented 6 years ago

No reason why you can't have math.Log(x, prec) to compute a log to specified precision using arbitrary precision values.

tgulacsi commented 6 years ago

That's the same as having x already hold its precision.

lpar commented 6 years ago

For that operation, yes. That's the point. A single operation being limited precision doesn't mean you have to cripple your data type to support it.

tgulacsi commented 6 years ago

You'll have to "cripple your data type", as some numbers cannot be represented with rational numbers or with finite number of digits / bits. (Pi, 1/3, √2)

lpar commented 6 years ago

By that argument, since we can't represent pi exactly as a BigDecimal, we should remove BigDecimal from the language and only have floats.

That's a silly argument against BigDecimal.

ianlancetaylor commented 6 years ago

Let's please move the discussion about arbitrary size floating point types to a different issue. This issue is about changing int to be an arbitrary sized integer type. In particular note that the language has no float type. Thanks.

lpar commented 6 years ago

Since the language doesn't have an integer type either, how about making integer the type of arbitrary-precision ints rather than changing int? That would avoid breaking existing code.

robaho commented 6 years ago

@ianlancetaylor Sorry, I wasn't trying to hijack/confuse. I was only trying to point out: why not have double and double64 then, with double being arbitrarily sized? It just seems wrong to special-case this for int (rather than just using a BigInteger class). I would think operator overloading would be cleaner and more useful (although I'm not a big fan of most uses of operator overloading...), but that is another can of worms.

TuomLarsen commented 5 years ago

What would happen to big.Int if this gets accepted?

ianlancetaylor commented 5 years ago

@TuomLarsen Its implementation would become trivial. If we ever do a math/big/v2, it might be removed.

TuomLarsen commented 5 years ago

@ianlancetaylor On the other hand, it seems to me that one of the advantages of big.Int is that it provides greater control for managing memory. What would be the equivalent of Int.Set? Does = do a deep or a shallow copy? I'm asking because perhaps the "dest ← op1, op2" style of big.Int is more flexible, but its functionality would overlap somewhat with this proposal.

MichaelTJones commented 5 years ago

This was a concern about big in the early Go days, but the pool management scheme suggests a new try is in order.

robaho commented 5 years ago

Upon reflection, I think this change would break too far from the language's C/systems heritage. Better handled in libraries, IMO.

faiface commented 5 years ago

@TuomLarsen I'd expect it to work just like strings do today. Both strings and ints are immutable, so = would do a shallow copy, but all operations (+, +=, etc.) would make a new instance by default. Of course, the compiler would optimize cases where it's unnecessary to make a new instance and would instead rewrite an old one.
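
In terms of today's strings, the behavior described above looks like this (illustrative snippet):

package main

import "fmt"

func main() {
    s := "go"
    t := s            // copies the string header; the underlying bytes are shared
    t += "pher"       // += builds a brand new string for t
    fmt.Println(s, t) // prints "go gopher": s is unchanged
}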

nathany commented 5 years ago

@TuomLarsen Its implementation would become trivial. If we ever do a math/big/v2, it might be removed. https://github.com/golang/go/issues/19623#issuecomment-443023550

What is the interplay between this proposal and #20654 math/big: support for constant-time arithmetic?

griesemer commented 5 years ago

@nathany #20654 is about specialized large integer arithmetic where operations (such as +, *, etc.) always take the "same" amount of time independent of the input values. For instance, if huge represents a large integer value, huge + huge must take the same time as huge + 0 (even though in the latter the algorithm might be able to tell right away that there's nothing to do because one argument is 0). Making such operations' runtime independent of the arguments makes it impossible to "guess" arguments based on runtime. When such guessing is possible, this "side channel" (the timing of operations) becomes a valuable data source for crypto hackers.

There's no need for such constant-time operations in the language implementation. Just the opposite is true: we don't want constant-time operations because this proposal assumes that we can do all kinds of tricks to avoid the cost of a large integer operation: Most integers will be small, and in those cases the costs of operations should be close to what they are now (and inlined in the generated code).

I see #20654 as a proposal for a specialized library, built from the ground up (and tested as such) for crypto. I think it would be very hard to retrofit big.Int to support both optimized and constant-time operations and be sure to get it right in all cases.
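
For a feel of the timing side channel in general, the standard library already deals with the same concern for byte comparisons; a small illustration using crypto/subtle (byte slices here, not big integers):

package main

import (
    "bytes"
    "crypto/subtle"
    "fmt"
)

func main() {
    secret := []byte("expected-mac-value.")
    guess := []byte("attacker-controlled")

    // bytes.Equal may return at the first mismatching byte, so its timing
    // can reveal how many leading bytes of the secret a guess got right.
    fmt.Println("variable-time:", bytes.Equal(secret, guess))

    // subtle.ConstantTimeCompare inspects every byte of equal-length inputs,
    // so its timing does not depend on where the first mismatch occurs.
    fmt.Println("constant-time:", subtle.ConstantTimeCompare(secret, guess) == 1)
}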

TuomLarsen commented 5 years ago

I would like to add another data point that I just encountered. Provided there were an exponentiation operator, one could write

-3*x**5*y

instead of

new(big.Int).Mul(big.NewInt(-3), new(big.Int).Mul(new(big.Int).Exp(x, big.NewInt(5), nil), y))
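
A self-contained version of that comparison, with placeholder values for x and y (the values are only for illustration):

package main

import (
    "fmt"
    "math/big"
)

func main() {
    x := big.NewInt(7)
    y := big.NewInt(11)

    // What -3*x**5*y has to look like with math/big today.
    result := new(big.Int).Mul(
        big.NewInt(-3),
        new(big.Int).Mul(new(big.Int).Exp(x, big.NewInt(5), nil), y),
    )
    fmt.Println(result) // -3 * 7^5 * 11 = -554631
}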