proposal: spec: change int to be arbitrary precision

robpike commented 7 years ago

An idea that has been kicking around for years, but never written down:

The current definition of int (and correspondingly uint) is that it is either 32 or 64 bits. This causes a variety of problems that are small but annoying and add up:

overflow when constants like math.MaxUint64 is automatically promoted to int type
maximum size of a byte slice is only half the address space on a 32-bit machine
int values can overflow silently, yet no one depends on this working. (Those who want overflow use sized types.)
great care must be taken with conversion between potentially large values, as information can be lost silently
many more

I propose that for Go 2 we make a profound change to the language and have int and uint be arbitrary precision. It can be done efficiently - many other languages have done so - and with the new compiler it should be possible to avoid the overhead completely in many cases. (The usual solution is to represent an integer as a word with one bit reserved; for instance if clear, the word points to a big.Int or equivalent, while if set the bit is just cleared or shifted out.)

The advantages are many:

int (and uint, but I'll stop mentioning it now) become very powerful types
overflow becomes impossible, simplifying and securing many calculations
the default type for len etc. can now capture any size without overflow
we could permit any integer type to be converted to int without ceremony, simplifying some arithmetical calculations

Most important, I think it makes Go a lot more interesting. No language in its domain has this feature, and the advantages of security and simplicity it would bring are significant.

griesemer commented 7 years ago

I'm a big fan of this, myself. It would elevate int (and uint) to "unrestricted" (for lack of a better word) types, and only the types with explicit sizes (int16, int32, etc.) would be subject to wrap around.

In many cases (loop iteration variables) the compiler may be able to prove that an int is in a certain range and could produce optimal code. In many other cases, the use of int is not speed-critical.

randall77 commented 7 years ago

Let's put this and #19624 in the Thunderdome! Two proposals enter, one proposal leaves...

griesemer commented 7 years ago

A minor related point about this: The int and uint types were intended to be of the "natural" size for a given platform, typically the register size. If we have true integers we loose that naturally sized type - yet we make use of it in a few places where we want integer algorithms to work well for a given platform (strconv.Itoa comes to mind). We may want to consider introducing an unsigned "word" type instead (in the past we have also used uintptr in that role, but that may not necessarily guaranteed to be of register size on all platforms).

randall77 commented 7 years ago

Representing an int in a single machine word will be tricky. We run across the same problem we had with scalars being in direct interfaces - the GC sees a word which is sometimes a pointer and sometimes isn't. The only easy solution I see is to reserve some top bit(s) to set for raw integers and disallow those top bits for any heap pointer.

bcmills commented 7 years ago

@randall77

Two proposals enter, one proposal leaves...

Actually, I think they're completely compatible. I recommend both!

bcmills commented 7 years ago

@randall77

the GC sees a word which is sometimes a pointer and sometimes isn't

Could we fix that with better stack or heap maps? Instead of each word being "either a pointer or a non-pointer", it would be "a pointer, a non-pointer, or an int". I suppose that would require two bits instead of one per address, though ― might bloat the maps a bit.

FWIW, the ML family of languages worked around that issue by making the native types int31 and int63 instead of int32 and int64. I do not recommend that approach.

randall77 commented 7 years ago

@bcmills Yes, two bits per word should work. That just buys us more flexibility to put the distinguishing bits somewhere other than the top bits of the word - not sure if that is worth it.

rasky commented 7 years ago

I love this proposal in abstract, but I'm very concerned about the performance impact. I think it's not by chance that "no language in its domain has this feature". If we use bound-checking elimination has a similar problem, Go compiler isn't very good at it even nowadays, it basically just handles obvious cases, and doesn't even have a proper VRP pass (the one proposed was abandoned because of compile time concerns). Stuff like a simple multiplication would become a call into the runtime in the general case, and I would surprised if the Go compiler could avoid them in most cases, if we exclude obvious cases like clearly bounded for loops.

griesemer commented 7 years ago

@rasky Languages likes Smalltalk and Lisp (and more recently, JavaScript) have pioneered the implementation of such integers - they can be implemented surprisingly efficiently in the common case. A typical implementation reserves the 2 least significant bits as "tags" - making an int on a 64bit platform effectively 62bit in the common (small) case - which is likely more than plenty in most cases (and where it isn't it's either because we need very large integers, or we should be using int64).

One way of using the tag bits is as follows: 00 means small integer (smi) 01 means that the value is a pointer p to an arbitrary precision integer (at p-1) 10 typically the result of an operation (see below) 11 reserved for other uses or unused

Given this encoding, if we have two int values x and y, we can optimistically assume they are smis. For instance x + y would be translated into a single ADD machine instruction, followed by a test of the tag bits. If they are not 00, one (or both) of the operands were large ints and then a runtime call is needed. Also, if there was an overflow, a runtime call is needed. This does add 3 instructions in the fast path, but not more (one conditional jump if overflow, one test of tag bits, one conditional jump if tag bits are not 0 - perhaps more depending on how the runtime call is achieved or if there's a conditional call instruction). It's not cheap, but it's much less expensive than a runtime call in the common case. If both operands were smis, the tag bits remain 00, and the result doesn't need further work. The same is true for subtraction. Multiplication requires a bit more work but is also more expensive. Finally, division is the most complicated one, but also the most expensive operation in general.

It might be worthwhile performing a little experiment where one generates this additional code for each integer addition, using dummy conditional branches that will never be taken (or just jump to the end of the instruction sequence) and see what the performance impact is.

DmitriyMV commented 7 years ago

Not a fan of this proposal - currently its quite simple to argue about resulting code and its performance characteristics when doing simple arithmetics. Also - even if losing two bits on 64 bit platform is not important, on 32 bit one it is.

Maybe we could have an arbitrary precision ints implemented in new built-in type (like we do with complex currently)?

dmitshur commented 7 years ago

Can you discuss how such int variables would represented in mediums other than RAM, and marshaled/unmarshaled?

Encoding to JSON should be easy and map really well. As far as I know, JSON spec does not place restrictions on size of numbers, so a really large int would encode as as base 10 number (and vice versa for decoding). E.g.:

{"number": 12312323123131231312312312312312321312313123123123123123123}

Would map to an int with value 12312323123131231312312312312312321312313123123123123123123.

What about something like encoding/gob or encoding/binary?

griesemer commented 7 years ago

@shurcooL 1) printing is obvious (the usual human-readable forms) - applies to JSON 2) encoding/gob could use a private internal representation 3) encoding/binary is for fixed-width numbers at the moment, the proposed int's wouldn't be - but a var-int encoding could work (though the current format would probably be inefficient).

Re: 3, note that the compiler's import/export format already encodes arbitrary-sized integer values because of precise constants.

magical commented 7 years ago

@shurcooL @griesemer I believe encoding/gob already uses a variable-length encoding for all integer types.

jasonwatkinspdx commented 7 years ago

Go should have an arbitrary precision number type that is more convenient than math.Big. That type should not attempt to masquerade as int/uint, as these aren't just used semantically as "number" but more so as "number compatible with c/foolang code that uses the natural local word size".

The root problem here is that golang's design prevents a library from defining an arbitrary precision type that is as semantically and syntactically convenient as a type provided by the runtime/compiler with internal knowledge and cludges. Solve that problem with golang 2.0, or instead we will find ourselves years from now with countless ad hoc accretions like this.

Edit: I'm a fan of this design/feature in scripting languages. I don't see how it works as the base int type in a systems language to replace c/c++/java. I absolutely think we should have a great and convenient arbitrary precision number type, and think the road to that is a golang where library defined types are not second class to ad hoc boundary flaunting extensions of the runtime and compiler provided types.

bcmills commented 7 years ago

@jasonwatkinspdx

It's true that one of the roles of int in Go 1 is as an analogue to C's ssize_t. But it doesn't actually matter whether int is exactly the size of the address space or merely sufficient to cover the address space.

Perhaps more importantly, C APIs don't actually use ssize_t all that often: they use size_t instead. You have to range-check conversions from C.size_t to int, because the latter is potentially only half the range of the former. Under this proposal, you'd need the range check in the other direction instead, because C.size_t would now be smaller than int. The only way to avoid the need for a range check entirely is by making the Go "size" type have exactly the same range as C.size_t, and at that point you're looking at a fairly high risk of overflow bugs (especially from decrementing past zero).

jasonwatkinspdx commented 7 years ago

@bcmills

But it doesn't actually matter whether int is exactly the size of the address space or merely sufficient to cover the address space.

It does matter. People make design decisions concerning the size of the address space and how indexes into it can be represented in serializations or other transformed values. Making the type that spans the address space arb precision offloads many complexities to anyone that wants to ship data to other systems.

What does a c abi compatible library consumer of golang look like in a world where int/uint is arb precision? Is it really better if the golang side is blind to any issue at the type level and the c side has no choice but panic?

I do see the value of these types, I just don't want them conflated with int/uint. I'm entirely in favor of a numeric tower in golang being the default numeric type, I just don't want it to pretend to have the same name as the 32bit/64bit machine/isa determined types.

bcmills commented 7 years ago

People make design decisions concerning the size of the address space and how indexes into it can be represented in serializations or other transformed values.

Can you give some examples? I'm having a hard time thinking of anything other than a syscall that spans a process boundary but should be sized to the address space, and programs using raw syscalls generally need to validate their arguments anyway.

What does a c abi compatible library consumer of golang look like in a world where int/uint is arb precision?

The same thing it looks like today: using C.size_t instead of int.

bronze1man commented 7 years ago

@robpike It looks harder to reduce CPU of golang program when ints become arbitrary precision. 😝 I do not need arbitrary precision ints by default, but I do need golang program to use less CPU to save my money with gce vm server.

jasonwatkinspdx commented 7 years ago

programs using raw syscalls generally need to validate their arguments anyway.

I want to capture that in the types, not in a dynamic value range check. I don't think this is an unreasonable expectation of a language that markets itself as a replacement for c/c++/java for systems programming.

cznic commented 7 years ago

I honestly thought it's 11 days more than it really is.

I don't want to lose normal int performance. I don't believe there's a space/time effective way for an arbitrary precision int to be, in many cases, comparable in performance to the fixed width int.

I have nothing against adding an arbitrary precision mpint type (name does not matter), which the compiler accepts mixed in expressions with normal ints providing the conversions as needed. IOW, it would be really nice to be able to use standard operators with arbitrary precision integers, yes, but please only explicitly. Please leave int alone.

ainar-g commented 7 years ago

What about floats? JSON can have values like 12345678901234567890.12345678901234567890, with many digits before and after the dot, but IIRC no Go type can accurately represent those. That is, no Go type can do maths with it, I know about json.Number.

Personally, I would keep int as it is and instead added a number type that could represent any number with or without a fractional part.

bronze1man commented 7 years ago

Please do not make int operation slower by default. I do not need arbitrary precision ints by default, but I do need golang program to use less CPU to save my money with gce vm server.

I do not need some like mpint with Infix expression eithor, *big.Int is enough for me.

mvdan commented 7 years ago

I don't understand the performance concerns - even if the performance hit would be noticeable in some scenarios, could you not use sized types like int32 or the unsigned word (like the current uint) mentioned by @griesemer?

faiface commented 7 years ago

I am 100% for this proposal. Worrying about the performance impact of arbitrary precision ints is like worrying about the performance impact of array bounds checks, in my opinion. It's negligible if implemented correctly.

I also like this because of how inconvenient the big package is and how I always fall back to Python when I'm dealing with big integers.

tgulacsi commented 7 years ago

For an arbitrary precision uint, what is uint(0)-1 ? Maybe mpint is enough.

faiface commented 7 years ago

Imho mpint is a very bad idea. No one would use it, except for when they really need it, because int is nicer and feels more standard. Since the majority of libraries would still use int instead of mpint, integrating two libraries, one of which uses int and the other one uses mpint would become cumbersome.

Also, it would be entirely unclear whether to use int or mpint in common situations, which contradicts Go's philosophy, which empathizes clarity and tries to reduce programmer's mental overhead.

DmitriyMV commented 7 years ago

@faiface If we are talking about Go 2, by then we should have a better type system anyway. At the same time I would really glad if my arithmetic operations would not suddenly allocate memory if a combination of my integer inputs has resulted in N > n^64. Overflow I can live with - this is my problem.

Again - currently instance of something like

type MyIntCombination struct {
    FirstInt     int
    SecondInt   int
    ThirdInt    int
}

has determined (during compile time) size and has good performance because of the alignment. If they suddenly become pointers - the cache locality would be lost.

All of this can be solved using machine dependent int, but I fear that this change would cause fragmentation in our community between those who prefer explicitly sized ints and the ones who prefer the arbitrary precision. For me, It would mean the strict restriction on using int anywhere in my company code.

But, alas, without working prototype it's all just a speculation. We all remember how alias proposal went.

RLH commented 7 years ago

While the cost of doing an arithmetic operation on two arbitrary precision ints is important there is also the cost to the garbage collector that needs to be considered. We probably need an extra bit on each heap word to indicate whether it is a tagged word. I suspect we can be clever and it won't be a full bit per word but being clever has a way of complicating things. Even if we are memory clever, if we have large arrays of arbitrary precision ints the GC will need to visit each of them to verify that they are immediates and not pointers. In dynamically typed languages like Lisp, JavaScript, and Smalltalk this isn't a big deal because the GC had to visit the words anyway so the GC's marginal cost was minimal.

Furthermore, the cost to the GC is somewhat amplified because it happens during the mark phase when the GC is already using CPU otherwise available to the application. The less time the GC is in the mark phase the easier it is for the application to get the CPU it needs to meet its deadlines, the less floating garbage, and so forth. This cost is hard to measure but it is real.

Getting a sense of these systemic costs isn't intractable but needs to be done before we can claim we know the full cost of arbitrary precision ints.

On Tue, Mar 21, 2017 at 7:49 AM, Tamás Gulácsi notifications@github.com wrote:

For an arbitrary precision uint, what is uint(0)-1 ? Maybe mpint is enough.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/19623#issuecomment-288055090, or mute the thread https://github.com/notifications/unsubscribe-auth/AA7Wn_Uq14yr57DLGVxTB0LPA0ubrdVcks5rn7lngaJpZM4Mi7Wo .

champioj commented 7 years ago

Edit: I was confused if the new int would be immutable or not. It is immutable.

griesemer commented 7 years ago

@champioj It needs to be immutable, otherwise it's going to be impossible to use correctly.

bcmills commented 7 years ago

@tgulacsi

For an arbitrary precision uint, what is uint(0) - 1?

A very good question. I would argue that that implies one of two things, and the choice depends on whether the difference between uint and int is one of space-efficiency or program invariants.

If the difference is due to space-efficiency, then uint should not exist. An arbitrary-precision int is capable of representing all of the values of an arbitrary-precision uint without loss, so uint is redundant.

If the difference is due to program invariants — the ability to express "a nonnegative integer" — then subtracting a positive value from uint(0) should panic (and IMO that would argue for also adopting #19624 for consistency).

nnemkin commented 7 years ago

Being fast and close to the metal is a great advantage of Go1. Many other languages start with hard to optimize semantics and suffer bad performance forever, shackled by backward compatibility.

It's true that JIT compiled languages can get away with complex semantics and still be relatively fast: they speculate, profile, optimize and reoptimize at runtime. Yet they rarely if ever reach C speeds and always at a cost of great complexity.

Offline compiler can't speculate much, unlikely paths still have to be compiled. This extra code hanging off integer arithmetic ops will inhibit optimizations, bloat the binaries and slow everything down universally.

Hidden cost of int will be especially glaring if fast and unencumbered int64 is easily available, but unnecessarily ugly (casts).

Other languages in Go's domain don't have this feature because it's too expensive (C), few people need it (Java, JavaScript) and there are library types with overloaded operators (C++, C#, Scala, Rust, Swift, D).

Languages that do have built-in arbitrary-precision arithmetic are very high level, dynamically typed, relatively slow and often math focused (Python, Ruby, Erlang, Haskell, Scheme, Mathematica and other CAS).

Re: cited benefits.

Security and correctness: dangerous operations on int (overflow, truncation) should just trap, because they are likely to be bugs or fatal errors. Silently propagating astronomical values throughout the program isn't any better than silently truncating them. The underlying problem will be obscured either way. I'm sure many Python programmers will agree.
Representing object sizes: objects larger than half of address space should be simply forbidden. Maybe heaps larger than that should be forbidden too. It's not a big deal considering other GC limitations.
Safety of potentially overflowing operations: I'd reach for int128 or int256 first.

bronze1man commented 7 years ago

@nnemkin

If I can easy disable this functional at program level and package level and function level,I think the arbitrary precision can add to golang 2.0 and make them default. Bloat the binaries to 120% is not a big problem for me.

lpar commented 7 years ago

I like this idea.

I also think floats should be decimal and arbitrary precision by default, since so many programmers use binary fixed precision floats in inappropriate situations, such as for currency values. There should be binfloat64 and binfloat32 for the special purpose high speed approximate binary floating point numbers of IEEE 754.

Merovius commented 7 years ago

@lpar I disagree about the floats. Decimal floats have the same, fundamentally unavoidable problem. You are just going to swap people being confused about f/10 with people being confused about f/3 (or sqrt(f), or…). Computers just fundamentally can't handle real numbers and trying harder to pretend they can won't solve this.

bcmills commented 7 years ago

@lpar What would it mean for floats to be arbitrary-precision? In particular, how would you indicate the precision of a float literal? (big.NewFloat in today's standard library implicitly uses the precision of float64.)

It's also worth noting that, unlike int, there is no float type in the language today: users of float must already choose float32 or float64.

Some other kind of floating-point representation — interval arithmetic or unums or some similar approach — might be interesting to consider as alternatives, but that belongs in a separate proposal: it's more-or-less independent of the choice of representation of the int type.

lpar commented 7 years ago

Sorry, didn't mean to imply that floats belonged in the same proposal. That was just an aside, I was floating the idea of decimal floats in case anyone else thinks it would be good to knock out another common source of error as a separate proposal.

tumdum commented 7 years ago

Why make it Go2? Would it not be better to have that kind of numeric type today in Go1? Select a name ('integer' or 'bigint') that does not collide (often?) with existing go code and use it to introduce such type today. This way if and when Go2 happens we will have experience with dealing with such type :)

bradfitz commented 7 years ago

@tumdum, because we would only do this if we could do removals and simplifications at the same time to pay for its addition. We don't want to be a language that just keeps accreting features.

dr2chase commented 7 years ago

@bcmills One choices for arbitrary-precision floats is "Constructive Reals" which have become much harder to demo since browsers quit supporting Java. See http://www.hboehm.info/crcalc/ .

Or also https://android.googlesource.com/platform/external/crcalc/

Basic idea is that a + b returns a function from requested precision to a string of digits. Crcalc does lots more than just addition. One slightly non-intuitive problem with constructive reals is that certain operations take longer than the naive programmer might expect -- if a and b are actually equal but in a way that the built-in hacks cannot determine, (full) comparison is a non-terminating operation.

I have made practical use of these; if you were interested in knowing things about the quality of floating point operations (e.g., transcendental functions) you can use constructive reals to obtain N digits of truth. A real live numerical analysis prof (John Dennis, then at Rice U.) once proposed using them in error term calculations, because checking the error often costs a factor of N fewer operations than computing the "answer", and using exact arithmetic to calculate the error counts out a layer of estimation, resulting in a tighter error bound. We actually implemented this in a Fortran interpreter, and one amusing and not-too-surprising result is that constructive arithmetic on poorly conditioned matrices costs more (because you have to pull more bits out to prevent cancellation errors).

griesemer commented 7 years ago

Please let's keep this focused on the subject at hand (arbitrary precision ints).

typeless commented 7 years ago

I am afraid that compiler could not help much if the arbitrary precision integers are used in function parameters. Doesn't that mean the parameters must always be boxed?

That said, I do strongly hope an integer type, besides int and uint, be added to Go1. A lot of code could then be simplified.

griesemer commented 7 years ago

@typeless An arbitrary precision int type would always use one word (32 or 64bit) when passed as a parameter. See https://github.com/golang/go/issues/19623#issuecomment-287930352 for a possible implementation.

I think there will be a cost for programs overall, but I also think the fears of huge slow-down are overblown.

zigo101 commented 7 years ago

Is the following proposal feasible if this feature gets implemented?

make "range aChannel" also return two values, one message index and one message value, so that the forms of all "range aContainer" be consistent.

faiface commented 7 years ago

@Go101 No. The problem with index over channel is not there, it's just useless. If a channel emmited 1,000,000 messages a second, a int64 counter would overflow after ~300,000 years.

zigo101 commented 7 years ago

@faiface

a int64 counter would overflow after ~300,000 years.

Yes, I think this is one reason in theory why "range channel" only return one value in Go1.

The type of the index returned by range is int, not int64. So if this proposal is implemented, then there will be not the channel message index overflow problem any more.

The index is not totally useless. It is really useful sometimes.

faiface commented 7 years ago

@Go101 int is equivalent to int64 on most machines. When is it not useless?

zigo101 commented 7 years ago

For example, there are M senders and N receivers, when one of the receivers receives a K+th messages, it will notify all senders and receivers to stop messaging. Or print a log for every K messages.

bcmills commented 7 years ago

@Go101 Questions of overflow notwithstanding, assigning a unique index to every send and receive operation would introduce a sequential bottleneck (similar to the one in #17973). The spec as it stands today only requires a happens-before relationship between sends and the corresponding receives, allowing for looser synchronization and better parallelism.

With a fixed-size int type you could at least hope for making the counter update an atomic instruction, but with arbitrary-precision int even that goes out the window.

zigo101 commented 7 years ago

@bcmills I'm not very understanding the issue your referred. But I think there is no needs to assign a unique index to every send. Just adding a numReceiveds field for the internal channel struct is ok.

golang / go

proposal: spec: change int to be arbitrary precision #19623