robpike opened 7 years ago
I'm a big fan of this, myself. It would elevate `int` (and `uint`) to "unrestricted" (for lack of a better word) types, and only the types with explicit sizes (`int16`, `int32`, etc.) would be subject to wrap around.

In many cases (loop iteration variables) the compiler may be able to prove that an `int` is in a certain range and could produce optimal code. In many other cases, the use of `int` is not speed-critical.
Let's put this and #19624 in the Thunderdome! Two proposals enter, one proposal leaves...
A minor related point about this: The `int` and `uint` types were intended to be of the "natural" size for a given platform, typically the register size. If we have true integers we lose that naturally sized type, yet we make use of it in a few places where we want integer algorithms to work well for a given platform (`strconv.Itoa` comes to mind). We may want to consider introducing an unsigned "word" type instead (in the past we have also used `uintptr` in that role, but that may not be guaranteed to be of register size on all platforms).
Representing an int in a single machine word will be tricky. We run across the same problem we had with scalars being in direct interfaces - the GC sees a word which is sometimes a pointer and sometimes isn't. The only easy solution I see is to reserve some top bit(s) to set for raw integers and disallow those top bits for any heap pointer.
@randall77
Two proposals enter, one proposal leaves...
Actually, I think they're completely compatible. I recommend both!
@randall77
the GC sees a word which is sometimes a pointer and sometimes isn't
Could we fix that with better stack or heap maps? Instead of each word being "either a pointer or a non-pointer", it would be "a pointer, a non-pointer, or an int". I suppose that would require two bits instead of one per address, though, which might bloat the maps a bit.
FWIW, the ML family of languages worked around that issue by making the native types `int31` and `int63` instead of `int32` and `int64`. I do not recommend that approach.
@bcmills Yes, two bits per word should work. That just buys us more flexibility to put the distinguishing bits somewhere other than the top bits of the word - not sure if that is worth it.
I love this proposal in the abstract, but I'm very concerned about the performance impact. I think it's not by chance that "no language in its domain has this feature". Bounds-checking elimination has a similar problem, and the Go compiler isn't very good at it even nowadays; it basically just handles obvious cases, and doesn't even have a proper VRP pass (the one proposed was abandoned because of compile-time concerns). Stuff like a simple multiplication would become a call into the runtime in the general case, and I would be surprised if the Go compiler could avoid that in most cases, if we exclude obvious cases like clearly bounded for loops.
@rasky Languages like Smalltalk and Lisp (and more recently, JavaScript) have pioneered the implementation of such integers; they can be implemented surprisingly efficiently in the common case. A typical implementation reserves the 2 least significant bits as "tags", making an `int` on a 64-bit platform effectively 62 bits in the common (small) case, which is likely plenty in most cases (and where it isn't, it's either because we need very large integers, or we should be using `int64`).
One way of using the tag bits is as follows:

- `00` means small integer (smi)
- `01` means that the value is a pointer `p` to an arbitrary precision integer (at `p-1`)
- `10` typically the result of an operation (see below)
- `11` reserved for other uses or unused
Given this encoding, if we have two `int` values `x` and `y`, we can optimistically assume they are smis. For instance, `x + y` would be translated into a single `ADD` machine instruction, followed by a test of the tag bits. If they are not `00`, one (or both) of the operands were large ints and then a runtime call is needed. Also, if there was an overflow, a runtime call is needed. This does add 3 instructions in the fast path, but not more (one conditional jump if overflow, one test of tag bits, one conditional jump if tag bits are not 0 - perhaps more depending on how the runtime call is achieved or if there's a conditional call instruction). It's not cheap, but it's much less expensive than a runtime call in the common case. If both operands were smis, the tag bits remain `00`, and the result doesn't need further work. The same is true for subtraction. Multiplication requires a bit more work but is also more expensive. Finally, division is the most complicated one, but also the most expensive operation in general.
It might be worthwhile performing a little experiment where one generates this additional code for each integer addition, using dummy conditional branches that will never be taken (or just jump to the end of the instruction sequence) and see what the performance impact is.
Not a fan of this proposal - currently it's quite simple to reason about the resulting code and its performance characteristics when doing simple arithmetic. Also, even if losing two bits on a 64-bit platform is not important, on a 32-bit one it is.
Maybe we could have arbitrary-precision ints implemented as a new built-in type (like we do with complex currently)?
Can you discuss how such `int` variables would be represented in mediums other than RAM, and marshaled/unmarshaled?
Encoding to JSON should be easy and map really well. As far as I know, the JSON spec does not place restrictions on the size of numbers, so a really large `int` would encode as a base 10 number (and vice versa for decoding). E.g.:
{"number": 12312323123131231312312312312312321312313123123123123123123}
Would map to an `int` with value `12312323123131231312312312312312321312313123123123123123123`.
What about something like `encoding/gob` or `encoding/binary`?
@shurcooL
1) printing is obvious (the usual human-readable forms) - applies to JSON
2) encoding/gob could use a private internal representation
3) encoding/binary is for fixed-width numbers at the moment, the proposed `int`s wouldn't be - but a var-int encoding could work (though the current format would probably be inefficient).
Re: 3, note that the compiler's import/export format already encodes arbitrary-sized integer values because of precise constants.
@shurcooL @griesemer I believe encoding/gob already uses a variable-length encoding for all integer types.
Go should have an arbitrary precision number type that is more convenient than `math/big`. That type should not attempt to masquerade as int/uint, as these aren't just used semantically as "number" but more so as "number compatible with c/foolang code that uses the natural local word size".
The root problem here is that golang's design prevents a library from defining an arbitrary precision type that is as semantically and syntactically convenient as a type provided by the runtime/compiler with internal knowledge and kludges. Solve that problem with golang 2.0, or instead we will find ourselves years from now with countless ad hoc accretions like this.

Edit: I'm a fan of this design/feature in scripting languages. I don't see how it works as the base int type in a systems language meant to replace c/c++/java. I absolutely think we should have a great and convenient arbitrary precision number type, and think the road to that is a golang where library-defined types are not second class to ad hoc boundary-flouting extensions of the runtime- and compiler-provided types.
@jasonwatkinspdx
It's true that one of the roles of `int` in Go 1 is as an analogue to C's `ssize_t`. But it doesn't actually matter whether `int` is exactly the size of the address space or merely sufficient to cover the address space.
Perhaps more importantly, C APIs don't actually use `ssize_t` all that often: they use `size_t` instead.
You have to range-check conversions from `C.size_t` to `int`, because the latter is potentially only half the range of the former. Under this proposal, you'd need the range check in the other direction instead, because `C.size_t` would now be smaller than `int`. The only way to avoid the need for a range check entirely is by making the Go "size" type have exactly the same range as `C.size_t`, and at that point you're looking at a fairly high risk of overflow bugs (especially from decrementing past zero).
@bcmills
But it doesn't actually matter whether int is exactly the size of the address space or merely sufficient to cover the address space.
It does matter. People make design decisions concerning the size of the address space and how indexes into it can be represented in serializations or other transformed values. Making the type that spans the address space arb precision offloads many complexities to anyone that wants to ship data to other systems.
What does a c abi compatible library consumer of golang look like in a world where int/uint is arb precision? Is it really better if the golang side is blind to any issue at the type level and the c side has no choice but panic?
I do see the value of these types, I just don't want them conflated with int/uint. I'm entirely in favor of a numeric tower in golang being the default numeric type, I just don't want it to pretend to have the same name as the 32bit/64bit machine/isa determined types.
People make design decisions concerning the size of the address space and how indexes into it can be represented in serializations or other transformed values.
Can you give some examples? I'm having a hard time thinking of anything other than a syscall that spans a process boundary but should be sized to the address space, and programs using raw syscalls generally need to validate their arguments anyway.
What does a c abi compatible library consumer of golang look like in a world where int/uint is arb precision?
The same thing it looks like today: using `C.size_t` instead of `int`.
@robpike It looks harder to reduce the CPU usage of a golang program when ints become arbitrary precision. 😝 I do not need arbitrary-precision ints by default, but I do need golang programs to use less CPU to save my money with a gce vm server.
programs using raw syscalls generally need to validate their arguments anyway.
I want to capture that in the types, not in a dynamic value range check. I don't think this is an unreasonable expectation of a language that markets itself as a replacement for c/c++/java for systems programming.
I honestly thought it's 11 days more than it really is.
I don't want to lose normal `int` performance. I don't believe there's a space/time-effective way for an arbitrary precision `int` to be, in many cases, comparable in performance to the fixed-width `int`.
I have nothing against adding an arbitrary precision `mpint` type (name does not matter), which the compiler accepts mixed in expressions with normal `int`s, providing the conversions as needed. IOW, it would be really nice to be able to use standard operators with arbitrary precision integers, yes, but please only explicitly. Please leave `int` alone.
What about floats? JSON can have values like `12345678901234567890.12345678901234567890`, with many digits before and after the dot, but IIRC no Go type can accurately represent those. That is, no Go type can do maths with it; I know about `json.Number`.
Personally, I would keep `int` as it is and instead add a `number` type that could represent any number with or without a fractional part.
Please do not make int operations slower by default. I do not need arbitrary-precision ints by default, but I do need golang programs to use less CPU to save my money with a gce vm server.
I do not need something like `mpint` with infix expressions either; `*big.Int` is enough for me.
I don't understand the performance concerns - even if the performance hit would be noticeable in some scenarios, could you not use sized types like `int32` or the unsigned word (like the current `uint`) mentioned by @griesemer?
I am 100% for this proposal. Worrying about the performance impact of arbitrary precision ints is like worrying about the performance impact of array bounds checks, in my opinion. It's negligible if implemented correctly.
I also like this because of how inconvenient the big package is and how I always fall back to Python when I'm dealing with big integers.
For an arbitrary precision uint, what is `uint(0)-1`?
Maybe mpint is enough.
Imho `mpint` is a very bad idea. No one would use it, except for when they really need it, because `int` is nicer and feels more standard. Since the majority of libraries would still use `int` instead of `mpint`, integrating two libraries, one of which uses `int` and the other one uses `mpint`, would become cumbersome.

Also, it would be entirely unclear whether to use `int` or `mpint` in common situations, which contradicts Go's philosophy, which emphasizes clarity and tries to reduce the programmer's mental overhead.
@faiface If we are talking about Go 2, by then we should have a better type system anyway. At the same time, I would be really glad if my arithmetic operations did not suddenly allocate memory because a combination of my integer inputs resulted in N > 2^64. Overflow I can live with - this is my problem.
Again - currently an instance of something like

```go
type MyIntCombination struct {
	FirstInt  int
	SecondInt int
	ThirdInt  int
}
```

has a size determined at compile time and good performance because of the alignment. If these fields suddenly become pointers, the cache locality would be lost.
All of this can be solved using a machine-dependent int, but I fear that this change would cause fragmentation in our community between those who prefer explicitly sized ints and those who prefer arbitrary precision. For me, it would mean a strict restriction on using `int` anywhere in my company code.
But, alas, without a working prototype it's all just speculation. We all remember how the `alias` proposal went.
While the cost of doing an arithmetic operation on two arbitrary precision ints is important there is also the cost to the garbage collector that needs to be considered. We probably need an extra bit on each heap word to indicate whether it is a tagged word. I suspect we can be clever and it won't be a full bit per word but being clever has a way of complicating things. Even if we are memory clever, if we have large arrays of arbitrary precision ints the GC will need to visit each of them to verify that they are immediates and not pointers. In dynamically typed languages like Lisp, JavaScript, and Smalltalk this isn't a big deal because the GC had to visit the words anyway so the GC's marginal cost was minimal.
Furthermore, the cost to the GC is somewhat amplified because it happens during the mark phase when the GC is already using CPU otherwise available to the application. The less time the GC is in the mark phase the easier it is for the application to get the CPU it needs to meet its deadlines, the less floating garbage, and so forth. This cost is hard to measure but it is real.
Getting a sense of these systemic costs isn't intractable but needs to be done before we can claim we know the full cost of arbitrary precision ints.
Edit: I was confused if the new `int` would be immutable or not.
It is immutable.
@champioj It needs to be immutable, otherwise it's going to be impossible to use correctly.
@tgulacsi
For an arbitrary precision uint, what is `uint(0) - 1`?
A very good question. I would argue that that implies one of two things, and the choice depends on whether the difference between `uint` and `int` is one of space-efficiency or program invariants.

If the difference is due to space-efficiency, then `uint` should not exist. An arbitrary-precision `int` is capable of representing all of the values of an arbitrary-precision `uint` without loss, so `uint` is redundant.

If the difference is due to program invariants (the ability to express "a nonnegative integer"), then subtracting a positive value from `uint(0)` should panic (and IMO that would argue for also adopting #19624 for consistency).
Being fast and close to the metal is a great advantage of Go1. Many other languages start with hard to optimize semantics and suffer bad performance forever, shackled by backward compatibility.
It's true that JIT compiled languages can get away with complex semantics and still be relatively fast: they speculate, profile, optimize and reoptimize at runtime. Yet they rarely if ever reach C speeds and always at a cost of great complexity.
An offline compiler can't speculate much; unlikely paths still have to be compiled. This extra code hanging off integer arithmetic ops will inhibit optimizations, bloat the binaries, and slow everything down universally.
The hidden cost of `int` will be especially glaring if a fast and unencumbered `int64` is easily available, but unnecessarily ugly (casts).
Other languages in Go's domain don't have this feature because it's too expensive (C), few people need it (Java, JavaScript) and there are library types with overloaded operators (C++, C#, Scala, Rust, Swift, D).
Languages that do have built-in arbitrary-precision arithmetic are very high level, dynamically typed, relatively slow and often math focused (Python, Ruby, Erlang, Haskell, Scheme, Mathematica and other CAS).
Re: cited benefits.
Security and correctness: dangerous operations on `int` (overflow, truncation) should just trap, because they are likely to be bugs or fatal errors. Silently propagating astronomical values throughout the program isn't any better than silently truncating them. The underlying problem will be obscured either way. I'm sure many Python programmers will agree.
Representing object sizes: objects larger than half of address space should be simply forbidden. Maybe heaps larger than that should be forbidden too. It's not a big deal considering other GC limitations.
Safety of potentially overflowing operations: I'd reach for `int128` or `int256` first.
@nnemkin
If I can easily disable this functionality at the program level, package level, and function level, I think arbitrary precision can be added to golang 2.0 and made the default. Bloating the binaries to 120% is not a big problem for me.
I like this idea.
I also think floats should be decimal and arbitrary precision by default, since so many programmers use binary fixed-precision floats in inappropriate situations, such as for currency values. There should be `binfloat64` and `binfloat32` for the special-purpose, high-speed, approximate binary floating point numbers of IEEE 754.
@lpar I disagree about the floats. Decimal floats have the same, fundamentally unavoidable problem. You are just going to swap people being confused about f/10 with people being confused about f/3 (or sqrt(f), or…). Computers just fundamentally can't handle real numbers and trying harder to pretend they can won't solve this.
@lpar What would it mean for floats to be arbitrary-precision? In particular, how would you indicate the precision of a `float` literal? (`big.NewFloat` in today's standard library implicitly uses the precision of `float64`.)
It's also worth noting that, unlike `int`, there is no `float` type in the language today: users of `float` must already choose `float32` or `float64`.
Some other kind of floating-point representation (interval arithmetic, unums, or some similar approach) might be interesting to consider as an alternative, but that belongs in a separate proposal: it's more-or-less independent of the choice of representation of the `int` type.
Sorry, didn't mean to imply that floats belonged in the same proposal. That was just an aside, I was floating the idea of decimal floats in case anyone else thinks it would be good to knock out another common source of error as a separate proposal.
Why make it Go2? Would it not be better to have that kind of numeric type today in Go1? Select a name ('integer' or 'bigint') that does not collide (often?) with existing go code and use it to introduce such type today. This way if and when Go2 happens we will have experience with dealing with such type :)
@tumdum, because we would only do this if we could do removals and simplifications at the same time to pay for its addition. We don't want to be a language that just keeps accreting features.
@bcmills One choice for arbitrary-precision floats is "Constructive Reals", which have become much harder to demo since browsers quit supporting Java. See http://www.hboehm.info/crcalc/ .
Or also https://android.googlesource.com/platform/external/crcalc/
Basic idea is that `a + b` returns a function from requested precision to a string of digits. Crcalc does lots more than just addition. One slightly non-intuitive problem with constructive reals is that certain operations take longer than the naive programmer might expect -- if a and b are actually equal but in a way that the built-in hacks cannot determine, (full) comparison is a non-terminating operation.
I have made practical use of these; if you were interested in knowing things about the quality of floating point operations (e.g., transcendental functions) you can use constructive reals to obtain N digits of truth. A real live numerical analysis prof (John Dennis, then at Rice U.) once proposed using them in error term calculations, because checking the error often costs a factor of N fewer operations than computing the "answer", and using exact arithmetic to calculate the error cuts out a layer of estimation, resulting in a tighter error bound. We actually implemented this in a Fortran interpreter, and one amusing and not-too-surprising result is that constructive arithmetic on poorly conditioned matrices costs more (because you have to pull more bits out to prevent cancellation errors).
Please let's keep this focused on the subject at hand (arbitrary precision ints).
I am afraid the compiler could not help much if arbitrary precision integers are used as function parameters. Doesn't that mean the parameters must always be boxed?
That said, I do strongly hope an `integer` type, besides `int` and `uint`, be added to Go1. A lot of code could then be simplified.
@typeless An arbitrary precision `int` type would always use one word (32 or 64 bits) when passed as a parameter. See https://github.com/golang/go/issues/19623#issuecomment-287930352 for a possible implementation.
I think there will be a cost for programs overall, but I also think the fears of huge slow-down are overblown.
Is the following proposal feasible if this feature gets implemented?
make "range aChannel" also return two values, one message index and one message value, so that the forms of all "range aContainer" uses are consistent.
@Go101 No. The problem with an index over a channel is not there; it's just useless. If a channel emitted 1,000,000 messages a second, an int64 counter would overflow after ~300,000 years.
@faiface
an int64 counter would overflow after ~300,000 years.
Yes, I think this is one theoretical reason why "range channel" only returns one value in Go1.

The type of the index returned by range is int, not int64. So if this proposal is implemented, there will no longer be a channel message index overflow problem.
The index is not totally useless. It is really useful sometimes.
@Go101 int is equivalent to int64 on most machines. When is it not useless?
For example, there are M senders and N receivers; when one of the receivers receives the K-th message, it will notify all senders and receivers to stop messaging. Or print a log entry for every K messages.
@Go101 Questions of overflow notwithstanding, assigning a unique index to every send and receive operation would introduce a sequential bottleneck (similar to the one in #17973). The spec as it stands today only requires a happens-before relationship between sends and the corresponding receives, allowing for looser synchronization and better parallelism.
With a fixed-size `int` type you could at least hope to make the counter update an atomic instruction, but with arbitrary-precision `int` even that goes out the window.
@bcmills I don't quite understand the issue you referred to. But I think there is no need to assign a unique index to every send. Just adding a numReceiveds field to the internal channel struct would be OK.
An idea that has been kicking around for years, but never written down:
The current definition of `int` (and correspondingly `uint`) is that it is either 32 or 64 bits. This causes a variety of problems that are small but annoying and add up:

- … the `int` type
- `int` values can overflow silently, yet no one depends on this working. (Those who want overflow use sized types.)

I propose that for Go 2 we make a profound change to the language and have `int` and `uint` be arbitrary precision. It can be done efficiently - many other languages have done so - and with the new compiler it should be possible to avoid the overhead completely in many cases. (The usual solution is to represent an integer as a word with one bit reserved; for instance if clear, the word points to a big.Int or equivalent, while if set the bit is just cleared or shifted out.)

The advantages are many:

- `int` (and `uint`, but I'll stop mentioning it now) become very powerful types
- `len` etc. can now capture any size without overflow
- … `int` without ceremony, simplifying some arithmetical calculations

Most important, I think it makes Go a lot more interesting. No language in its domain has this feature, and the advantages of security and simplicity it would bring are significant.