dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Add an (U)Int128 type #20640

Closed: fanoI closed this 2 years ago

fanoI commented 7 years ago

While I know that BigInteger exists, sometimes a type that is explicitly 128 bits and nothing more would be needed; it would probably be more efficient too (at least on a 64-bit machine, where it could be represented as a pair of longs).

An Int128 is the "natural" type of a GUID, an IPv6 address, and more at a low level. Even the x86 CPUID instruction returns 4 integers (in 4 registers) that need to be "concatenated" to get the real value; the real "intent" was to return an Int128.
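
A minimal sketch of the "concatenation" described above, assuming an Int128 type with shift, OR, and conversions from uint (none of which existed at the time of this comment); the method name and register ordering are illustrative:

    // Packs the four 32-bit CPUID register values into one 128-bit quantity,
    // EDX in the top 32 bits down to EAX in the bottom 32 bits.
    static Int128 PackCpuidResult(uint eax, uint ebx, uint ecx, uint edx)
    {
        return ((Int128)edx << 96) | ((Int128)ecx << 64) | ((Int128)ebx << 32) | eax;
    }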

I wonder whether an "alias" could exist: if long is an abbreviation of "longword", could Int128 be octa, an abbreviation of "octaword"? The literal suffix at that point could be "O" or "o"...

svick commented 7 years ago

What are the use cases for this? Both GUIDs and IP addresses already have their own types, so you don't need 128-bit integer types for those.

Are there any languages or libraries that have this type? That could also indicate it is useful.

fanoI commented 7 years ago

C (and consequently C++) has __int128, at least as an extension on 64-bit architectures. Rust is in the process of adding these: https://github.com/rust-lang/rfcs/blob/master/text/1504-int128.md Julia has them too: http://docs.julialang.org/en/stable/manual/integers-and-floating-point-numbers/

The Guid struct has done an "Int128" by hand to represent the GUID value at a low level: https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Guid.cs#L28

the more natural constructor for a Guid would be one taking a single Int128:

var guid = new Guid(int128Value);

I suspect that hand-made "(U)Int128" types exist in other parts of the CLR too. Note that (U)Int128, at least on x64, could be done efficiently using the same "trick" that is used to represent a 64-bit long on a 32-bit processor (doing operations on a pair of registers).
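
For illustration, a minimal sketch of that "trick": a 128-bit addition built from two 64-bit halves with a manual carry, the same way 64-bit longs are handled on 32-bit processors. The struct name and layout are hypothetical, not the actual CLR design:

    readonly struct UInt128Sketch
    {
        public readonly ulong Upper;
        public readonly ulong Lower;

        public UInt128Sketch(ulong upper, ulong lower) { Upper = upper; Lower = lower; }

        public static UInt128Sketch operator +(UInt128Sketch a, UInt128Sketch b)
        {
            ulong lower = unchecked(a.Lower + b.Lower);
            ulong carry = lower < a.Lower ? 1UL : 0UL; // unsigned wraparound signals a carry
            return new UInt128Sketch(unchecked(a.Upper + b.Upper + carry), lower);
        }
    }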

Clockwork-Muse commented 7 years ago

The (virtual) pointer size on the IBM System i (iSeries, AS/400) is 16 bytes (possibly due to the architecture using a single-level store, but I don't know quite what they anticipate memory to grow to...). You can't really use it to store a number, though, and most of the legacy customers using the machine would be defining numeric variables by the number of digits, not bytes (especially since the original programming language on the series was intended for accountants).

danmoseley commented 7 years ago

@jkotas

jkotas commented 7 years ago

I assume that this would also include C# language support, etc. It is a similar problem to https://github.com/dotnet/coreclr/issues/963, which @KrzysztofCwalina should start tackling soon - he may want to keep the potential future addition of int128 in mind.

karelz commented 7 years ago

Given that it is cross-cutting, I wonder if it makes sense to track it in the CoreFX repo ... however, I am not sure if there is a better place for such larger issues to get traction ...

fanoI commented 7 years ago

@jkotas but the idea of https://github.com/dotnet/coreclr/issues/963 is to expose the IL operations add, mul, div for native int and native float (which in some sense already exist in IL... as nint and "F"), right? Could this be done for int128 / uint128 too? Could this be made to work on a 32-bit CPU at the assembly level?

I've opened the issue here because this is where System.Numerics lives (if I'm not mistaken), and I was thinking of a simple struct-like thing (there already exist "amateur" int128 implementations in C#), as BigInteger is.

jkotas commented 7 years ago

the idea of dotnet/coreclr#963 is to expose as IL operations add, mul, div for native int and native float

Not strictly required. Depends on the design.

fanoI commented 7 years ago

Is it correct that in reality, at the IL level, the operations between n(u)int and F are perfectly defined, but they are not exposed by C#?

mikedn commented 7 years ago

The GUID stuct has done an "Int128" by hand to represent the GUID value at low level:

The Guid struct reflects the UUID format; it's not an Int128 done by hand. If that were the case then probably a simpler implementation would have been used (e.g. 2 longs). Besides, Guid is pretty much an opaque blob of bits with no operations defined on it. The value of Int128, if any, is in the operations it offers and not in it being a 128-bit quantity.

I assume that this would also include C# language support

I'm not sure why that would be needed. While having an Int128 might be useful, it's unlikely to be useful enough to warrant language support.

Given that it is cross-cutting, I wonder if it makes sense to track it in CoreFX repo

I don't think it's cross-cutting. It can be implemented solely in CoreFX. Language support and/or JIT support is nice to have but optional.

It is correct that in reality at the IL level the operations between n(u)int and F are perfectly defined but they are not exposed by C#?

IL does have native int operations that aren't exposed in the language, yes (e.g. you can't add 2 IntPtrs in C# even though you can do that in IL). Floating point is a different beast: all operations (+, -, *, /, %) are defined only for type F, but there aren't load/store operations for type F like there are for native int.

fanoI commented 7 years ago

Yes, this "F" type was a little mystery for me when I changed Cosmos' IL2CPU to emit SSE instructions for floating-point operations... in the end I decided that, for the x86 architecture, 'F' was the same thing as double (64-bit floating point). So the concept of "native floating point" that @migueldeicaza was talking about is more difficult to have, as "F" is not a complete type at the IL level the way IntPtr is...

Ah, another thing, maybe obvious: both Int128 and its unsigned equivalent UInt128 should exist.

benaadams commented 7 years ago

There is a native compare exchange (LOCK CMPXCHG16B) for 128-bit types on x64 (https://github.com/dotnet/corefx/issues/10478), which would require correct alignment to work.

tannergooding commented 7 years ago

@benaadams, yes, but that could be achieved with a possibly more useful feature, such as an extension to StructLayout that lets you directly control the alignment (not just the size or packing).

In either case, I believe that, while the use case is slim, there are probably enough scenarios and algorithms out there that having a 128-bit integer data structure would be useful.

I would propose a custom System.Numerics package be created to contain Int128 and any future large integer types that get created.

I don't expect single-application memory will ever really grow beyond what 64 bits can address (and if it does, I think we have issues), but it is completely reasonable to work with 128-bit integers (several decent non-cryptographic hashing algorithms use this many bits).

I think above 128 bits we would find increasingly limited use, with the numbers essentially limited to specific cryptographic algorithms. That being said, modern x86 processors do include 256-bit and even 512-bit SIMD registers, so those types may be useful to cover as well.

In either case, having the ability to specify alignment would, at the very least, allow users to implement these data structures themselves.

benaadams commented 7 years ago

@tannergooding I did suggest an Atomic type (https://github.com/dotnet/corefx/issues/10481) rather than exposing CMPXCHG16B directly; such a type is available in other languages and is useful for implementing lock-free algorithms.

fanoI commented 7 years ago

@tannergooding regarding more than 128 bits, I had the idea of a sort of BigInteger but with the possibility of limiting its max/min range... I don't know if that is possible.

Regarding the 256- and 512-bit sizes, as in the SSE/AVX registers: what do you mean by "having the ability to specify alignment"?

tannergooding commented 7 years ago

Currently users have no way of forcing a specific alignment. However, many data types (such as Int128) may require (or benefit from) a specific alignment (align16, in the case of Int128 or align32 in the case of Int256).

Being able to specify alignment would allow users to implement their own data structures and then have the JIT generate optimized code by convention (such as using two 128-bit loads or a single 256-bit load for align32, depending on what the architecture supports).
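
For context, a sketch of what StructLayout can already express (overall size and packing) versus what it cannot: no attribute parameter raises a type's alignment, which is the "align16" capability being asked for here:

    using System.Runtime.InteropServices;

    // Size pins the struct at 16 bytes and Pack caps field alignment at 8,
    // but nothing here can demand that instances start on a 16-byte boundary.
    [StructLayout(LayoutKind.Sequential, Size = 16, Pack = 8)]
    struct Int128Storage
    {
        public ulong Lower;
        public ulong Upper;
    }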

benaadams commented 7 years ago

Also useful for using LOCK CMPXCHG16B 😉

mikedn commented 7 years ago

Currently users have no way of forcing a specific alignment. However, many data types (such as Int128) may require (or benefit from) a specific alignment (align16, in the case of Int128 or align32 in the case of Int256).

Allowing the user to specify alignment and adding an Int128 type are orthogonal issues; neither requires nor implies the other.

fanoI commented 7 years ago

Yes, if I understand correctly, it would be an optimization (that is, an Int128 could be passed in an XMM register instead of in four 32-bit registers, or two on x64); this does not mean that the work on (U)Int128 is blocked by it.

mikedn commented 7 years ago

that is an Int128 could be passed in a XMM0 register instead of been in 4 32 bit register or 2 in X64

If you're talking about function parameter passing then that's yet another issue, unrelated to alignment and type size. And it's debatable how beneficial passing an Int128 in an SSE register is: it's good if you're just passing it around, but it may not be so good if you're actually doing operations with it.

tannergooding commented 5 years ago

https://github.com/dotnet/corefxlab/issues/2635 tracks prototyping the Int128/UInt128 (and a few other) types.

DragonSpit commented 5 years ago

One operation that 128-bit integer support is useful for is the summation of ulong[] and long[] arrays. Since there is no bigger integer type than long or ulong, the sum can cause arithmetic overflow, and currently the only alternatives are Decimal and BigInteger. By exposing native CPU support for 128-bit arithmetic, summation of long[] and ulong[] arrays would perform significantly faster than using Decimal or BigInteger, since those types are implemented in software.
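
A minimal sketch of that use case, assuming an Int128 type with addition and implicit conversions from the smaller integer types: the 128-bit accumulator makes intermediate overflow impossible for any realistic array length:

    static Int128 Sum(long[] values)
    {
        Int128 sum = 0;
        foreach (long value in values)
            sum += value; // cannot overflow: |sum| stays below values.Length * long.MaxValue
        return sum;
    }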

CShepartd commented 5 years ago

I hope Int128/UInt128 will really ship with 5.0. It's a key feature in many fields where BigInteger is a bad choice.

jkotas commented 5 years ago

@CShepartd Could you please give us a concrete example what would you use Int128/UInt128 for?

tannergooding commented 5 years ago

Code or just usage examples would work.

Generally, real-world examples of where and how a new feature would be used, as well as metrics on why it is important and why an alternative doesn't necessarily work, are beneficial.

Metrics, perf numbers, etc. can come from other languages where such a feature (or a similar one) already exists.

For example, some of the reasons I've been tracking for why Int128 should be added are:

Additional examples of where and how it would be used in real-world code would help strengthen the reasoning for adding it.

DragonSpit commented 5 years ago

One example of a substantial performance benefit is one I recently implemented in my HPCsharp nuget package on nuget.org: ulong[].Sum() in SSE and multi-core, which runs at 2.7 GigaAdds/sec on a quad-core with dual memory channels, and is also equivalent to a C# checked-arithmetic implementation (i.e. it handles arithmetic overflow inside the algorithm properly and returns a full-accuracy Decimal summation result). Decimal[].AsParallel().Sum() runs at 100 MegaAdds/sec, and BigInteger runs at 10 MegaAdds/sec.

Here is a blog that explains the idea with more performance details and code. https://duvanenko.tech.blog/2019/09/23/faster-checked-addition-in-c-part-2/

For my use, 128-bit operations would be useful if they support arithmetic operations, such as addition and subtraction, providing faster internal alternatives to Decimal and BigInteger. At first glance it doesn't look like they do. Could you confirm this?

Kittoes0124 commented 5 years ago

@CShepartd Could you please give us a concrete example what would you use Int128/UInt128 for?

I would like to implement a 64-bit version of the PCG family of RNGs but currently have to resort to software emulation. The fact that we have to even argue the usefulness of such a feature is absurd; you wouldn't ask us to justify the reasons for supporting 16-bit, 32-bit, or 64-bit integers, so why are 128-bit integers any different? It's simple math: anyone who wants or needs to perform multiplication on 64-bit numbers absolutely requires support for 128-bit integers.

The hardware that the vast majority of us write programs for has had this ability since at least the original release of the AMD64 instruction set; it is about time that .NET catches up.
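
As a concrete illustration of the PCG use case above, a rough sketch of the pcg64 (XSL-RR 128/64) step, assuming a UInt128 type with multiplication, shifts, and xor; hand-emulating exactly these operations is the "software emulation" in question. The multiplier matches the reference pcg64 constant, but the increment is simplified here and the code is illustrative, not a vetted RNG:

    sealed class Pcg64Sketch
    {
        private UInt128 _state;

        private static readonly UInt128 Multiplier =
            ((UInt128)2549297995355413924UL << 64) | 4865540595714422341UL;
        private const ulong Increment = 6364136223846793005UL; // reference pcg64 uses a 128-bit odd increment

        public Pcg64Sketch(UInt128 seed) => _state = seed;

        public ulong Next()
        {
            _state = _state * Multiplier + Increment;            // the 128-bit LCG step
            ulong xored = (ulong)(_state >> 64) ^ (ulong)_state; // XSL: fold the high half into the low half
            int rot = (int)(_state >> 122);                      // top 6 bits select the rotation
            return (xored >> rot) | (xored << (-rot & 63));      // RR: random 64-bit rotate
        }
    }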

tannergooding commented 5 years ago

The fact that we have to even argue the usefulness of such a feature is absurd

There is a lot that goes into even a "simple" feature like this, including discussions around language support (either immediate or future -- and this goes beyond just C#; as F#, VB, and others may also be impacted), ABI compatibility, desired functionality, how it integrates with the rest of ecosystem, etc.

For Int128 in particular, yes, some hardware (namely x86-64 CPUs) has minimal support for it (exposing 64*64=128 and 128/64=64), but this also has to be supported on CPUs that don't have such support (32-bit CPUs) and on CPUs where support may lack or differ (ARM64), and it needs to provide functionality that doesn't readily exist on hardware (such as 128*128=128, 128/128=128, etc.).

All of it needs to be considered, costed, and weighed against other possible features we could do 😄

Kittoes0124 commented 5 years ago

The fact that we have to even argue the usefulness of such a feature is absurd

There is a lot that goes into even a "simple" feature like this, including discussions around language support (either immediate or future -- and this goes beyond just C#; as F#, VB, and others may also be impacted), ABI compatibility, desired functionality, how it integrates with the rest of ecosystem, etc.

For Int128 in particular, yes, some hardware (namely x86-64 CPUs) has minimal support for it (exposing 64*64=128 and 128/64=64), but this also has to be supported on CPUs that don't have such support (32-bit CPUs) and on CPUs where support may lack or differ (ARM64), and it needs to provide functionality that doesn't readily exist on hardware (such as 128*128=128, 128/128=128, etc.).

All of it needs to be considered, costed, and weighed against other possible features we could do 😄

One recognizes all of the complexity involved and would definitely be willing to contribute to the effort; I just find it absolutely silly that we have to justify "why" here. Other features might have a complicated story but this one is as simple as "We want to do native 64-bit multiplication (and other primitive operations) but, despite the fact that virtually all modern hardware supports the concept, we're forced to implement workarounds in software."

benaadams commented 5 years ago

this one is as simple as "We want to do native 64-bit multiplication but, despite the fact that virtually all modern hardware supports the concept, we're forced to implement workarounds in software.

long Math.BigMul(long, long, out long) https://github.com/dotnet/corefx/issues/41822 😉
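
A usage sketch of the API referenced above (the unsigned overload shipped as Math.BigMul in .NET 5): the full 128-bit product of two 64-bit operands comes back as a high/low pair, covering the 64*64=128 case without an Int128 type. The operand values are arbitrary:

    ulong a = 0xDEAD_BEEF_1234_5678UL;
    ulong b = 0xFEED_FACE_8765_4321UL;
    ulong high = Math.BigMul(a, b, out ulong low);
    // (high, low) together hold the exact 128-bit product of a and b.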

Joe4evr commented 5 years ago

I just find it absolutely silly that we have to justify "why" here.

You should read this to understand why things are not that simple: https://blogs.msdn.microsoft.com/ericgu/2004/01/12/minus-100-points/

Kittoes0124 commented 5 years ago

I just find it absolutely silly that we have to justify "why" here.

You should read this to understand why things are not that simple: https://blogs.msdn.microsoft.com/ericgu/2004/01/12/minus-100-points/

I perfectly understand why the feature has yet to be added; what I don't understand is why we're being asked for practical examples of how the types Int128/UInt128 would be useful. They aren't a feature like Spans where we can all debate what the type brings to .NET; they're just bigger integers.

One might ask "Where do we draw the line then? What about types for widths like 256 and 512?" It would seem reasonable, to me, to stop at 128 because that's 2x the native word size at this point in time. If and when we move to an architecture that has a bigger word size, the same rule of thumb should be applied there.

brent-williams commented 5 years ago

@jkotas

Could you please give us a concrete example what would you use Int128/UInt128 for?

A concrete example: I work on very large storage systems. It's already uncomfortably tight; an aggregate view of fewer than two dozen of them will overflow UInt64 (yes, they really are that big). Support for 128-bit will take a year or two to work through the stack anyway, so the sooner the corefx work is started the better.

4creators commented 4 years ago

If one of the .NET Core goals is to support high-performance scenarios (I do not mean HPC as in high-performance computing), then 128-bit integrals are a must. It seems that, given the pretty advanced stage of Hardware Intrinsics support in the JIT, there should not really be any major problems in supporting {U}Int128, at least as part of the HW intrinsics work.

Obviously the ideal solution would be to expose them in the System namespace, as all the other base integral numeric types are exposed. I disagree that the implementation of {U}Int128 is such a big work item that it cannot be achieved easily in the v5 timeframe, and I can pledge community support, at least on my side.

tannergooding commented 4 years ago

The cost isn't in implementing the framework support for such APIs. The cost is in adding language support to places like C# and F#. Integer types directly play into certain language features such as "checked/unchecked" contexts, and so the type cannot be exposed together with its operators without language support existing simultaneously. Doing otherwise ends up being a breaking change once language support does arrive.

myblindy commented 4 years ago

One might ask "Where do we draw the line then? What about types for widths like 256 and 512?" It would seem reasonable, to me, to stop at 128 because that's 2x the native word size at this point in time. If and when we move to an architecture that has a bigger word size, the same rule of thumb should be applied there.

I was with you until there but, just to add to the discussion, 3 days ago Clang added support for arbitrarily sized integers of up to 16 million bits.

While I'll argue that this feature is also directly useful in a framework that's meant to run on a whole slew of cross-platform systems like .NET Core, at the very least it could support smaller power-of-2 integers for feature parity, beyond just the register size (and if you're counting SSE/AVX, that size is 512 bits).

In fact, since this thread has spanned 3 years and many, many pages of comments all saying how difficult it is to add new integer types, you might as well err on the side of extreme caution and just throw in a bunch of integer types that should last a long time (think Int1024, Int2048, etc.). Remember how long it took for Bill Gates' famous quote "640k should be enough for everyone" to be proven wrong.

Kittoes0124 commented 4 years ago

One might ask "Where do we draw the line then? What about types for widths like 256 and 512?" It would seem reasonable, to me, to stop at 128 because that's 2x the native word size at this point in time. If and when we move to an architecture that has a bigger word size, the same rule of thumb should be applied there.

I was with you until there but, just to add to the discussion, 3 days ago Clang added support for arbitrarily sized integers of up to 16 million bits.

While I'll argue that this feature is also directly useful in a framework that's meant to run on a whole slew of cross-platform systems like .NET Core, at the very least it could support smaller power-of-2 integers for feature parity, beyond just the register size (and if you're counting SSE/AVX, that size is 512 bits).

In fact, since this thread has spanned 3 years and many, many pages of comments all saying how difficult it is to add new integer types, you might as well err on the side of extreme caution and just throw in a bunch of integer types that should last a long time (think Int1024, Int2048, etc.). Remember how long it took for Bill Gates' famous quote "640k should be enough for everyone" to be proven wrong.

I actually wholeheartedly agree with you; one would prefer that we all finally accept the fact that arbitrarily sized maths isn't just some niche requirement.

filipnavara commented 4 years ago

I was with you until there, but just to add to the discussion 3 days ago clang added support for arbitrary sized integers up to 16mil bits.

The motivation for Clang is support for FPGA architectures that .NET doesn't target and is unlikely to target any time soon or at all. It would require a lot more design changes than just integer types.

and if you're counting SSE/AVX, that size is 512 bits

There are already specific types that map to the vector registers (Vector<T>, Vector128<T>, Vector256<T>). Vector512<T> is missing because there's no AVX-512 support yet, but it can easily be added without too much hassle, just like the other vector types.

arbitrarily sized maths isn't just some niche requirement.

That's why BigInteger exists and it's a different use case.

One of the big reasons to support a new type is that it expresses some feature of the underlying architecture that cannot be expressed using different tools, or not in an optimal manner. x64 processors have instructions that directly help with arithmetic on 128-bit integer types. They also have instructions for interlocked operations on them, provided they are correctly aligned in memory (the LOCK CMPXCHG16B mentioned above).

Other approaches like https://github.com/dotnet/runtime/issues/17975 may be a better fit for supporting the LOCK CMPXCHG16B use case. That approach has the benefit of not relying on a specific bit size and offering an easier way to support platforms where 128-bit atomic operations are not available.

That leaves only the arithmetic operations and some sort of convenience as the reasons to introduce the new type. I am not sure whether those alone are compelling enough reasons. The original proposal mentioned GUIDs and IPv6 addresses as examples, neither of which requires arithmetic operations (aside from masking, where the x64 instructions would be two 64-bit masks anyway).

myblindy commented 4 years ago

Other approaches like #17975 may be a better fit for supporting the LOCK CMPXCHG16B use case. That approach has the benefit of not relying on a specific bit size and offering an easier way to support platforms where 128-bit atomic operations are not available.

Sure, though I'd point out that the workaround you mention has also been sitting in a queue for 4 years and will probably keep sitting for at least another one. Meanwhile we still don't have 128-bit integer support at the framework level.

That leaves only the arithmetic operations and some sort of convenience as the reasons to introduce the new type

Again, nothing you excluded has even begun to be implemented as a workaround. And arithmetic is kind of the most basic thing you can do with a numeric data type, so it's not like that's useless either.

myblindy commented 4 years ago

Also, since someone brought up Guid as an example of where this could be useful: the workarounds required to make that type work without compiler support made some really weird issues crop up:

https://github.com/dotnet/runtime/issues/35622

Seriously, take a moment to read how Guid::op_Equality() is written.

ghost commented 4 years ago

Tagging subscribers to this area: @tannergooding. Notify danmosemsft if you want to be subscribed.

cincuranet commented 4 years ago

If/when (U)Int128 is added, System.Data.DbType should probably also be updated.

christallire commented 3 years ago

so we're still stuck here, eh?

tannergooding commented 3 years ago

so we're still stuck here, eh?

Unfortunately, adding a new integral primitive type isn't as simple as just exposing the type.

There are numerous other considerations, particularly around possible language integration involving constants, literals, and checked vs unchecked contexts. If we were to expose the type today, without any language support, we would actually block ourselves from being able to add language support in the future in a strictly clean and cohesive way. This is namely because, while many of the aspects are somewhat forward compatible, operators and conversions can have slightly different semantics and behavior when they are "compiler defined" vs "user defined".
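
To make that concern concrete, here is the shape the language eventually adopted (C# 11 "checked operators"), sketched on a hypothetical 128-bit type. If the struct had shipped earlier with only the regular user-defined operator, checked contexts could not have been given distinct semantics later without changing the behavior of existing code:

    public readonly struct UInt128Demo
    {
        private readonly ulong _upper, _lower;

        public UInt128Demo(ulong upper, ulong lower) { _upper = upper; _lower = lower; }

        // Unchecked context: wrap around on overflow, like the built-in integers.
        public static UInt128Demo operator +(UInt128Demo a, UInt128Demo b)
        {
            ulong lower = unchecked(a._lower + b._lower);
            ulong carry = lower < a._lower ? 1UL : 0UL;
            return new UInt128Demo(unchecked(a._upper + b._upper + carry), lower);
        }

        // Checked context: surface the overflow instead of silently wrapping.
        public static UInt128Demo operator checked +(UInt128Demo a, UInt128Demo b)
        {
            ulong lower = unchecked(a._lower + b._lower);
            ulong carry = lower < a._lower ? 1UL : 0UL;
            return new UInt128Demo(checked(a._upper + b._upper + carry), lower);
        }
    }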

AraHaan commented 3 years ago

I would not mind if Int128 and UInt128 get exposed as longlong and ulonglong (for the unsigned version) in C#, VB, and F#.

Atomosk commented 3 years ago

Unfortunately, adding a new integral primitive type isn't as simple as just exposing the type.

That is exactly why it would be nice to have out of the box.

luizfernandonb commented 3 years ago

I think it's a good time to put this into practice

tannergooding commented 3 years ago

This is blocked on either simultaneous language support or the language exposing a way for user defined types to expose checked operators.

If you'd like this, then you'll need to push on the C# language team over on dotnet/csharplang

luizfernandonb commented 3 years ago

I created a thread on dotnet/csharplang about this issue

svick commented 3 years ago

@tannergooding

the language exposing a way for user defined types to expose checked operators

For the record, the championed issue about that (opened by @tannergooding) is https://github.com/dotnet/csharplang/issues/4665.