What are the use cases for this? Both GUIDs and IP addresses already have their own types, so you don't need 128-bit integer types for those.
Are there any languages or libraries that have this type? That could also indicate it is useful.
C (and consequently C++) has __int128, at least as an extension on 64-bit architectures. Rust is in the process of adding these: https://github.com/rust-lang/rfcs/blob/master/text/1504-int128.md Julia has them too: http://docs.julialang.org/en/stable/manual/integers-and-floating-point-numbers/
The Guid struct effectively implements an "Int128" by hand to represent the GUID value at a low level: https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Guid.cs#L28
The more natural constructor for a GUID would be:
new Guid(Int128 value)
I suspect that hand-rolled "(U)Int128" types exist in other parts of the CLR as well. Note that (U)Int128, at least on x64, could be implemented efficiently using the same "trick" that was used to represent a 64-bit long on 32-bit processors (doing operations on a pair of registers).
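To make the trick concrete, here is a minimal sketch of a 128-bit add over two ulong halves (a hypothetical hand-rolled struct, not anything in the CLR):

```csharp
// A minimal sketch of the "pair of registers" trick applied to 128-bit addition,
// using a hypothetical hand-rolled struct of two ulongs:
struct UInt128Sketch
{
    public ulong Lo, Hi;

    public static UInt128Sketch Add(UInt128Sketch a, UInt128Sketch b)
    {
        ulong lo = a.Lo + b.Lo;
        ulong carry = lo < a.Lo ? 1UL : 0UL; // detect carry out of the low word
        return new UInt128Sketch { Lo = lo, Hi = a.Hi + b.Hi + carry };
    }
}
```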
The (virtual) pointer size on the IBM System i (iSeries, AS/400) is 16 bytes (possibly due to the architecture using a single-level store, but I don't know quite what they anticipate memory to grow to...). You can't really use it to store a number, though, and most of the legacy customers using the machine would be defining numeric variables by the number of digits, not bytes (especially since the original programming language on the series was intended for accountants).
@jkotas
I assume that this would also include C# language support, etc. It is a similar problem to https://github.com/dotnet/coreclr/issues/963, which @KrzysztofCwalina should start tackling soon; we may want to keep the potential future addition of int128 in mind.
Given that it is cross-cutting, I wonder if it makes sense to track it in the CoreFX repo ... however, I am not sure if there is a better place for such larger issues to get traction ...
@jkotas but the idea of https://github.com/dotnet/coreclr/issues/963 is to expose add, mul, div as IL operations for native int and native float (which in some sense already exist in IL as nint and "F"), right? Could this be done for int128 / uint128 too? Could this be made to work on 32-bit CPUs at the assembly level?
I've opened the issue here because this is where System.Numerics lives (if I'm not mistaken), and I was thinking of a simple struct-like thing (there already exist "amateur" int128 implementations in C#), much as BigInteger is.
the idea of dotnet/coreclr#963 is to expose add, mul, div as IL operations for native int and native float
Not strictly required. Depends on the design.
Is it correct that, at the IL level, the operations between n(u)int and F are perfectly well defined but simply not exposed by C#?
The Guid struct effectively implements an "Int128" by hand to represent the GUID value at a low level:
The Guid struct reflects the UUID format; it's not an Int128 done by hand. If that were the case then probably a simpler implementation would have been used (e.g. 2 longs). Besides, Guid is pretty much an opaque blob of bits with no operations defined on it. The value of Int128, if any, is in the operations it offers and not in its being a 128-bit quantity.
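For reference, a rough side-by-side (the Guid field list is taken from the coreclr source linked above; the Int128 shape is a hypothetical hand-rolled one):

```csharp
// Guid's actual fields mirror the UUID wire format:
//   private int _a; private short _b; private short _c;
//   private byte _d, _e, _f, _g, _h, _i, _j, _k;

// A hand-rolled Int128 would more plausibly be just two 64-bit halves:
struct Int128Sketch
{
    public ulong Lo;
    public ulong Hi;
}
```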
I assume that this would also include C# language support
I'm not sure why that would be needed. While having an Int128 might be useful, it's unlikely to be useful enough to warrant language support.
Given that it is cross-cutting, I wonder if it makes sense to track it in CoreFX repo
I don't think it's cross-cutting. It can be implemented solely in CoreFX. Language support and/or JIT support is nice to have but optional.
Is it correct that, at the IL level, the operations between n(u)int and F are perfectly well defined but simply not exposed by C#?
IL does have native int operations that aren't exposed in the language, yes (e.g. you can't add 2 IntPtr values in C# even though you can do that in IL). Floating point is a different beast: all operations (+, -, *, /, %) are defined only for type F, but there aren't load/store operations for type F like there are for native int.
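For instance, at the time of this discussion the addition below fails to compile in C#, even though the equivalent IL add on two native ints is perfectly valid:

```csharp
IntPtr a = (IntPtr)1;
IntPtr b = (IntPtr)2;
// var c = a + b; // error CS0019: operator '+' cannot be applied to operands of type
//                // 'IntPtr' and 'IntPtr' -- yet IL's 'add' accepts two native ints.
//                // (C# 9 later exposed these operations via the nint/nuint keywords.)
```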
Yes this "F" type was a little mystery for me when I've changed Cosmos' IL2CPU to emit SSE instructions for floating point operations... in the end I've decided that for x86 architecture 'F' was the same thing of double (64 bit floating point). So the concept of "native floating point" of which was talking @migueldeicaza is more difficult to have as "F" is not a complete type as IntPtr is at IL level...
Ah, another thing, maybe obvious: both Int128 and its unsigned equivalent UInt128 should exist.
There is a native compare-exchange (LOCK CMPXCHG16B) for 128-bit types on x64 (https://github.com/dotnet/corefx/issues/10478), which would require correct alignment to work.
@benaadams, yes but that could be achieved with a possibly more useful feature, such as an extension to StructLayout that lets you directly control the alignment (not just the size or packing).
In either case, I believe that, while the use case is slim, there are probably enough scenarios and algorithms out there that having a 128-bit integer data structure would be useful.
I would propose a custom System.Numerics package be created to contain Int128, and any future large integers that get created.
I don't expect single-application memory will really ever grow to above 64-bit (and if it does, I think we have issues), but it is completely reasonable to work with 128-bit integers (several decent, non-cryptographic, hashing algorithms use this many bits).
I think above 128 bits we would find more and more limited use, where the numbers are essentially limited to specific cryptographic algorithms. That being said, modern x86 processors do include 256-bit and even 512-bit SIMD registers, so those types may be useful to cover as well.
In either case, having the ability to specify alignment would, at the very least, allow users to implement these data structures themselves.
@tannergooding I did suggest an Atomic type (https://github.com/dotnet/corefx/issues/10481) rather than exposing CMPXCHG16B directly; such a type is available in other languages and is useful for implementing lock-free algorithms.
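To sketch the shape of that proposal (the Atomic type below is hypothetical; only the 64-bit Interlocked call exists today):

```csharp
using System.Threading;

long location = 0;

// Today Interlocked.CompareExchange tops out at 64 bits:
long observed = Interlocked.CompareExchange(ref location, 42L, 0L);

// The proposal would hide CMPXCHG16B behind a type rather than expose the
// instruction directly -- hypothetical shape, names are not final:
// var slot = new Atomic<Int128>(initialValue);
// Int128 previous = slot.CompareExchange(desired, comparand);
```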
@tannergooding regarding more than 128 bits, I had the idea of a sort of BigInteger but with the possibility to limit its max/min range... I don't know if that is possible.
Regarding the 256- and 512-bit sizes of the SSE/AVX registers, what do you mean by "having the ability to specify alignment"?
Currently users have no way of forcing a specific alignment. However, many data types (such as Int128) may require (or benefit from) a specific alignment (align16, in the case of Int128 or align32 in the case of Int256).
Being able to specify alignment would allow users to implement their own data structures and then have the JIT generate optimized code by convention (such as using two 128-bit loads or a single 256-bit load for align32, depending on what the architecture supports).
Also needed to use LOCK CMPXCHG16B 😉
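As a sketch, such a StructLayout extension might look like this (hypothetical attribute property; today the attribute only offers Size and Pack):

```csharp
using System.Runtime.InteropServices;

// Hypothetical: an Alignment knob alongside the existing Size/Pack properties
// (StructLayoutAttribute has no such member today):
// [StructLayout(LayoutKind.Sequential, Alignment = 16)]
struct AlignedInt128
{
    public ulong Lo;
    public ulong Hi;
}
```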
Currently users have no way of forcing a specific alignment. However, many data types (such as Int128) may require (or benefit from) a specific alignment (align16, in the case of Int128 or align32 in the case of Int256).
Allowing users to specify alignment and adding an Int128 type are orthogonal issues; neither requires nor implies the other.
Yes, if I understand correctly, it would be an optimization (that is, an Int128 could be passed in an XMM register instead of in four 32-bit registers, or two on x64); this does not mean that the work on (U)Int128 is blocked by it.
that is, an Int128 could be passed in an XMM register instead of in four 32-bit registers, or two on x64
If you're talking about function parameter passing then that's yet another issue, unrelated to alignment and type size. And it's debatable how beneficial passing an Int128 in an SSE register would be: it's good if you're just passing it around, but it may not be so good if you're actually doing operations on it.
https://github.com/dotnet/corefxlab/issues/2635 tracks prototyping the Int128/UInt128 (and a few other) types.
One operation for which 128-bit integer support is useful is summation of ulong[] and long[] arrays. Since there is no bigger integer type than long or ulong, the sum can cause arithmetic overflow, with the only current alternatives being Decimal and BigInteger. By exposing native CPU support for 128-bit arithmetic, summation of long[] and ulong[] arrays would perform significantly faster than using Decimal or BigInteger, since those types are implemented in software.
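A minimal sketch of that scenario, assuming an Int128 with operator support (the shape of what eventually shipped in .NET 7; at the time of this thread it would have to be a software-emulated user type):

```csharp
using System;

// Widening each element into a 128-bit accumulator makes the sum overflow-proof.
static Int128 SumNoOverflow(ReadOnlySpan<long> values)
{
    Int128 sum = 0;
    foreach (long v in values)
        sum += v; // implicit long -> Int128 widening; cannot overflow for fewer than 2^64 elements
    return sum;
}
```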
I hope Int128/UInt128 will really be shipped with 5.0. It's a key feature in many fields where BigInteger is a bad choice.
@CShepartd Could you please give us a concrete example of what you would use Int128/UInt128 for?
Code or just usage examples.
Generally real world examples of where and how a new feature would be used as well as metrics on why it is important and why an alternative doesn't necessarily work are beneficial.
Metrics, perf numbers, etc can come from other languages where such a feature (or a similar one) already exist.
For example, some reasons I've been tracking for why Int128 should be added are:
- Int128 is an ABI primitive, so its struct packing requirements can't be emulated by a user-defined type. This means users can't manually define an Int128 and then use it in interop scenarios.
- Int128 is commonly provided in other modern languages. Most C++ compilers (MSVC, GCC, Clang) provide a __int128 extension. Rust provides i128. Golang has a highly upvoted proposal to add it (https://github.com/golang/go/issues/9455).
- Int128 involves operations that can't currently be provided efficiently by the user. These operations (if defined in user code) can add non-trivial overhead to certain loops and cannot properly take advantage of hardware support when available.
Additional examples of where and how it would be used in real world code would help strengthen the reasoning for adding it.
One example of a substantial performance benefit is one I recently implemented in my HPCsharp nuget package on nuget.org: ulong[].Sum() in SSE and multi-core, which runs at 2.7 GigaAdds/sec on a quad-core with dual memory channels. It is also equivalent to a C# checked-arithmetic implementation (i.e. it handles arithmetic overflow inside the algorithm properly and returns a full-accuracy Decimal summation result). Decimal[].AsParallel().Sum() runs at 100 MegaAdds/sec, and BigInteger runs at 10 MegaAdds/sec.
Here is a blog that explains the idea with more performance details and code. https://duvanenko.tech.blog/2019/09/23/faster-checked-addition-in-c-part-2/
For my use, 128-bit operations would be useful if they support arithmetic operations such as addition and subtraction, providing faster internal alternatives to Decimal and BigInteger. At first glance it doesn't look like they do. Could you confirm this?
@CShepartd Could you please give us a concrete example of what you would use Int128/UInt128 for?
I would like to implement a 64-bit version of the PCG family of RNGs but currently have to resort to software emulation. The fact that we have to even argue the usefulness of such a feature is absurd; you wouldn't ask us to justify the reasons for supporting 16-bit, 32-bit, or 64-bit integers so why are 128-bit integers any different? It's simple math; anyone who wants or needs to perform multiplication on 64-bit numbers absolutely requires support for 128-bit integers.
The hardware that the vast majority of us write programs for has had this ability since at least the original release of the AMD64 instruction set; it is about time that .NET catches up.
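For illustration, a rough sketch of the pcg64 (XSL-RR 128/64) step in question, written against the UInt128 that eventually shipped in .NET 7 (the constant is the 128-bit multiplier from the PCG reference implementation; the seeding is simplified):

```csharp
using System;
using System.Numerics; // BitOperations

sealed class Pcg64
{
    // 128-bit default multiplier from the PCG reference implementation.
    private static readonly UInt128 Multiplier =
        new UInt128(0x2360ED051FC65DA4UL, 0x4385DF649FCCF645UL);

    private UInt128 _state;
    private readonly UInt128 _increment;

    public Pcg64(UInt128 seed, UInt128 sequence)
    {
        _increment = (sequence << 1) | 1UL; // the increment must be odd
        _state = seed + _increment;
        Next();
    }

    public ulong Next()
    {
        UInt128 state = _state;
        _state = state * Multiplier + _increment;          // 128-bit LCG step
        ulong xored = (ulong)(state >> 64) ^ (ulong)state; // XSL: fold the halves
        int rot = (int)(state >> 122);                     // top 6 bits pick the rotation
        return BitOperations.RotateRight(xored, rot);      // RR output permutation
    }
}
```

Without a 128-bit integer type, the LCG step alone requires a hand-written 64x64=128 multiply plus carry propagation in software.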
The fact that we have to even argue the usefulness of such a feature is absurd
There is a lot that goes into even a "simple" feature like this, including discussions around language support (either immediate or future -- and this goes beyond just C#; as F#, VB, and others may also be impacted), ABI compatibility, desired functionality, how it integrates with the rest of ecosystem, etc.
For Int128 in particular, yes, some hardware (namely x86-64 CPUs) has minimal support for it (namely exposing 64*64=128 and 128/64=64 support), but this also has to be supported on CPUs that don't have such support (32-bit CPUs) and on CPUs where support may lack or differ (ARM64), and it needs to provide functionality that doesn't readily exist in hardware (such as 128*128=128, 128/128=128, etc).
All of it needs to be considered, costed, and weighed against other possible features we could do 😄
One recognizes all of the complexity involved and would definitely be willing to contribute to the effort; I just find it absolutely silly that we have to justify "why" here. Other features might have a complicated story but this one is as simple as "We want to do native 64-bit multiplication (and other primitive operations) but, despite the fact that virtually all modern hardware supports the concept, we're forced to implement workarounds in software."
this one is as simple as "We want to do native 64-bit multiplication but, despite the fact that virtually all modern hardware supports the concept, we're forced to implement workarounds in software."
long Math.BigMul(long, long, out long)
https://github.com/dotnet/corefx/issues/41822 😉
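For reference, a usage sketch of that API (the ulong overload, as it shipped in .NET 5):

```csharp
using System;

// Math.BigMul returns the high 64 bits of the full 128-bit product and
// writes the low 64 bits to an out parameter:
ulong high = Math.BigMul(0x8000_0000_0000_0000UL, 2UL, out ulong low);
Console.WriteLine($"{high} {low}"); // prints "1 0", i.e. the product is 2^64
```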
I just find it absolutely silly that we have to justify "why" here.
You should read this to understand why things are not that simple: https://blogs.msdn.microsoft.com/ericgu/2004/01/12/minus-100-points/
I perfectly understand why the feature has yet to be added; what I don't understand is why we're being asked for practical examples of how the types Int128/UInt128 would be useful. They aren't a feature like Spans where we can all debate what the type brings to .NET; they're just bigger integers.
One might ask "Where do we draw the line then? What about types for widths like 256 and 512?" It would seem reasonable, to me, to stop at 128 because that's 2x the native word size at this point in time. If and when we move to architecture that has a bigger word size then the same rule of thumb should be applied there.
@jkotas > Could you please give us a concrete example what would you use Int128/UInt128 for? A concrete example, I work on very large storage systems. It's already uncomfortably tight, an aggregate view of a less than two dozen will overflow UInt64 (yes, they really are that big). Support for 128 will take a year or two to work through the stack anyway, so the sooner the corefx work is started the better.
If one of the .NET Core goals is to support high-performance scenarios (I do not mean HPC as in high-performance computing) then 128-bit integrals are a must. It seems that, given the pretty advanced stage of hardware intrinsics support in the JIT, there should not really be any major problems in supporting {U}Int128 at least as part of the HW intrinsics work.
Obviously the ideal solution would be to expose them in the System namespace, as all other base integral numeric types are exposed. I disagree that the implementation of {U}Int128 is such a big work item that it cannot be achieved in the v5 timeframe, and I can declare community support at least on my side.
The cost isn't in implementing the framework support for such APIs. The cost is in adding language support to places like C# and F#. Integer types directly play into certain language features such as "checked/unchecked" contexts, and so they cannot both be exposed and expose operators without language support existing simultaneously. Doing otherwise ends up being a breaking change, as existing code would change behavior once language support is added.
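A small illustration of the checked/unchecked point, using int, whose operators are compiler-defined:

```csharp
using System;

int max = int.MaxValue;

int wrapped = unchecked(max + 1); // compiler-defined int '+': wraps to int.MinValue

try
{
    int overflowed = checked(max + 1); // same operator in a checked context: throws
}
catch (OverflowException)
{
}

// A user-defined operator+ (which is all a library-only Int128 could offer) cannot
// observe the checked/unchecked context, so whichever behavior it picks is baked in.
```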
One might ask "Where do we draw the line then? What about types for widths like 256 and 512?" It would seem reasonable, to me, to stop at 128 because that's 2x the native word size at this point in time. If and when we move to architecture that has a bigger word size then the same rule of thumb should be applied there.
I was with you until there, but just to add to the discussion: three days ago Clang added support for arbitrarily sized integers of up to 16 million bits.
While I'd argue that this feature is also directly useful in a framework that's meant to run on a whole slew of cross-platform systems like .NET Core, at the very least it could support smaller power-of-2 integers for feature parity, beyond just the register size (and if you're counting SSE/AVX, that size is 512 bits).
In fact, since this thread has spanned 3 years and many, many pages of comments all saying how difficult it is to add new integer types, you might as well err on the side of extreme caution and just throw in a bunch of integers that should last a long time (think Int1024, Int2048, etc). Remember how long it took for Bill Gates' famous quote "640K should be enough for everyone" to be proven wrong.
I actually wholeheartedly agree with you; one would prefer that we all finally accept the fact that arbitrary-sized maths isn't just some niche requirement.
I was with you until there, but just to add to the discussion: three days ago Clang added support for arbitrarily sized integers of up to 16 million bits.
The motivation for Clang is support for FPGA architectures that .NET doesn't target and is unlikely to target any time soon or at all. It would require a lot more design changes than just integer types.
and if you're counting SSE/AVX that size is 512 bits
There are already specific types that map to the vector registers (Vector&lt;T&gt;, Vector128, Vector256). Vector512 is missing because there's no AVX512 support yet, but it can easily be added without too much hassle, just like the other vector types.
arbitrary-sized maths isn't just some niche requirement.
That's why BigInteger exists, and it's a different use case.
One of the big reasons to support a new type is because it expresses some feature of the underlying architecture that cannot be expressed using different tools, or not in an optimal manner. x64 processors have instructions that directly help with arithmetic on 128-bit integer types. They also have instructions for interlocked operations on them, provided they are correctly aligned in memory (the LOCK CMPXCHG16B mentioned above).
Other approaches like https://github.com/dotnet/runtime/issues/17975 may be a better fit for supporting the LOCK CMPXCHG16B use case. That approach has the benefit of not relying on a specific bit size and offering an easier way to support platforms where 128-bit atomic operations are not available.
That leaves only the arithmetic operations and some sort of convenience as the reasons to introduce the new type. I am not sure whether those alone are compelling enough reasons. The original proposal mentioned GUIDs and IPv6 addresses as examples, neither of which requires arithmetic operations (aside from masking, where the x64 instructions would be two 64-bit masks anyway).
Other approaches like #17975 may be a better fit for supporting the LOCK CMPXCHG16B use case. That approach has the benefit of not relying on a specific bit size and offering an easier way to support platforms where 128-bit atomic operations are not available.
Sure, though I'd point out that the workaround you mention has also been sitting in a queue for 4 years and will probably keep sitting for at least another one. Meanwhile, we still don't have 128-bit integer support at the framework level.
That leaves only the arithmetic operations and some sort of convenience as the reasons to introduce the new type
Again, none of the things you excluded has even begun to be implemented as a workaround. And arithmetic is about the most basic thing you can do with a numeric data type, so it's not as if that's useless either.
Also, since someone brought up Guid as an example where this could be useful, the workarounds required to make that type work without compiler support made some really weird issues crop up:
https://github.com/dotnet/runtime/issues/35622
Seriously, take a moment to read how Guid::op_Equality() is written.
If/when (U)Int128 is added, System.Data.DbType should probably also be updated.
so we're still stuck here, eh?
so we're still stuck here, eh?
Unfortunately, adding a new integral primitive type isn't as simple as just exposing the type.
There are numerous other considerations, particularly around possible language integration involving constants, literals, and checked vs unchecked contexts. If we were to expose the type today, without any language support, we would actually block ourselves from being able to add language support in the future in a strictly clean and cohesive way. This is namely because, while many of the aspects are somewhat forward compatible, operators and conversions can have slightly different semantics and behavior when they are "compiler defined" vs "user defined".
I would not mind if Int128 and UInt128 get exposed as longlong and ulonglong (for the unsigned version) in C#, VB, and F#.
Unfortunately, adding a new integral primitive type isn't as simple as just exposing the type.
That is exactly why it would be nice to have out of the box.
I think it's a good time to put this into practice
This is blocked on either simultaneous language support or the language exposing a way for user defined types to expose checked operators.
If you'd like this, then you'll need to push on the C# language team over on dotnet/csharplang
I created a thread on dotnet/csharplang about this issue
@tannergooding
the language exposing a way for user defined types to expose checked operators
For the record, the championed issue about that (opened by @tannergooding) is https://github.com/dotnet/csharplang/issues/4665.
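For context, here is a sketch of the user-defined checked operator syntax that issue proposes (as it eventually landed in C# 11), on a hypothetical two-ulong struct:

```csharp
public readonly struct MyUInt128
{
    private readonly ulong _hi, _lo;

    public MyUInt128(ulong hi, ulong lo) { _hi = hi; _lo = lo; }

    // Picked in unchecked contexts: wraps on overflow.
    public static MyUInt128 operator +(MyUInt128 l, MyUInt128 r)
    {
        ulong lo = l._lo + r._lo;
        ulong carry = lo < l._lo ? 1UL : 0UL;
        return new MyUInt128(unchecked(l._hi + r._hi + carry), lo);
    }

    // Picked in checked contexts: throws on overflow (C# 11 'operator checked').
    public static MyUInt128 operator checked +(MyUInt128 l, MyUInt128 r)
    {
        ulong lo = l._lo + r._lo;
        ulong carry = lo < l._lo ? 1UL : 0UL;
        return new MyUInt128(checked(l._hi + r._hi + carry), lo);
    }
}
```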
While I know BigInteger exists, sometimes a type that is explicitly 128 bits and nothing more would be needed; it would probably be more efficient too (at least on a 64-bit machine, as it could be represented as a pair of longs).
An Int128 is the "natural" type of a GUID, of IPv6 addresses, and more at the low level. The x86 instruction CPUID, for example, returns 4 integers (in 4 registers) that need to be "concatenated" to get the real value; the real "intent" was to return an Int128.
I have doubts about what an "alias" could be: if long is an abbreviation of "longword", could Int128 be octa, as an abbreviation of "octaword"? The literal suffix at that point could be "O" or "o"...