Open c-ohle opened 2 years ago
> may require multi-threading
There are so many multi-threading examples in the test app alone, starting with Mandelbrot: the 3D animations, most of the real-time (at animation time) mesh calculations running multi-threaded, model building from tessellation extrusions, text objects, etc., up to the really expensive CSG ops. This is all BigRational, testing the VectorR-Vector interaction, because I use your Vectors in the 3D engine.
(And the 3D engine is also self-written, as you can check.)
And if we got even a little support from the compiler devs, it would no longer be necessary to have such an "or" expression. All problems would be solved, and BigInteger could easily be 10 times faster in expressions with only a few lines of code, as I can show in my implementation. This support would not be a big logical impact on the system. It would not even need to be any new public feature. It could be a simple call, at the start and end of a parse sequence, to a special-name-marked notify function, and it would be done. This would help with so many other solutions too, from string building up to classes that can reset their caches in a deterministic way. It also works across function calls; I found ways that it works without problems together with the debug visualizer, and so on.
And it is super safe, in the sense that it avoids the issues - you saw them yourself - that come up with such a builder in expressions.
```csharp
[SpecialNameAttribute]
private static void op_Help(bool begin) { }
```
I'm not saying that the `CPU` setup doesn't work as intended, nor am I saying that it isn't performant. I'm saying that it doesn't look like a good fit for the BCL. There are many reasons iterated for this above, including the complexity, the new pattern, the naming conventions used, concerns around usability, and limitations enforced due to its use of `AsyncLocal` or `ThreadLocal` data, etc.
Types, especially number types, provided by the BCL should generally be basic building blocks on which other devs can build their own algorithms and functionality. They should be simple/easy to use, they should meet existing expectations of numeric types, they should play well with existing algorithms, they should play well with the new generic math feature, etc. Such types ideally expose properties and fields that users will expect on the type. For example, a `Rational`-like number is typically expected to directly expose a `Numerator` and a `Denominator` property, and since they are properties, accessing the data should typically be cheap. A `BigFloat` or `BigDecimal`-like type, however, has no such expectation; it's only expected that the former behaves like other IEEE 754 types and the latter like an arbitrary-length base-10 type.
After all the "basic points" around usability and meeting user expectations are considered, then we can take a look at perf and how we can internally optimize or provide an additional set of types/APIs to accelerate the base type. For `BigInteger`, the biggest cost today is around long chains of expressions resulting in many temporary throwaway buffers. This is essentially the same "cost" that exists for other "large immutable" data, such as `ImmutableArray<T>`, or even "dynamically sized" data, such as `List<T>`. One way to help solve that is improved pooling usage; another is to provide a mutable version of the type that can be incrementally built on before it is "frozen" (i.e. the `Builder` pattern). There are of course other options as well, such as your `CPU`-based approach, or building expression trees and having the "execution" of the expression tree do optimizations to minimize the total work done.
Each of these approaches has differing upsides and downsides, and it's up to the BCL to pick one that can be exposed to users such that they have the highest chance of success, where success is defined using multiple metrics, not just having the code be as fast as it can be. Having the code be understandable and maintainable over time tends to be more important overall, particularly in large scale applications and services.
I am more than happy to look at providing a `Rational`-like type to help cover the user needs here, but the backing implementation for it needs to consider all the things detailed above, as well as other considerations dictated by the Framework Design Guidelines or API review. That's what I'm ultimately trying to push and drive towards here: reaching a common ground where we can define a type that API review will approve, and that can be reasonably implemented and then used by customers so they can achieve success.
I agree, and I've been working all day (it's 18:00 here) to find a solution for exactly this. This is maybe more important than the fine tuning of the algorithms, which is what I would prefer to do.
You are maybe skeptical, and you have to be skeptical. But all the criteria we spoke about - security, conventions, styling rules, up to details like `T op<T, T>`... (changing naming conventions and improving protection is no problem) - I know that the current approach completely fulfills them. Especially in the expressions... if only I could find a clean and safe solution to avoid the "or" trick.
I assume you have something like a test environment for INumber.
There is a NuGet package: "BigRationalNET7preview" (link)
You could check it and, just for fun (the "or" operator is still in), try the same calculation with x = 0 | ....
The effect is amazing.
Relying on tricks like that is a non-starter. It will never pass API review.
it's a trial
@tannergooding With request to check. String formatting - does it fulfill the requirements?
Standard numeric format strings (link used as spec)
Custom numeric format strings (link used as spec)
Default ToString()
Of course, all this can be changed quickly. Full custom format support could be included. The BigRational.Parse versions read all specials back without extra settings (repetition form, "Q" format). All in detail + examples
@tannergooding update
`public readonly ref struct SafeCPU`
(code, docu) that cannot be used in asynchronous methods, asynchronous lambda expressions, query expressions, iterator blocks or inside non-static nested functions. The overhead is minimal, no performance lost; it is already tested in all possible configurations of NET6, NET7 and VS versions. No side effects, all perfect. That I didn't think of it myself - it was too simple. @tannergooding please - with request to check.
A new Builder solution for `BigRational` and `BigInteger` - but does it fulfill the requirements?
(It's also a perfect substitute for the "OR" trick.)
How it looks to the user:
```csharp
BigRational a, b, c, d, x;
x = a * b + c * d;                                             // standard notation, 3x alloc/normalization
x = (BigRational.Builder)a * b + c * d;                        // one final alloc/normalization, up to 3x faster
x = (BigRational.Builder)a * b + (c - 1) * BigRational.Sin(d); // parentheses and function calls included
```
To test and show this, I added a small `BigInteger`-like class called `BigInt` (code) where it works exactly the same:
```csharp
BigInt a, b, c, d, x;
x = a * b + c * d;
x = (BigInt.Builder)a * b + c * d;
x = (BigInt.Builder)a * b + (c - 1) * BigInt.Pow(d, 100);
```
So far it should conform to NET standards, except that it also works for parenthesized expressions and function calls.
This would be easy to change, but it would require writing more builder casts explicitly, which quickly becomes confusing.
How it works:
`public readonly ref struct Builder` - same principle as for the current `public readonly ref struct SafeCPU` - thread safety guaranteed, etc. The name `Builder` might be confusing; it is more of a `BuilderActivator` that delegates the work to the internal builder solution. This allows standard notation and the existing function set to be used without additional public builder classes and their difficulties. The implementation: (complete code)
```csharp
public readonly ref struct Builder
{
  readonly BigRational p; // the one and only field

  public static implicit operator Builder(BigRational v); // set of implicit conversions
  public static implicit operator Builder(long v);
  public static implicit operator Builder(double v);
  //...
  public static Builder operator +(Builder a, Builder b); // set of operators
  public static Builder operator -(Builder a, Builder b);
  public static Builder operator *(Builder a, Builder b);
  //...
  public static implicit operator BigRational(Builder v); // finally, "ToResult()", as implicit cast
}
```
The code for a `BigInteger` solution is almost identical.
`BigInteger` with `BigIntegerCalculator` behind it is possible, even if it is more difficult, as it was not designed for this. But there is a simple way using a swap-buffer approach with two `BigIntegerCalculator` instances. It would be a limited solution that works only on one layer at a time, not across function calls etc., but probably enough for common use. For `BigRational`, this approach would allow the `SafeCPU` to be kept internal, but the user would have a way of reaping the performance benefits and memory efficiency. With some more public static functions for fast binary rounding, precision checks, etc., it would also be possible to write reasonably efficient functions for, say, Taylor series. It cannot replace the `SafeCPU` with all its bit-level tuning possibilities - avoiding mem copies by using best-matching instructions, using stack entries as registers, etc. This is what the Builder approach cannot automate.

An expression `x = a * b + c * d` is really:
```csharp
BigInteger t1 = a * b;
BigInteger t2 = c * d;
x = t1 + t2;
```
Inserting a cast such as the following is problematic for a few reasons:
```csharp
x = (BigInt.Builder)a * b + c * d;
```
To start, this only effectively changes the expression to be:
```csharp
BigInteger.Builder t1 = new BigInteger.Builder(a);
t1 *= b;
BigInteger t3 = c * d;
x = t1 + t3;
```
It cannot optimize or handle `t3` because there is no way to access the originally created builder. You cannot statically or globally cache a backing "stack" because there is no guarantee that some expression `t1 = a * b` and some expression `t2 = c * d` are sub-expressions of the same higher expression. They could be completely unrelated expressions with the user interleaving operations.
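The interleaving hazard described above can be sketched as follows. This is an illustrative example, not code from the thread, and `SomeUnrelatedWork` is a hypothetical method:

```csharp
// Why a static/global intermediate stack cannot be cached safely:
// t1 and t2 may look like sub-expressions of one computation, but arbitrary
// user code can run between them and clobber any shared state.
BigInteger t1 = a * b;   // would push an intermediate onto a shared stack
SomeUnrelatedWork();     // hypothetical call; may itself evaluate BigInteger
                         // expressions, invalidating the cached state
BigInteger t2 = c * d;   // no way to know this belongs to the same expression
BigInteger x = t1 + t2;
```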
Additionally, casting from a `T` to a `T.Builder` is not really a desirable operation. This conversion has cost and will allocate/copy, because a `Builder` needs to assume it can mutate the backing array, so it can't just take the data "as is". We typically want to have such costs more readily visible, and having constructors or other patterns for creating the builder from a `T` tends to be better. Having operations that do `T.Builder op T` is fine, however, because the RHS is only used as read-only data.
I strongly believe that the stack based approach is not the right approach here. If publicly exposed, it forces users into a more complex pattern and while there is indeed some benefit to such an approach, the complexity cost for something that is meant to be a "simple to use number type" is too high. If used as an internal implementation detail, it becomes far too restrictive on what developers can do. Both of these lead the common usage scenarios into being pits of failure.
Because of this, such functionality would be better suited to a third-party package so that the more advanced/complex path is opt-in and only for when the user really needs it.
@tannergooding thanks for response.
No big discussion is necessary at this point. If it's not good enough, I'll have to find a better solution. More interesting are the other general questions - string formatting, the other Builder type options. What do you think?
Just as a note:

```csharp
BigInteger.Builder t1 = new BigInteger.Builder(a);
```

It is simply impossible to construct such cases, because there are neither constructors nor functions - operators only. But using those, we are back to standard notation, everything is fine, and the behavior is clearly defined.
> It cannot optimize or handle t3...
Regardless of parentheses and function calls, the compiler always generates a flat sequence of operator calls when parsing. The explicit cast from number to number.Builder, and the implicit cast back, are always the first and the last events in the sequence. The problem is rather the opposite - the operators have no reference to the parentheses - which makes it difficult to switch the optimization off.
> no guarantee that some expression t1 = a * b and some expression t2 = c * d...
Therefore it is not possible without a cast: `t1 = (Builder)a * b;`. The case `Builder t1 = (Builder)a * b;` is something that looks dangerous, but improper use can be detected at runtime.
> This conversion has cost and will allocate/copy...
Since it is a ref struct, it costs nothing; in fact, only the operator calls are needed as events. But of course it is helpful to host a reference to the value for quick access.
> It is simply impossible to construct such cases because there are neither constructors nor functions
I called out that this isn't the type of scenario where a cast is desirable due to the hidden cost. The standard pattern is for a conversion via constructors/functions.
> Regardless of parentheses and function calls, the compiler always generates a flat sequence of operator calls when parsing
That is an implementation detail, not a guarantee. You have an expression that results in at least two intermediates needing to exist (one for `a * b` and one for `c * d`) before the addition can occur. Whether these intermediates exist in a register, in a local, or in some stack (concrete or virtual) is an implementation detail, and the IL happening to itself use a "stack based" approach to represent these temporaries is often not how it is compiled down in practice.
> Therefore not possible without a cast: t1 = (Builder)a * b; The case: Builder t1 = (Builder)a * b; is something that looks dangerous, but improper use can be detected at runtime.
Detecting improper use at runtime, rather than compile time, is often viewed as a "pit of failure". It means that it is much easier to miss an issue or introduce a bug.
> Since it is a ref structure, it costs nothing, in fact only the operator calls are needed as an event. But of course it's helpful to host a reference to the value for quick access.
The ref struct doesn't have overhead, but it must track the "storage space" for the computed values. Due to this being "big data" and the builder needing to grow to fit the size of the result, this must be either some GC or native memory allocation (likely a GC allocation, due to the potential pits of failure around handling native allocation lifetimes).
Every builder is, therefore, at least 1 pointer (4/8 bytes) in length. Realistically, it is likely 12-16 bytes instead, as we typically need to track the backing data, the length, and some flags. You can of course do tricks like tracking just a `byte[]` and having the length, flags, and data all be "inline" (much like `string`/arrays do internally), but that also makes things quite a bit more complicated to implement/maintain.
Every builder likewise needs to support "growing" the backing data for the array. This means that you cannot (without flags tracking that the data cannot be mutated) do a "zero cost" conversion of a `T` to a `T.Builder`. Likewise, since `a * b` is likely going to produce something that is bigger than either `a` or `b`, you often need another allocation anyway. Unlike the `T`, the `T.Builder` however can be allowed to track a "too big" allocation, and therefore you can avoid this additional allocation by presizing the builder to be "big enough" (much as you can do with a `List<T>`).
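The tracking state described above can be sketched as follows; the type and member names are illustrative assumptions, not a proposed API:

```csharp
// Minimal state a growable big-number builder must carry (sketch).
public struct BigIntBuilder
{
    private uint[] _data;    // backing buffer; may be larger than the stored value
    private int _length;     // how much of the buffer is actually in use
    private bool _readOnly;  // set when wrapping an existing immutable value:
                             // "I can't mutate this and must allocate on first write"

    // Presizing to the expected result size avoids repeated grow-and-copy,
    // much like List<T> with an explicit capacity.
    public BigIntBuilder(int capacity)
    {
        _data = new uint[capacity];
        _length = 0;
        _readOnly = false;
    }
}
```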
There are many considerations that have to be taken into account around usability, where allocations come into play, where those allocations actually have cost vs where they won't show up, etc.
@tannergooding
> The standard pattern is for a conversion via constructors/functions
Yes, for a real Builder solution - and yes, the name Builder for this thing is misleading. I had no better idea what to call it: "BuilderActivator", "BuilderDelegator", "TempValue", ...
> That is an implementation detail, not a guarantee...
The approach is completely independent of this, since everything depends on the initial and the final cast, and these must happen first and last. It's actually only a safe workaround to simulate the missing op_Assign operator. As for the operators and conversions, there is no science behind them; they are only there to provide a .NET-conformant and safe interface - something better than the "OR" trick.
> The ref struct doesn't have overhead but it must track the "storage space"
Not really; as a ref struct the lifetime is short, the buffer or builder is there anyway and is only needed for the back casts. There is no need for further information like flags etc. And as you mentioned, we cannot rely on this, and this is exactly the reason why I did no such things in the example implementations.
> Every builder likewise needs to support "growing" the backing ...
Behind it, everything remains the same. It is all only to avoid unnecessary mem allocs, copies and normalizations.
> The approach is completely independent of this, since everything depends on the initial and the final cast, and these must happen first and last.
The point is that it is functionally equivalent to the example I gave, and so there aren't "enough" casts/conversions. A second builder may be needed for `c * d` if you want to reduce the allocations as much as possible.
> Not really; as a ref struct the lifetime is short, the buffer or builder is there anyway and is only needed for the back casts.
You must have a storage space to track the result. Given `a * b`, you must have a destination buffer that is at least `a.Length + b.Length` in width. In order to reduce allocations across many operations, you must be able to use something as both a source and a destination, therefore mutating it. This generally also necessitates "overallocating" (that is, allocating a buffer that is larger than the stored data). This in turn requires you to track the amount of the allocated buffer that is actually in use. If you want to create a `T.Builder` from an existing `T`, that also necessitates tracking a flag that means "I can't mutate this and must allocate".
> Behind it, everything remains the same. It is all only to avoid unnecessary mem allocs, copies and normalizations.
I was trying to elaborate that the ref struct only removes an unimportant and inexpensive short-lived allocation. While removing such allocations can sometimes pay off, it will not meaningfully impact most operations, and it comes with a disadvantage around utilizing the builder in more complex scenarios (including around async code). The cost around things like `BigInteger` today is around repeated allocations and copying for complex expressions. It's the same cost that comes from allocating a `List<T>` with a capacity of 4 and then adding 1000 individual items to it. You end up resizing the array and copying the data around 9-10 times, where presizing the capacity to 1000 means you remove all that overhead.
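The resizing cost can be demonstrated directly; this is a small illustrative program, not code from the thread:

```csharp
// Count how many times List<T>'s backing array grows when 1000 items
// are added to a list created with a capacity of 4.
using System;
using System.Collections.Generic;

var list = new List<int>(4);
int resizes = 0, capacity = list.Capacity;
for (int i = 0; i < 1000; i++)
{
    list.Add(i);
    if (list.Capacity != capacity) { resizes++; capacity = list.Capacity; }
}
// Each resize copies all existing elements to the new, larger array;
// presizing with new List<int>(1000) removes every one of those copies.
Console.WriteLine($"resizes: {resizes}");
```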
@tannergooding (the movs between calls removed)
```
x = (BigInt.Builder)a * b + c * d;

00007FFD7B16B566 call System.Numerics.Rational.BigInt+Builder.op_Implicit(System.Numerics.Rational.BigInt) (07FFD7B62C030h)
00007FFD7B16B580 call System.Numerics.Rational.BigInt+Builder.op_Multiply(Builder, System.Numerics.Rational.BigInt) (07FFD7B62C0D8h)
00007FFD7B16B59A call System.Numerics.Rational.BigInt.op_Multiply(System.Numerics.Rational.BigInt, System.Numerics.Rational.BigInt) (07FFD7B620AC8h)
00007FFD7B16B5B4 call System.Numerics.Rational.BigInt+Builder.op_Addition(Builder, System.Numerics.Rational.BigInt) (07FFD7B62C0A8h)
00007FFD7B16B5C7 call System.Numerics.Rational.BigInt+Builder.op_Implicit(Builder) (07FFD7B62C138h)
```
@tannergooding Please, this is only an option to provide more performance; the solution itself does not depend on it. What would be helpful: the other 3 Builder type options, the questions regarding string formatting, and the questions regarding INumber.
@tannergooding To make sure I can give verified statements, I checked everything again: the latest NET 7 code for `BigInteger`, `ArrayPool<T>.Shared` etc., even the latest Roslyn code.
The ref struct `Builder` is gone. For realistic tests and benchmarks I started to add a new external type `BigInt` (sourcecode) that will get exactly the same functions and interfaces as `System.Numerics.BigInteger`.
In principle there is no difference. The static class `BigIntegerCalculator` uses `ArrayPool<T>.Shared` memory - `SafeCPU` should be seen as a black box or VM and is a shared object by itself.
The CPU-internal buffers are private, used as work memory, allocated on demand.
For BigInteger and BigRational, both systems use thread-local static root(s); however, `ArrayPool<T>.Shared.Rent()` measures at least 5x slower than a simple `[ThreadStatic]` field access, and `BigIntegerCalculator` needs many rents where the CPU has its private buffers for this.
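The comparison being made can be sketched as a micro-benchmark; the buffer size, iteration count, and the 5x figure are not reproduced here, this only illustrates the two access patterns:

```csharp
// Sketch: [ThreadStatic] cached buffer vs ArrayPool<uint>.Shared.Rent/Return.
// Timings vary by machine; run under Release to get meaningful numbers.
using System;
using System.Buffers;
using System.Diagnostics;

class Bench
{
    [ThreadStatic] static uint[]? t_buffer;

    static void Main()
    {
        const int N = 10_000_000;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < N; i++)
        {
            var b = t_buffer ??= new uint[64]; // one alloc per thread, then a field read
            b[0] = (uint)i;
        }
        long threadStatic = sw.ElapsedMilliseconds;

        sw.Restart();
        for (int i = 0; i < N; i++)
        {
            var b = ArrayPool<uint>.Shared.Rent(64); // pool lookup per use
            b[0] = (uint)i;
            ArrayPool<uint>.Shared.Return(b);
        }
        long pooled = sw.ElapsedMilliseconds;

        Console.WriteLine($"[ThreadStatic]: {threadStatic} ms, ArrayPool: {pooled} ms");
    }
}
```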
Where is the problem with `BigRational` memory management?
That and the efficiency of the systems already make a difference in average performance of over 20%. This is for simple integer arithmetic; for rational arithmetic the effect is much bigger. With `0 | (a * b + a * b - a + a * a) / a;` the same benchmark shows 70%, but this is only as info about what would be possible.
The formula was chosen to hit all bottleneck functions (incl. a sqr): `(a * b + a * b - a + a * a) / a;`
(all latest benchmarks)
Maybe not related to this issue, but `BigRational` is not the best choice for calculating Mandelbrot. We need a `BigFloat`, which just extends `double` further to have more significand bits.
I don't think it's possible to have a constant number of variables for an irrational number. For a rational it is possible with just 2 numbers, so we can make a struct out of it. But irrationals can be chained, and so I think it is not worth supporting them in the language.
@acaly Yes, of course. It is mainly intended to get a test case with massive calculations and a wide range of number complexity, and to test thread safety. But there is another aspect. BigRational acts like a rational number. The CPU, however, is a general solution with very special bit-level operations and can emulate integer and floating-point arithmetic. There are, for example, fast binary round ops to emulate floating-point behavior: forget the last bits. These are tools needed, for example, in Taylor series, where interim results would quickly get much more precision than necessary for the result precision. For rationals, all other number types are only special cases - integer: denominator = 1, for FP: den = 2^n, decimal: 10^n, etc. The CPU, however, can handle all cases, also and, or, xor, etc., and can also emulate non-existing FP types with short sequences of CPU ops.
@Thaina The day someone invents an irrational number type, I will delete the repository. Yes, of course; it is more about having an arbitrary number type, at first, for precision in algorithms sensitive to epsilon and robustness problems - geometric cuts of lines, planes, etc., intersections always perfectly exact, where FP can only estimate, and a wrong estimation makes the algorithm fail. Such things. The other case, when it starts with irrationals, is to have a solution to calculate in any desired precision range, to emulate FP with much higher precision. So many possibilities. The CPU already has support for Int256, Int512, Int1024, ... one fast op for all int/uint conversions that follow the same scheme. So many possibilities with minimal code.
I want a BigDecimal like this one: https://github.com/AdamWhiteHat/BigDecimal
But with more functions than `Pow()`. It also needs a BigDecimal of Pi, e, `Sin()`, `Cos()`, `Tan()`, etc.
@theuserbl Hi, meanwhile BigRational supports all math functions. Results are rounded to the appropriate decimal places, and it matches BigDecimal numbers and their behavior, with better performance and memory efficiency.
An IEEE-754 value struct template type, `Decimal<T>`, is under development.
Like the already implemented `Float<T>`, `Int<T>`, `UInt<T>`, ... `Decimal<T>` will support fixed sizes, e.g. 64, 128, 256, ... bytes with appropriate precision. The advantage of these types is that no GC allocations are necessary at runtime. The BigRational calculation core is used, and so the effort for the templates is minimal.
@c-ohle Nice. But why haven't you added a pull request here? Isn't it better to have BigRational integrated in .NET, and possibly later in PowerShell?
Possibly the .NET maintainers have some terms, like that "Copyright (c) 2022 Christian Ohle" will be changed to "Copyright (c) .NET Foundation and Contributors". And possibly a rename to BigDecimal, because that makes more sense. There exist Integer and BigInteger, but with BigRational there doesn't exist a normal Rational. .NET does, however, have a normal Decimal.
As per our API review process, all new API surface area must first go through and be approved by the API Review team: https://github.com/dotnet/runtime/blob/main/docs/project/api-review-process.md
As per the discussion above, the area owners do not feel the proposed surface area/implementation are the right fit for inclusion in the BCL itself.
There is some interest in providing some type of `BigDecimal`/`BigRational`, but it needs to fit in with the .NET Framework Design Guidelines and other types exposed by the BCL in the numerics area.
@tannergooding Hi, something completely new is coming soon. If you want we could work together.
@tannergooding a new BigRational class - in short:
BigRat.cs (single file solution, just copy and try)
**Would that meet the requirements?**
New approach:
- `stackalloc` instead of a stack-machine.
- No `NaN` or `Infinity`. (It makes not much sense to emulate floating-point behavior.)

This is a reference version; major improvements/optimizations are possible. Of course, the stackallocs will be secured. And a cache would bring other big performance improvements, especially for the irrational functions.
@c-ohle Nice. But I still think that this is the wrong place to publish the code.
The right place is to add the code via a pull request, and not to post it only under "Issues".
Putting up a PR is not the right approach. All new APIs must first have their open proposal which is accepted by the area owners and then taken to and approved by API review. If all that happens, then a PR is allowed to be put up.
The entire process required is seen here: https://github.com/dotnet/runtime/blob/main/docs/project/api-review-process.md
I'm still not a fan of the overall approach being used and don't believe it fits or considers the needs/scenarios at large. It has taken into account some of the more fundamental issues that were raised initially. However, the code as-is is not super readable, which makes it difficult to review and determine whether it is acceptable or not (lots of one-liners and other things that violate the general coding style/rules for the repo): https://github.com/dotnet/runtime/blob/main/docs/coding-guidelines/coding-style.md
If it were in box, it would most likely not support downlevel frameworks. It would be .NET 8+ exclusive and the names of APIs would need to follow the general Framework Design Guidelines around naming: https://learn.microsoft.com/en-us/dotnet/standard/design-guidelines/naming-guidelines
Not supporting concepts such as `Infinity` immediately cuts out a number of core scenarios. The same could be said for `NaN`, but that is somewhat more debatable, as the concept that `NaN` supports could be handled a bit differently instead. The handling for irrational values also exposes various complications and makes the use of the APIs potentially non-deterministic or lossy depending on the scenario in question.
Background and motivation
I want to suggest a novel arbitrary-precision base number type for the namespace System.Numerics.
I love C#. I miss a base number type for arbitrary precision that is really usable in practice.
Such a number type should have the properties of a System.String:
a simple, binary-unique, always normalized, immutable readonly struct of machine-word size.
This would ensure that it can be passed around internally as quickly as possible and makes it efficient to use for client types like vectors and matrices.
The general problem with NET and arbitrary number types.
It is so nice and easy to implement custom number types in C# by defining numeric operators and implicit or explicit conversions.
For value struct types this works perfectly as long as they have a fixed size.
However, there are major problems for any kind of arbitrary-precision type.
There is no problem if we have a single operation like: a = b + c;
The + operator function can alloc the necessary memory resources and create and return a final object, as needed for a.
But in practice we have expressions like this: a = (b * c + d * e) / f;
We usually have many intermediate results, many more than final results, but the intermediate results don't require memory allocs and the setup of fully qualified, normalized final result objects.
The C# compiler simply generates code that calls the operator functions, and the operator function gets no indication that only an intermediate result is needed - one that could be managed in buffers.
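The effect can be sketched by writing out the temporaries the compiler introduces (illustrative only; the compiler emits operator calls, not named locals):

```csharp
// What a = (b * c + d * e) / f; effectively becomes: every operator call
// builds a fully allocated, normalized BigRational, even for temporaries.
BigRational t1 = b * c;    // alloc + normalize (intermediate only)
BigRational t2 = d * e;    // alloc + normalize (intermediate only)
BigRational t3 = t1 + t2;  // alloc + normalize (intermediate only)
BigRational a  = t3 / f;   // alloc + normalize (the only result we need)
```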
BigInteger and BigRational.
Both types are closely related.
They need the same big-integer arithmetic calculation core, whereby efficient rational arithmetic needs some more bit-level functions.
It looks like a logical consequence to set up a BigRational type from two BigIntegers.
But that only doubles the problems:
BigInteger already has a size of 2 machine words; a BigRational based on BigInteger gets 4.
A Vector3R with X, Y, Z gets 12, and so on.
For every interim result we get 12 allocations and unnecessary normalizations.
This is far from any efficiency.
The new approach, a BigRational type for NET that solves the problems
To bypass the mentioned problems, I want to propose a generally new system with significantly better performance and better memory usage. First, the data structure:
This is all - BigRational has only one array field, and therefore the desired machine-word size, and furthermore it has immutable properties like a string object. A nested public class CPU represents a stack-machine and encapsulates the calculation core. It is nested as the only class that has access to the private BigRational creator and to the data pointer - for safety. Everything is as small and secure as possible; no additional system resources are required. Not even predefined default values - nothing.
The data format used in the BigRational uint[] data pointer is the same as used in the stack-machine's stack entries and is also as simple as possible:
the first uint holds a sign bit and the number of following uint digits for the numerator, followed by a uint with the number of uint digits for the denominator.
This sequential format has the benefit that the internal integer arithmetic needs only one pointer - not ranges, conversions, case handling etc. - and this allows the best possible throughput.
A BigRational with a null data pointer is defined as the number zero (0), and therefore the value struct default is a valid number.
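The described layout can be illustrated for a small value; the exact header bit positions are an assumption here, as the text only states "sign bit + digit count" per header uint:

```csharp
// Hypothetical illustration of the sequential uint[] format for +3/4:
//
//   data[0] = header: sign bit | numerator digit count (here: 1)
//   data[1] = numerator digit:   3
//   data[2] = denominator digit count (here: 1)
//   data[3] = denominator digit: 4
//
uint[] data = { 1u, 3u, 1u, 4u };
// default(BigRational) has data == null, which is defined as the number 0,
// so the struct default is a valid number without any allocation.
```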
The Interfaces
BigRational itself acts like an empty hull that encapsulates the data pointer and provides a safe first-level user interface for a number type with all the number-typical operators:
Documentation for the particular functions is at GitHub.
For the best possible compatibility, the high-level functions - several `Round`, `Truncate`, `Floor`, `Ceiling`, a set of `Log` and `Exp` functions, up to the trigonometric functions `Sin`, `Cos`, `Tan`, `Atan`, ... and specials like `Factorial` etc. - are all in a static extra class MathR, as a mirror of System.Math and System.MathF, and as similar as possible. This is to reflect the old, known standards and to make it easy to switch algorithms from, for example, double to BigRational or back, to check robustness problems.
A look at the BigRational and MathR functions always shows the same scheme:
get the task_cpu and execute instructions on it.
All calculation happens in the CPU object. The high-level API functions are all implemented as short or longer sequences of CPU instructions. It all breaks down into such sequences, and this is in general short and fast code and allows a rich API with less memory usage. Some example client types using this system: ComplexR, Vector2R, Vector3R, Matrix3x4R, PlaneR, VectorR. A special one is VectorR: this struct also has only one pointer, containing an index and a sequence of the rational data structures, and can therefore store a lot of numbers very efficiently. It is an example showing that client types can easily use BigRational.CPU independently of BigRational, with a private data format.
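The scheme can be sketched for a single operator; this is an illustrative reconstruction using the instruction names from the text (`push`, `add`, `popr`, `task_cpu`), not the actual implementation:

```csharp
// Hypothetical sketch: a high-level operator implemented as a short
// sequence of stack-machine instructions on the thread's CPU instance.
public static BigRational operator +(BigRational a, BigRational b)
{
    var cpu = task_cpu;   // get the thread's stack-machine
    cpu.push(a);          // reads the data pointers directly; no copy of a or b
    cpu.push(b);
    cpu.add();            // replaces the two stack tops with their sum
    return cpu.popr();    // one final alloc/normalization for the result
}
```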
The CPU - a stack-machine
Like every stack machine, the CPU has instructions like `push`, `push(double)`, `push(int)`, ..., pops like `pop()`, `BigRational popr()`, `double popd()`, ..., and self-explanatory instructions like `add()`, `sub()`, `mul()`, `div()`, `mod()`, `neg()`, `abs()`, `inv()`, `dup()`. But there is a difference: since we have numbers of arbitrary size, `mov()` instructions are not efficient, because they would imply many memory copies. Instead, it is fastest to swap only the stack entry pointers, since the stack is a uint[][] array. This explains the versions of the `swp()` instructions. A pop command frees nothing: it only decrements the index of the stack top, and the buffer there stays alive for the next operations. Allocations are only necessary when the current stack-top buffer is not big enough for the next push, which quickly becomes a rare case. Not only the high-level API but the stack machine itself uses the stack extensively for all internal interim results and therefore needs no stackalloc or shared buffers. It forms a consistent system and enables fast and compact code without special handling.
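The buffer-reuse and pointer-swap ideas can be sketched in a few lines of Python (details such as the growth policy are my assumptions; a real header word would record the live digit length, so stale trailing digits in a reused buffer are harmless):

```python
class Stack:
    """Model of the uint[][] stack: pop keeps buffers, swp swaps pointers."""
    def __init__(self):
        self.bufs = []   # like uint[][]: one digit buffer per slot
        self.top = 0     # number of live entries

    def push(self, digits):
        if self.top == len(self.bufs) or len(self.bufs[self.top]) < len(digits):
            # allocate only when the slot has no buffer or it is too small
            buf = list(digits)
            if self.top == len(self.bufs):
                self.bufs.append(buf)
            else:
                self.bufs[self.top] = buf
        else:
            # reuse the existing buffer in place, no allocation
            self.bufs[self.top][:len(digits)] = digits
        self.top += 1

    def pop(self):
        self.top -= 1  # frees nothing; the buffer stays for the next push

    def swp(self, i, j):
        # O(1) pointer swap instead of an O(n) mov of arbitrary-size digits
        self.bufs[i], self.bufs[j] = self.bufs[j], self.bufs[i]
```

After a short warm-up the slots hold buffers big enough for typical operands, so pushes stop allocating entirely, which is exactly why the "rare case" claim above holds.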
The CPU class reduced to the public instruction set:
It is noticeable that there are many different versions of the same instruction. Since we have number values of arbitrary size, this all serves to avoid unnecessary memory copies. For example, to add two BigRational numbers it is possible to push both values onto the stack and call a simple add(). But since the data structure on the stack is the same as behind BigRational pointers, the CPU can read the data pointers directly and push only the result onto the stack, which saves two memory copies and is therefore faster. Other versions of the operations work with absolute stack indices, which is more efficient in many situations, for example when one function pushes a vector structure and another function takes it as input.
The Efficiency
An example: the CPU implementation of the sine function, which can also calculate cosines.
This function uses several identities and a final Taylor-series approximation.
The code looks ugly, but it reduces the computation to a sequence of just 36 short CPU instructions. It means that providing a rich set of such common math functions already at this level doesn't involve much overhead.
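Independent of the CPU encoding, the final Taylor-series stage of such a sine routine can be sketched compactly. This is a Python/`fractions` model of the math only, not the 36-instruction sequence itself:

```python
from fractions import Fraction

def sin_taylor(x, terms=20):
    """sin(x) = x - x^3/3! + x^5/5! - ...  evaluated exactly in rationals."""
    x = Fraction(x)
    term, total = x, x
    for k in range(1, terms):
        n = 2 * k + 1
        # each odd term follows from the previous one: * (-x^2) / (n*(n-1))
        term *= -x * x / Fraction(n * (n - 1))
        total += term
    return total
```

Every step here is again just a handful of stack-machine operations (mul, div, add), which is why a rich function set costs so little extra code at the CPU level.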
To come back to the beginning: the problem with BigInteger and a Rational type based on BigInteger. Imagine the same sine calculation with such a type, where every instruction involves not just two newly allocated BigIntegers for a result. For a simple addition, Rational itself needs, for example, 3 mul + 1 div + 1 add, and these are all newly allocated BigIntegers. And not only that: Rational has to normalize or round the fractions, which costs many more new BigIntegers. It should be obvious that this is inefficient and slow and leads to an impractical solution.
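For comparison, here is what a single addition in a naive BigInteger-backed Rational looks like (Python ints standing in for BigInteger; in such a type every named intermediate below would be a fresh heap allocation):

```python
import math

def rational_add(an, ad, bn, bd):
    """a/b + c/d with a BigInteger-style backend."""
    n = an * bd + bn * ad   # 2 mul + 1 add: three fresh big integers
    d = ad * bd             # 1 mul: another fresh big integer
    g = math.gcd(n, d)      # normalization: gcd over full-size integers ...
    return n // g, d // g   # ... plus 2 div: two more fresh big integers
```

Multiply that allocation count by every instruction of a sine evaluation and the GC pressure described above follows directly.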
To show the differences I made the Mandelbrot example in the BigRationalTest app, which does exactly this. A conventional BigRational class based on a BigInteger numerator and denominator is part of the test app, called OldRational. At the beginning BigInteger profits since it can store small numbers in the sign field, but then, deeper in the Mandelbrot set, it slows down exponentially until it stops working, caught in endless GC loops with only short breaks.
The bottleneck of arbitrary-precision rational arithmetic is the integer multiplication and, for the normalization, the GCD function and the integer division. The first benchmarks show that the BigRational calculation core is approximately 15% faster than the equivalent functions in System.Numerics.BigInteger, at least on my machine. In other words, using BigRational for pure integer arithmetic can already improve performance. For integer algorithms using the CPU directly, the effect is much bigger, of course.
BigInteger in .NET 7 will use spans, more stackalloc and shared buffers. Benchmarks I made with .NET 7 preview versions show that this reduces the memory pressure a little but further degrades the performance, especially for big numbers. It is probably the use of System.Buffers.ArrayPool.Shared, which is not the fastest, combined with a threshold of 256 for stackalloc.
These are problems that are not existing for a stack-machine solution.
Latest benchmark results can be found at Latest-BigRational-Benchmarks. (The same benchmarks with the .NET 7 preview, latest master branch, show a loss of around 10% performance.) The page is currently under construction, but I will be adding more tests daily.
Further development
There is great potential for further improvements, in the calculation core itself and in the algorithms. Furthermore, frequently used values like `Pow10`, `PI` etc. should be cached efficiently; these numbers are currently always recalculated for functions like `Exp` and `Log`, and caching would give another big performance boost. I'm looking for a solution using `static AsyncLocal<CPU>? _async_cpu;` instead of only `[ThreadStatic] CPU _task_cpu;`. This would be more user friendly and safe for `await` functions etc., but access to _async_cpu is about 3x slower. I need to find a trick to detect whether an _async_cpu request is necessary, maybe a quick stack base-address check or something. The set of MathR functions should be extended and should handle the DefaultPrecision property, so that it works like a hypothetical floating-point type that allows such precision.
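This trade-off has a direct analogue in Python, which may make it easier to see: `threading.local` corresponds to `[ThreadStatic]` (fast, but the value does not flow across `await` boundaries between threads), while `contextvars.ContextVar` corresponds to `AsyncLocal<T>` (it flows with the async context, at extra lookup cost). A minimal sketch:

```python
import asyncio
import contextvars
import threading

thread_cpu = threading.local()                            # ~ [ThreadStatic] CPU
async_cpu = contextvars.ContextVar("cpu", default=None)   # ~ AsyncLocal<CPU>

def get_thread_cpu():
    """Lazy per-thread instance, like a task_cpu property."""
    cpu = getattr(thread_cpu, "cpu", None)
    if cpu is None:
        cpu = thread_cpu.cpu = object()  # stand-in for a CPU instance
    return cpu

async def worker():
    # the context variable set here is still visible after an await,
    # within this task's context
    async_cpu.set(object())
    before = async_cpu.get()
    await asyncio.sleep(0)
    return before is async_cpu.get()
```

Just as described above, the context-flowing variant is the friendlier one for async code, and the thread-local variant is the cheaper one per access.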
Especially geometric algorithms are very sensitive to epsilon and robustness problems. A vector extension compatible with the current System.Numerics.Vector types would be useful. No one with BigInteger experience would believe that this makes sense, or is even possible, especially not in .NET. But the Polyhedron 3D example shows: it works for thousands of BigRational vertices, points and planes, for polygon tessellation and Boolean intersections of polyhedra (CSG), and this in real time at animation time:
Remarks
The current implementation is optimized for X64, tuned for RyuJIT code generation. For better performance on X86 it would be necessary to put special X86 code in critical paths. The same goes for ARM, but I don't currently have a development environment for it.
I have created a C++ COM implementation of the CPU where the CPU instructions are COM interface calls. Unfortunately, for C# such a solution is about 30% slower due to the many platform calls, and the same goes for a C++/CLI solution. But the COM solution in-proc in a C++ app is almost 2x faster because of the powerful C++ optimization.
I checked the possibility of using AVX, SSE and BMI intrinsics for optimizations. Unfortunately, none of the test implementations were faster than the simple scalar arithmetic currently in use, which matches the experience published by many other programmers. Only Bmi2.X64.MultiplyNoFlags (MULX r64a, r64b, reg/m64) could really improve the performance, but unfortunately the corresponding ADCX and ADOX instructions are currently not implemented in Bmi2.X64.
From my point of view the best choice for a System arbitrary-precision number type is BigRational, not BigInteger. For BigRational all other types are only special cases. This applies to BigInteger, BigFloat, BigDecimal, BigFixed, and in general to floating-point types of any fixed or arbitrary size. In the end these are all rational number types, only with restricted denominators: BigInteger 1, BigFloat and binary floating-point in general 2^n, BigDecimal 10^n, and so on. It is easy and efficient to implement such types very compactly based on the BigRational CPU. The CPU has all the instructions necessary to calculate for these specific number types, and missing ones are easy to add. There is no problem using custom data formats, as shown in the VectorR example. Only one core is necessary, and its resources can be shared by several client types; there is no need to implement a new calculation core every time. The focus should be on highly optimizing this one CPU core for all possible platform and processor configurations, and all client types would benefit.
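The "restricted denominator" view can be stated concretely (a Python sketch; note that `Fraction` normalizes to lowest terms, so a terminating decimal shows up with a 2^a·5^b denominator rather than literally 10^n):

```python
from fractions import Fraction

def classify(q):
    """Which special case of a rational is q, judged by its denominator?"""
    d = q.denominator
    if d == 1:
        return "integer"          # BigInteger: denominator 1
    t = d
    while t % 2 == 0:
        t //= 2
    if t == 1:
        return "binary float"     # BigFloat / IEEE floats: denominator 2^n
    while t % 5 == 0:
        t //= 5
    if t == 1:
        return "decimal"          # BigDecimal: denominator 10^n, reduced
    return "general rational"
```

Each of these families needs only a denominator constraint on top of the shared rational core, which is the argument for one optimized CPU serving them all.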
More technical details, example codes etc. can be found at:
BigRational Test App: https://github.com/c-ohle/RationalNumerics/releases/BigRational