System.Numerics.BigRational and System.Numerics.MathR

c-ohle commented 2 years ago

Background and motivation

I want to suggest a novel arbitrary base number type for namespace System.Numerics.

I love C#. I miss a base number type for arbitrary precisission that is really usable in practice.
Such a number type should have the properties of a System.String:
a simple binary unique always normalized immutable readonly struct in machine-word size.
This would ensure that it can be passed around internally as quickly as possible and makes it efficient to use it for client types like vectors and matrices.

The general problem with NET and arbitrary number types.

It is so nice and easy to implement custom number types in C# by define numeric operators and implicit or explicit conversions.
For value struct types it works perfectly as long as they have a fixed size.
However, there are major problems for any kind of arbitrary precision type.
No problem if we have a single operation like: a = b + c;
The the + operator function can alloc the necessary memory resources, create and return a final object as needed for a.
But in practice we have expressions like this: a = (b c + d e) / f;
We usually have many intermediate results, much more than final results, but the intermediate results don't require memory allocs and setting up fully qualified, normalized final result objects.
The C# compiler simply generate code that calls the operator functions and the operator function gets no indication that only an intermediate result is needed, which could be managed in buffers.

BigInteger and BigRational.

Both types are closely related.
They need the same arithmetic big-integer calculation core whereby, for efficient rational arithmetics it needs some more bit level functions..
It looks like a logical consequence to setup a BigRational type from two BigInteger.
But that only doubles the problems:
BigInteger has already a 2 machine-word size, a BigRational based on BigInteger gets 4.
A Vector3R with X, Y, Z gets 12 and so on.
For every interimes result we get 12 times allocations and unnecessary normalizations.
This is far from any efficency.

The new approach, a BigRational type for NET that solves the problems

To bypass the mentioned problems I want to propose a general new system with significantly better performance and better memory usage. At first the data structure:

  public unsafe readonly struct BigRational : IComparable<BigRational>, IEquatable<BigRational>, ...
  {
    private readonly uint[] p; // the one and only instance field what gives BigRational a machine word size

    public class CPU // nested for access to BigRational uint[] p
    {
       private int i; private uint[][] p; //only this two fields, necessary to build a stack-machine
    }

    [ThreadStatic]
    private static CPU? cpu; // a static threadlocal shared instance of a CPU object
    public static CPU task_cpu => cpu ??= new CPU(); // accessor as static property  
  }

This is all, BigRational has only one array field and therefore the desired machine-word size and has furthermore the immutable properties like a string object. A nested public class CPU what represents a stack-machine and encapsulates the calculation core. Nested as only class that has access to the private BigRational creator and access to the data pointer - for safety. Everything as small and secure as possible, no additional system resources required. Not even predefined default values, nothing.

The data format used in the BigRational uint[] data pointer is the same as used in the stack-machine stack entries and is also simple as possible:

The first uint a sign bit and the number of following uint digits for the numerator, followed by a uint with the number of unit digits for the denominator.
This sequential format has the benefit that the internal integer arithmetics need only one pointer and not ranges, conversions, case handling etc. and this allows the best possible troughput.
A BigRational with null data pointer is defined as the number Zero (0) and therfore the value struct default is a valid number.

The Interfaces

BigRational itself acts like a empty hull to encapsulate the data pointer and provides a save first level user interface for a number type with all the number typical operators:

  public readonly struct BigRational :
    IComparable<BigRational>, IComparable, IEquatable<BigRational>, IFormattable, ISpanFormattable,
    INumber<BigRational>, ISignedNumber<BigRational>, IPowerFunctions<BigRational>, IRootFunctions<BigRational>,
    IExponentialFunctions<BigRational>, ILogarithmicFunctions<BigRational>, ITrigonometricFunctions<BigRational>,
    IHyperbolicFunctions<BigRational>
  {
    public override readonly string ToString();
    public readonly string ToString(string? format, IFormatProvider? provider = default);
    public readonly bool TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format = default, IFormatProvider? provider = null); public static BigRational Parse(ReadOnlySpan<char> value, IFormatProvider? provider = null);
    public static BigRational Parse(ReadOnlySpan<char> s, NumberStyles style, IFormatProvider? provider);
    public static BigRational Parse(string s, NumberStyles style, IFormatProvider? provider);
    public static BigRational Parse(string s, IFormatProvider? provider);
    public static bool TryParse(ReadOnlySpan<char> s, NumberStyles style, IFormatProvider? provider, out BigRational result);
    public static bool TryParse([NotNullWhen(true)] string? s, NumberStyles style, IFormatProvider? provider, out BigRational result);
    public static bool TryParse(ReadOnlySpan<char> s, IFormatProvider? provider, out BigRational result);
    public static bool TryParse([NotNullWhen(true)] string? s, IFormatProvider? provider, out BigRational result);
    public readonly int WriteToBytes(ref Span<byte> destination);
    public static BigRational ReadFromBytes(ref ReadOnlySpan<byte> value);
    public override readonly int GetHashCode();
    public override readonly bool Equals([NotNullWhen(true)] object? obj);
    public readonly bool Equals(BigRational b);
    public readonly int CompareTo(object? obj);
    public readonly int CompareTo(BigRational b);
    public readonly int CompareTo(long b);
    public BigRational(float value);
    public BigRational(double v);
    public static implicit operator BigRational(byte value);
    public static implicit operator BigRational(sbyte value);
    public static implicit operator BigRational(ushort value);
    public static implicit operator BigRational(char value);
    public static implicit operator BigRational(short value);
    public static implicit operator BigRational(int value);
    public static implicit operator BigRational(uint value);
    public static implicit operator BigRational(long value);
    public static implicit operator BigRational(nint value);
    public static implicit operator BigRational(nuint value);
    public static implicit operator BigRational(ulong value);
    public static implicit operator BigRational(Int128 value);
    public static implicit operator BigRational(UInt128 value);
    public static implicit operator BigRational(BigInteger value);
    public static implicit operator BigRational(Half value);
    public static implicit operator BigRational(float value);
    public static implicit operator BigRational(double value);
    public static implicit operator BigRational(NFloat value);
    public static implicit operator BigRational(decimal value);
    public static explicit operator BigRational(string value);
    public static explicit operator byte(BigRational value);
    public static explicit operator sbyte(BigRational value);
    public static explicit operator short(BigRational value);
    public static explicit operator char(BigRational value);
    public static explicit operator int(BigRational value);
    public static explicit operator ushort(BigRational value);
    public static explicit operator long(BigRational value);
    public static explicit operator ulong(BigRational value);
    public static explicit operator nint(BigRational value);
    public static explicit operator nuint(BigRational value);
    public static explicit operator Int128(BigRational value);
    public static explicit operator UInt128(BigRational value);
    public static explicit operator BigInteger(BigRational value);
    public static explicit operator Half(BigRational value);
    public static explicit operator float(BigRational value);
    public static explicit operator double(BigRational value);
    public static explicit operator NFloat(BigRational value);
    public static explicit operator decimal(BigRational value);
    public static explicit operator ReadOnlySpan<uint>(BigRational value);
    public static BigRational operator +(BigRational a);
    public static BigRational operator -(BigRational a);
    public static BigRational operator ++(BigRational value);
    public static BigRational operator --(BigRational value);
    public static BigRational operator +(BigRational a, BigRational b);
    public static BigRational operator -(BigRational a, BigRational b);
    public static BigRational operator *(BigRational a, BigRational b);
    public static BigRational operator /(BigRational a, BigRational b);
    public static BigRational operator %(BigRational a, BigRational b);
    public static BigRational operator +(BigRational a, long b);
    public static BigRational operator -(BigRational a, long b);
    public static BigRational operator *(BigRational a, long b);
    public static BigRational operator /(BigRational a, long b);
    public static BigRational operator +(long a, BigRational b);
    public static BigRational operator -(long a, BigRational b);
    public static BigRational operator *(long a, BigRational b);
    public static BigRational operator /(long a, BigRational b);
    public static bool operator ==(BigRational a, BigRational b);
    public static bool operator !=(BigRational a, BigRational b);
    public static bool operator <=(BigRational a, BigRational b);
    public static bool operator >=(BigRational a, BigRational b);
    public static bool operator <(BigRational a, BigRational b);
    public static bool operator >(BigRational a, BigRational b);
    public static bool operator ==(BigRational a, long b);
    public static bool operator !=(BigRational a, long b);
    public static bool operator <=(BigRational a, long b);
    public static bool operator >=(BigRational a, long b);
    public static bool operator <(BigRational a, long b);
    public static bool operator >(BigRational a, long b);
    public static int Sign(BigRational a);
    public static bool IsInteger(BigRational a);
    public static bool IsNaN(BigRational a);
    public static BigRational Abs(BigRational a);
    public static BigRational Min(BigRational a, BigRational b);
    public static BigRational Max(BigRational a, BigRational b);
    public static BigRational Truncate(BigRational a);
    public static BigRational Floor(BigRational a);
    public static BigRational Ceiling(BigRational a);
    public static BigRational Factorial(int a);
    public static BigRational Round(BigRational a);
    public static BigRational Round(BigRational a, int digits);
    public static BigRational Round(BigRational a, int digits, MidpointRounding mode);
    public static BigRational Pow(int a, int b);
    public static BigRational Pow(BigRational a, int b);
    public static BigRational Pow(BigRational x, BigRational y, int digits);
    public static BigRational Pow2(BigRational x, int digits);
    public static BigRational Sqrt(BigRational a, int digits);
    public static BigRational Log2(BigRational x, int digits);
    public static BigRational Log10(BigRational x, int digits);
    public static int ILog10(BigRational a);
    public static BigRational Log(BigRational x, int digits);
    public static BigRational Exp(BigRational x, int digits);
    public static BigRational Pi(int digits);
    public static BigRational Pi();
    public static BigRational Tau(int digits);
    public static BigRational Tau();
    public static BigRational Sin(BigRational x, int digits);
    public static BigRational Cos(BigRational x, int digits);
    public static BigRational Tan(BigRational x, int digits);
    public static BigRational Asin(BigRational x, int digits);
    public static BigRational Acos(BigRational x, int digits);
    public static BigRational Atan(BigRational x, int digits);
    public static BigRational Atan2(BigRational y, BigRational x, int digits);
    public static BigRational GreatestCommonDivisor(BigRational a, BigRational b);
    public static BigRational LeastCommonMultiple(BigRational a, BigRational b);
    public static BigRational IDiv(BigRational a, BigRational b);
    public static BigRational IMod(BigRational a, BigRational b);
    public static BigRational DivRem(BigRational a, BigRational b, out BigRational r);
    public static (BigRational Quotient, BigRational Remainder) DivRem(BigRational a, BigRational b);
    public static BigRational NumDen(BigRational a, out BigRational den);
    public static BigRational Clamp(BigRational value, BigRational min, BigRational max);
    public static BigRational Pow(BigRational x, BigRational y);
    public static BigRational Sqrt(BigRational x);
    public static BigRational Cbrt(BigRational x);
    public static BigRational Hypot(BigRational x, BigRational y);
    public static BigRational Root(BigRational x, int n);
    public static BigRational Exp(BigRational x);
    public static BigRational Exp10(BigRational x);
    public static BigRational Exp2(BigRational x);
    public static BigRational Log(BigRational x);
    public static BigRational Log(BigRational x, BigRational newBase);
    public static BigRational Log10(BigRational x);
    public static BigRational Log2(BigRational x);
    public static BigRational Sin(BigRational x);
    public static BigRational Cos(BigRational x);
    public static BigRational Tan(BigRational x);
    public static BigRational Asin(BigRational x);
    public static BigRational Acos(BigRational x);
    public static BigRational Atan(BigRational x);
    public static BigRational Atan2(BigRational y, BigRational x);
    public static (BigRational Sin, BigRational Cos) SinCos(BigRational x);
    public static BigRational AcosPi(BigRational x);
    public static BigRational AsinPi(BigRational x);
    public static BigRational Atan2Pi(BigRational y, BigRational x);
    public static BigRational AtanPi(BigRational x);
    public static BigRational CosPi(BigRational x);
    public static BigRational SinPi(BigRational x);
    public static BigRational TanPi(BigRational x);
    public static BigRational Acosh(BigRational x);
    public static BigRational Asinh(BigRational x);
    public static BigRational Atanh(BigRational x);
    public static BigRational Cosh(BigRational x);
    public static BigRational Sinh(BigRational x);
    public static BigRational Tanh(BigRational x);
    public static CPU task_cpu { get; }
    public static int DefaultDigits { get; set; }
    public static int Radix => 1; //todo: check rational Radix ?
    public static BigRational One => 1u;
    public static BigRational Zero => 0;
    public static BigRational NegativeOne => -1;
    public static BigRational AdditiveIdentity => 0;
    public static BigRational MultiplicativeIdentity => 1u;
    public static bool IsZero(BigRational value) => value.p == null;
    public static bool IsNegative(BigRational value) => Sign(value) < 0;
    public static bool IsPositive(BigRational value) => Sign(value) > 0;
    public static bool IsEvenInteger(BigRational value);
    public static bool IsOddInteger(BigRational value);
    public static bool IsCanonical(BigRational value) => true;
    public static bool IsComplexNumber(BigRational value) => true;
    public static bool IsFinite(BigRational value) => !IsNaN(value);
    public static bool IsImaginaryNumber(BigRational value) => false;
    public static bool IsInfinity(BigRational value) => false;
    public static bool IsNegativeInfinity(BigRational value) => false;
    public static bool IsPositiveInfinity(BigRational value) => false;
    public static bool IsRealNumber(BigRational value) => true;
    public static bool IsNormal(BigRational value) => true;
    public static bool IsSubnormal(BigRational value) => false;
    public static BigRational MaxMagnitude(BigRational x, BigRational y);
    public static BigRational MaxMagnitudeNumber(BigRational x, BigRational y);
    public static BigRational MinMagnitude(BigRational x, BigRational y);
    public static BigRational MinMagnitudeNumber(BigRational x, BigRational y);    
    static bool INumberBase<BigRational>.TryConvertFromChecked<TOther>(TOther value, out BigRational result);
    static bool INumberBase<BigRational>.TryConvertFromSaturating<TOther>(TOther value, out BigRational result);
    static bool INumberBase<BigRational>.TryConvertFromTruncating<TOther>(TOther value, out BigRational result);
    static bool INumberBase<BigRational>.TryConvertToChecked<TOther>(BigRational value, [NotNullWhen(true)] out TOther? result) where TOther : default;
    static bool INumberBase<BigRational>.TryConvertToSaturating<TOther>(BigRational value, [NotNullWhen(true)] out TOther? result) where TOther : default;
    static bool INumberBase<BigRational>.TryConvertToTruncating<TOther>(BigRational value, [NotNullWhen(true)] out TOther? result) where TOther : default;    
  }

Documentation for the particular functions at GitHub

For best possible compatibility the high level functions like several Round, Truncate, Floor, Ceiling, a set of Log and Exp functions up to trigonometric functions Sin, Cos, Tan, Atan,... specials like Factorial etc.
These all in a static extra class MathR as mirror to System.Math and System.Numercs.MathF and similiar as possible.
This is to reflect the old known standards and to make it easy to change algorythms from for example double to BigRational or back, to check robbustness problems.

A look at the BigRational and MathR functions always shows the same scheme.
Gets the task_cpu and execute instructions on it:

    public static BigRational operator *(BigRational a, long b)
    {
      var cpu = task_cpu; cpu.push(b); cpu.mul(a); return cpu.popr();
    }

    public static BigRational Round(BigRational a, int digits, MidpointRounding mode)
    {
      int f = 1;
      switch (mode)
      {
        case MidpointRounding.ToZero: f = 0; break;
        case MidpointRounding.ToPositiveInfinity: if (rat.Sign(a) < 0) f = 0; else f = 4; break;
        case MidpointRounding.ToNegativeInfinity: if (rat.Sign(a) > 0) f = 0; else f = 4; break;
      }
      var cpu = rat.task_cpu; cpu.push(a);
      cpu.rnd(digits, f); return cpu.popr();
    }    

    public static Vector3R Cross(in Vector3R a, in Vector3R b)
    {
      // return new VectorR3(a.Y * b.Z - a.Z * b.Y, a.Z * b.X - a.X * b.Z, a.X * b.Y - a.Y * b.X);
      var cpu = BigRational.task_cpu;
      cpu.mul(a.X, b.Y); cpu.mul(a.Y, b.X); cpu.sub();
      cpu.mul(a.Z, b.X); cpu.mul(a.X, b.Z); cpu.sub();
      cpu.mul(a.Y, b.Z); cpu.mul(a.Z, b.Y); cpu.sub();
      return new Vector3R(cpu.popr(), cpu.popr(), cpu.popr());
    }

All calculation happens in the CPU object. The high-level API functions are all implemented as short or longer sequences of CPU instructions. It all breaks down in such sequences and this is in general short and fast code and allows a rich API with less memory usage. Some example client types using this system: ComplexR, Vector2R, Vector3R, Matrix3x4R, PlaneR, VectorR. A special is VectorR, This struct has also only one pointer containing a index and a sequence of the rational data structures and can therefore store lot of numbers very efficently. It is an example for client types taht it is easy using BigRational.CPU but independent from BigRational, using a private data format.

The CPU - a stack-machine

Like every stack-machine the CPU has instructions like push,push(double), push(int),... and pop's like pop(), BigRational popr(), double popd(),... self explaining instructions like add(), sub(), mul(), div(), mod(), neg(), abs(), inv(), dup(). But there is a difference:
Since we have numbers with arbitrary size mov() instructions are not efficient because it would imply many memory copies.
Instead of this it fast as possible to swap the stack entry pointer only since the stack is a uint[][] array. This explains the versions of swp() instructions.
A pop command frees nothing. It only decrements the index of the stack top and the buffer there keeps alive for next operations. Allocs are only necessary if the current stack top buffer is not big enough for the next push op but this is a successive rar case. Not only for the high-level API, the stack-machine itself uses the stack excessively for all internal interimes results and therefore dont need stackalloc or shared buffers. It forms a consistent system and enables fast and compact code without special handling.

The CPU class reduced to the public instruction set:

   public sealed class CPU
    {
      public CPU(uint capacity = 32)
      public uint mark()
      public void push()
      public void push(BigRational v)
      public void push(int v)
      public void push(int v, int n)
      public void push(uint v)
      public void push(long v)
      public void push(ulong v)
      public void push(float v)
      public void push(double v)
      public void push(double v, bool exact)
      public void push(decimal v)
      public void push(BigInteger v)
      public void push(ReadOnlySpan<uint> v)
      public void get(uint i, out BigRational v)
      public void get(uint i, out double v)
      public void get(uint i, out float v)
      public void get(uint i, out ReadOnlySpan<uint> v)
      public BigRational popr()
      public double popd()
      public int popi()
      public void pop()
      public void pop(int n)
      public void swp()
      public void swp(int a = 1, int b = 0)
      public void swp(uint a, uint b)
      public void swp(uint a)
      public void dup(int a = 0)
      public void dup(uint a)
      public void neg(int a = 0)
      public void neg(uint a)
      public void abs(int a = 0)
      public void add()
      public void add(int a, int b)
      public void add(uint a, uint b)
      public void add(uint a)
      public void add(BigRational a)
      public void add(BigRational a, BigRational b)
      public void sub()
      public void sub(int a, int b)
      public void sub(uint a, uint b)
      public void sub(BigRational a, BigRational b)
      public void mul()
      public void mul(int a, int b)
      public void mul(uint a, uint b)
      public void mul(uint a)
      public void mul(BigRational a)
      public void mul(BigRational a, uint b)
      public void mul(BigRational a, BigRational b)
      public void div()
      public void div(int a, int b)
      public void div(uint a, uint b)
      public void div(BigRational a, BigRational b)
      public void div(BigRational a, int b)
      public void sqr(int a = 0)
      public void sqr(uint a)
      public void inv(int i = 0)
      public void shl(int c, int i = 0)
      public void shr(int c, int i = 0)
      public void and()
      public void or()
      public void xor()
      public void mod(int f = 0)
      public void rnd(int c, int f = 1)
      public void pow(int x, int y)
      public void pow(int y)
      public void fac(uint c)
      public int bdi()
      public void lim(uint c, int i = 0)
      public int sign(int a = 0)
      public int cmpa(int a = 0, int b = 1)
      public int cmpi(int a, int b)
      public int cmp(int a = 0, int b = 1)
      public int cmp(uint a, uint b)
      public int cmp(uint b)
      public int cmp(BigRational a, BigRational b)
      public int cmp(BigRational b)
      public bool equ(BigRational a, BigRational b)
      public bool equ(uint a, uint b)
      public void norm(int i = 0)
      public uint hash(uint i)
      public uint msb()
      public uint lsb()
      public bool isi()
      public void gcd()
      public void tos(Span<char> sp, out int ns, out int exp, out int rep, bool reps)
      public void tor(ReadOnlySpan<char> sp, int bas = 10, char sep = default)
      public void sqrt(uint c)
      public void log2(uint c)
      public void log(uint c)
      public void exp(uint c) //todo: catch cases fast exp(1), ...
      public void pi(uint c) //todo: c = f(c), cache
      public void sin(uint c, bool cos)
      public void atan(uint c)
}

It is noticeable that there are many different versions of the same instruction. Since we have number values with arbitrary size it is all to avoid unnecessary memory copies. For example, to add two BigRational numbers it is possible to push both values to the stack, and call a simple add(). But since the data structure on the stack is the same as in BigRational pointers the CPU can access and read the data pointer directly and push the result only on the stack what saves two mem copies and is therfore faster. Other operation versions working with absolute stack indices what is more efficent in many situations. For example if one function pushes a vector structure and another function takes this as input.

The Efficiency

On an example. The CPU implementation of the sine function, which can also calculate cosines.
This function uses several identities and a final Taylor-series approximation.
The code looks ugly, but it reduces the computation to a sequence of just 36 short CPU instructions. It means that providing a rich set of such common math functions already at this level doesn't involve much overhead.

 public void sin(uint c, bool cos)
  {
    var e = bdi() << 2; c += 16; if (e > c) { } // bdi x
    var m = mark(); pi(e > c ? unchecked((uint)e) : c); shr(1); // push PI2
    if (cos) add(1, 0); // x += PI2 
    div(m - 1, m); mod(); swp(); pop(); // push divmod(x, PI2)
    var s = this.p[this.i - 1][1]; // dup(); var s = pop_int(); // seg
    mul(0, 1); neg(); add(2, 0); pop(); // x - trunc(x / PI2) * PI2
    if ((s & 1) != 0) { swp(); sub(); } else pop(); // x = PI2 - x, pop PI2
    if ((s & 2) != 0) neg(); lim(c); // x = -x, lim x
    push(6u); dup(1); dup(); sqr(); lim(c); swp(); // 3!, x^2, x
    for (uint i = 1, k = 4; ; i++, k += 2)
    {
      mul(1, 0); lim(c, 1); dup(1); div(0, 3); // x +/-= x^n / n!, n = 3, 5, 7, ...
      var b = -bdi(); if (b > c) break; lim(c);
      if ((i & 1) != 0) neg(); add(4, 0); pop(); lim(c, 3); // x += 
      push(k * (k + 1)); mul(3, 0); pop(); mul(1, 0); // n! *=, x^n *=
    }
    pop(4);
  }

To come back to the begin, the problem with BigInteger and a Rational type based on BigInteger. Imagin the same sine calculation with such type. Where every instruction involves not only two new full allocated BigIntegers for a result, Rational itself needs as example, for a simple addition 3mul+1div+1*add and this are all new allocated BigIntegers. But not enough, Rational has to normalize or round the fractions - additional many new BigIntegers. It should be obvious that this is inefficient, slow and leads to an impractical solution.

To show the differences I made the Mandelbrot example in BigRationalTest app. It does exactly this. A conventional BigRational class based on a BigInteger for numerator and denominator is part of the Test app and called OldRational. At the begin BigInteger profits as it can store small numbers in the sign field but then, deeper in the Mandelbrot set, it slows down exponentially before it stops working in a short break between endless GC loops.

The bottle-nack for rational arbitrary arithmetic is the integer multiplication and for the normalization the GCD function and the integer division. The first benchmarks showing that the BigRational calculation core is aproximatly 15% faster then the equivalent functions in System.Numerics.BigInteger. At least on my machine. With other words, using BigRagtional for pure integer arithmetics can improve the performance already. For integer algorithms using the CPU the effect is much bigger of course..

BigInteger in NET 7 will use Spans more stackalloc and shared buffers. This benchmarks i made with a NET 7 preview versions are showing that this reduces a little bit the memory pressure but further degreads the performance, especialliy for big numbers. It is probably the use of System.Buffers.ArrayPool.Shared what is not the fastest by a threshold of 256 for stackalloc. These are problems that are not existing for a stack-machine solution.

Latest benchmark results can be found Latest-BigRational-Benchmarks; (Same benchmarks with NET 7 preview, latest mater-branch a lost of around 10% performance) It is currently under construction, but I will be adding more tests daily.

Further development

There is great potential for further improvements. In the calculation core itself, in the algorithms. Furthermore, frequently used values like Pow10, PI etc. should efficiently cached, numbers that are currently always new calculated for functions like Exp and Log; and this would give another big performance boost.

I'm looking for a solution using static AsyncLocal<CPU>? _async_cpu; instead of only [ThreadStatic] CPU _task:cpu; This would be more user friendly and save for await functions etc. but access to _async_cpu is 3x times slower, I need to find a trick to detect if a _async_cpu request is necessary, maybe a quick stack baseadress check or something.

The set of MathR functions should be extended and handle the DefaultPrecision property as it seems to work with a hypothetical floating point type that allows for such precision.

Especially geometric algorithms are very sensitive to epsilon and robustness problems. A vector extension, compatible with the current System.Numercis.Vector types would be useful. No one with BigInteger experiences would believe that it makes sense, that it's even possible, especially not in NET. But the Polyhedron 3D example shows: It works for thousands of BigRational Vertex, Points and Planes, for Polygon Tessellation, Boolean Intersections of Polyhedra (CSG) and this in real time at animation time:

BigRationalAnimation

Remarks

The current implementation is optimized for X64, tuned for RyuJIT code generation. For better performance on X86 it would be necessary to put special X86 code in critical pathes. Same goes for ARM, but I don't currently have a development environment for it.

I have created a C++ COM implementation of the CPU where the CPU instructions are COM interface calls. Unfortunately, for C# such a solution is about 30% slower due to the many platform calls. The same for a C++/CLI solution, which is also about 30% slower. But the COM solution inproc in a C++ app is almost 2x faster because of the powerful C++ optimization.

I checked the possibility of using AVX, SSE, Bmi intrinsics for optimizations. Unfortunately none of the test implementations were faster than the simple scalar arithmetic currently in use. Same experiences as published by many other programmers. Only Bmi2.X64.MultiplyNoFlags (MULX r64a, r64b, reg/m64) could really improve the performance but unfortunatlely, the corresponding ADCX and ADOX instructions are currently not implemented in Bmi2.X64.

From my point of view the best choice for an System arbitrary-precision number type is BigRational, not BigInteger. For BigRational all other types are only special cases. This applies for BigInteger, BigFloat, BigDecimal, BigFixed, in general floating-point types in any fixed or arbitrary size. Finally these are all rational number types, only with restricted denumerators, BigInteger 1, BigFLoat or in general floating-point 2^n , BigDecimal 10^n and so on. It is easy and efficent to implement such types very compact based on the BigRational CPU. The CPU has all the instructions that are necessary to calculate for the specific number types, otherwise it's easy to add. No problem to use own data formats as shown in example: VectorR. Only one core is necessary and the resources can shared for several client types. Not necessary to implement every time again a new calculation core. The focus should be on highly optimizing this CPU core for all possible platform- and processor configurations, and all client types would benefit.

More technical details, example codes etc. can be found at:

How it works and how to use it: https://c-ohle.github.io/RationalNumerics
ApiDocumentation: https://c-ohle.github.io/RationalNumerics/api/System.Numerics.html
Project URL: https://github.com/c-ohle/RationalNumerics
BigRational Test App: https://github.com/c-ohle/RationalNumerics/releases/BigRational

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-numerics See info in area-owners.md if you want to be subscribed.

Issue Details

### Background and motivation ### I want to suggest a novel BigRational number type for namesapce System.Numerics. [https://github.com/c-ohle/RationalNumerics](https://github.com/c-ohle/RationalNumerics) This solution does **not** based on BigInteger, it based on a novel concept and an own calculation core. In difference to the usual implementation of such a class, this new concept offers a significantly better performance and better memory usage. This is mainly due to the advantages of a stack machine, which minimizes the need for memory allocations and copies, and which allows to apply more efficient numerical algorithms. * How it works and how to use it: [https://c-ohle.github.io/RationalNumerics](https://c-ohle.github.io/RationalNumerics) * ApiDocumentation: [https://c-ohle.github.io/RationalNumerics/api/System.Numerics.html](https://c-ohle.github.io/RationalNumerics/api/System.Numerics.html) * Project URL: [https://github.com/c-ohle/RationalNumerics](https://github.com/c-ohle/RationalNumerics) * BigRational Test App: [https://github.com/c-ohle/RationalNumerics/releases/BigRational](https://github.com/c-ohle/RationalNumerics/releases/BigRational) --- ### API Proposal ```csharp namespace System.Numerics { public unsafe readonly partial struct BigRational : IComparable, IEquatable, IFormattable, ISpanFormattable { } } ``` ### API Usage ```csharp var x = (BigRational)1 / 3; var y = MathR.Sin(x, digits: 100); //... ``` ### Alternative Designs _No response_ ### Risks _No response_

Author:	c-ohle
Assignees:	-
Labels:	`api-suggestion`, `area-System.Numerics`
Milestone:	-

tannergooding commented 2 years ago

Thanks for the interest here, I'd like to note that this proposal isn't complete enough to be actionable as is.

The proposal needs to be updated to include the full API surface being proposed. If you have a specific implementation in mind, then it would be helpful to go more in depth with examples explaining how its expected to work and any limitations resulting from that. Ideally it would be considered alongside another proposal such as https://github.com/dotnet/runtime/issues/21588, which equally needs a "full proposal" writeup covering the shape and functionality.

One consideration is that a general purpose struct Rational<TInteger> where TInteger : IBinaryInteger<TInteger> is one way this could be provided where it can support both small fixed-size and big dynamically sized Rational like values.

c-ohle commented 2 years ago

@tannergooding, thanks for your response. I will update the proposal this night that way.

I saw your set of new interfaces for numeric types and I guess it is a good idea. I would directly implement them in BigRational, to try out and get experiences to effort and how it works all together.

But I'm sceptical with a Rational<TInteger> type.

For machine word- or in general fixed size integers types it is really limited since every mul and div can double the bit count already.
Any kind of rounding to handle this would dramatical limit the number range and this is difficult to control.
This as interface, to get dynamic size results back, wourld imply heap allocs for the smaller value types too.

Therefore I'm sceptical that such template makes sense in practice. Every floating-point type has better properties and is finally also a rational number - denumerator 2^n.

c-ohle commented 2 years ago

@tannergooding, I've updated the suggestion and hope it's now in the form you need. Otherwise please let me know. Regards co

tfenise commented 2 years ago

I just wonder, should one really calculate transcendental functions like exp, sin or cos with rational numbers? It seems that (arbitrary precision) floating-point numbers are more appropriate when calculating transcendental functions.

c-ohle commented 2 years ago

@tfenise sure, up to an desired perecisission, and this should be possible with an arbitrary nuber type.

tfenise commented 2 years ago

Of couse it is possible, but I don't think it is particularly efficient having to deal with the denominators when doing non-exact, approximating calculation. For example, if sin(1) is calculated to a / 2^n up to an absolute error of 2^(-n-1), and sin(2), sin(3) calculated to b / 2^n, c / 2^n likewise, calculating the product sin(1)*sin(2)*sin(3) would mean to calculate (a*b*c) / 2^(3n) if using rational number multiplication, which would be too laborious to calculate the exact big integer multiplication a*b*c, because (a*b*c) / 2^(3n) cannot be expected to really have an error as small as 2^(-3n-1), as sin(1), sin(2) and sin(3) may already have an error of 2^(-n-1).

c-ohle commented 2 years ago

@tfenise, Yes, that's right, so if you want to work with such precisions, you have to know what you're doing, to estimate the desired precision before. The only difference to floating-point is that the precision is natural limited there by default. Finally a floating-point type is nothing else than a rational number type, with the limitation that the denumerator is limited to 2^n. (or 10^n for decimal). It is possible to build an arbitrary BigFloat type based on the BigRational CPU. Like a more performant and efficent BigInteger. Thats the point. The necessry arithmetics is there and can be shared. Thats what I mean with an arbitrary base type for NET.

c-ohle commented 2 years ago

@tfenise, I read your concerns again this morning. "I don't think it is particularly efficient having to deal with the denominators when doing non-exact" This is probaly your focus - my was at first the math in your example. Sorry.

You are absolutely right. It would not been efficent, especially not in such iterations, taylor series, etc. Every single multiplication step can double the number of digits in numerator and denominator. In a loop we would get quickly an explosion of digits and more worse: the next iteration steps would slow down dramaticaly. But we know in such iteartions approximately how much diggits it needs to get a final result of the specified precision. A possible high-level approach would be to round such values in every iteration step but this would be to slow as it implies further multiplies and divisions.

The solution is a very fast binary rounding on bit level, implemented as cpu instruction: lim(uint c)

It is very simple: a msb (Most Significant Bit) test for the numerator and the denominator what cost nothing. If the minimum of these msb exceed the bitcount c: Then the numerator and denumerator shiftet to the right by the difference and this is a fast operation as well .

Pseudo code:

var x = min(msb(num), msb(den));
If (x > c)     
{
   num >>= x - c;
   den >> x - c;
}

This is finally the same that happens at floating-point arithmetic automatically. But here we have the possibility (and the necessity) of a fine control. I guess it is understandable that this solves the problem and we have a lot of such binary tricks to improve the performance, to get fast approximations etc. And, this one yes but the most are not possible or efficient using BigInteger.

tannergooding commented 2 years ago

Thanks for the comprehensive write-up.

For public unsafe readonly struct BigRational, its still unfortunately not "complete" enough in the proposal. There are various conversions missing (such as byte, sbyte, short, ushort, uint, nint, nuint, Int128, UInt128, and Half). It's also missing the surface area from the respective generic math interfaces being introduced in .NET 7 (such as INumber<T> itself). While you don't strictly have to list every member that would be exposed via the interfaces, it does help to have them listed rather than leaving them "implied". It's also beneficial to cover what's implicitly vs explicitly implemented so the exact surface can be easily viewed.

These all in a static extra class MathR as mirror to System.Math and System.Numercs.MathF and similiar as possible.

Prior to .NET 7, Math and MathF were "special" and they were effectively the only place in the BCL where we did something like this and only because it impacted the "primitive" types. For .NET 7, we are officially moving away from this and standardizing on exposing such functionality as static methods on the corresponding type (e.g. float.Sqrt or BigInteger.Pow). In the case of "generics", such methods can optionally be exposed on a non-generic type of the same name (e.g. Vector128<T> and Vector128.Method()).

The CPU - a stack-machine

This is likely going to be a non-starter, especially with it exposed as a public implementation detail. Not only can the consumer of the APIs not rely on operators and therefore language features like expressions and precedence (e.g. * happens before +), but it makes the code much harder to read and reason about overall. -- It's also worth noting that it isn't following standard .NET naming guidelines/conventions which is itself going to be an issue if this went to API review.

Having this use ThreadStatic is also not very compatible with things like Tasks or other async code and effectively exposes an "implementation detail" we wouldn't want end users to rely on.

The issue here is effectively the same that BigInteger or even say ImmutableArray has, which is that the concrete type is an "immutable struct" and so doing many intermediate operations can potentially be expensive. The standard solution for this and the most likely solution we'll have for BigInteger is the "builder" pattern. That is, we will likely have some future class BigInteger.Builder which is a mutable version of BigInteger. It would be used to hold intermediate computations and to allow a low/zero cost "move" from builder to immutable value. This removes the biggest cost around intermediate computations and ensures the overall use of the data can be efficient.

We should aim to expose something that remains easy and intuitive to use while also achieving good performance and there are many potential avenues for this. We should also consider what impact having a "packed" format will have. It may make it "machine word sized", but it also impacts the usability of the code and will increase the cost in other areas, such as if the consumer wants to get the numerator and denominators explicitly (a fairly common and expected operation for any Rational number). If that data isn't important, then you're almost designing a BigDecimal instead which has similar overall concerns but different needs on what gets exposed.

c-ohle commented 2 years ago

@tannergooding, thanks for taking the time, I understand the needs and the priorities better now.

conversions missing (such as byte, sbyte, ...

I have omitted the conversions from smaller types, since they are automatically mapped to the next larger type by the compiler. But yes, it makes sense for specs like this to implement and document them all explicitly, so everything is absolutely clear. I can do this of course in shortest time. For the new NET 7 types, Int128, INumber<T>... So far it has been a NET 6 project with a focus on compactness and performance, but I can implement all this in short time with the new requirements in a new NET 7 environment.

Math and MathF were "special" ... For .NET 7, we are officially moving away from this ...

Good decision for the API design, I like it, more straight an logic. Only a copy and paste action from MathR to BigRational and it is done and MathR is gone. It is only a collection of static functions.

The CPU - a stack-machine... a non-starter, especially with it exposed as a public implementation detail...

Ok, proposal for a API solution:

The CPU as internal class, only visible on demand for system assemblies .
Renaming of the instruction set, no problem, it's all linked - up to in the documentation. Only problem to find meaningful names. That it follows standard .NET naming guidelines/conventions.

Having this use ThreadStatic is also not very compatible...

I am aware of the problem. The better choice would be to use:

private static AsyncLocal<BigRational.CPU>? _async_cpu;

That would result in the desired behavior and user safety. Unfortunately, the access time for this construction is about 3x times slower than to a ThreadStatic root as there is always small dictionary access behind. That's why I keeped ThreadStatic so far for the performance. After a discussion I found already a fast solution for this:

[ThreadStatic] _task_cpu;remains.
task_cpu { get; } checks always quick and dirty the base address of the current stack frame:
```
uint id = ((uint)&id) >> 20;  // as long as the NET stack is limited to 1MB: >> 20
```
Only if this id differs from a last one stored in _task_cpu does it require a request to AsyncLocal to swap and overall we maintain performance. It's just not public yet, because I have to test it beforehand in all possible configurations, parallel, task, await constructions, etc. It needs a second thread-static root for the solution, but this is the price for the security.

The issue here is effectively the same that BigInteger .... BigInteger.Builder

As proposed: task_cpu private but a new: public class BigRational.Builder. This Builder class can expose such a simple, readable and secure end user interface as desired. Behind it is nothing else than a wrapper to a CPU object with additional security checks but with the same functionallity like a BigIntager.Builder and it works with Enumerator's, IEnumerable<>. etc. It is really less effort and there are already examples in the TestApp for similiare cases.

Another much more general approach: unlike C++, in C# we don't have a copy constructor/operator or anything with similar functionality, We have op_Addition, op_Multiply, but nothing like op_Assign, which the compiler calls (if defined) before a var or target gets the result from expressions parsing with interimes result. This would allow to automate the process where now such a builder solution is necessary. Not difficult or much overhead for the compiler, but helpful in so many other situations, not just for our cases. But is it realistic and useful to contact the compiler team? I'm sure I wouldn't get an answer.

Questions

It stands to reason that I should do this as a dedicated NET 7 development in a new repository. All the new types, interfaces, templates. Isn't it the best I would do it in a MS shared repository with access to the latest runtime instead of Git full public, based on any latest master-branche? I dont know, you are in the position to decide or organize such things? The ways are so long, bugs in System.Numerics, which I reported back in May, were fixed today. Many other bugs that I have already reported are still in the pipeline.

And in general. Befor I start with something. Is there really any interest in such an solution at all? I can't know what's planned internally at MS.

Regards co

c-ohle commented 2 years ago

@tannergooding, a .NET7 BigRational version is available for testing. MathR is gone, the missing conversions and the new interfaces are implemented and tested, - only not finally optimized. Updated the full public interface above. Documentation for the particular functions at GitHub The RationalNumerics project can now be compiled with the VS2022 preview to get a fully featured NET 7 version according to the new standards.

c-ohle commented 2 years ago

@tannergooding

The new INumber-based interfaces are great. I tested this from both sides.

But I want to suggest some improvements - before it is to late.

The INumberBase function set should be improved. The set of 17 Is... functions, IsCanonical(TSelf)...IsZero(TSelf) sould be one function like: NumberInfoFlags GetNumberInfo(TSelf, NumberInfoFlags mask)

Easy extensible and compatible for future use by simply extend enum NumberInfoFlags and not by extend the interface.
Faster and easier to get the necessary information in templates with one hit to select the best possible alg. based on the flags.
More type specific without more code, for all thinkable types like arbitrary, rational, vectors etc. where more infos is needed.
Less code, more easy to implement on number side as the most flags are const and for others, using the mask param, to bypass unrequested, unnecessary calculations.
INumberBase<TSelf>.Radix should be integrated here, as this can also be dynamic (and it's always annoying to see this obvious information in debug) and what is Radix for rational?
Better for inline in most cases. And, for any reason it works better with inline that way (runtime/issues/68890) what I tested on X64.

The same applies to the functions Create...<TOther>(TOther), TryConvert...<TOther>(TOther), TryConvertFrom...<TOther>(TOther)

should be one function with enum parameter for the same reasons.
I've seen in other implementations, I've done this in mine, on the number side a mapping to such a private general function(s) and it wouldn't be necessary that way.

For the statics TSelf One, TSelf Zero, There should be a function like: TSlef GetConst(NumberConst)

easy extendable by extend enum NumberConst
can contain MinValue, MaxValue, Epsilon, etc.
can contain the best type specific representation for Pi, E, Tau etc. which is really importand for alg's. as template.
can contain the props AdditiveIdentity, MultiplicativeIdentity, etc. and would make approriated interfaces IAdditiveIdentity, IMultiplicativeIdentity obsolete.
The jump table in implementations really isn't the problem and since everything is constant the compiler in release can inline this perfectly.

The ..Magnitude functions should be moved in another interface. The ..TryFormat functions should be moved in another interface.

Also for the other interfaces. They could all be improved upon and benefit from this approach by using enum parameters for capability checks and then only one static method is required for each method body type, unary, binary etc.

Consistently thought: the entire current INumber interface set could be implemented in one interface and everything would be more flexible, more type-specific, extensible for the future, less code, easier to read, better inline, more performance.

But yes, it is also nice to see the always same normed function names for similar types and to force this by the interfaces.

Thaina commented 2 years ago

I am not so sure they design 17 of Is... numberinfo like that because we could separate each Is into separate interface responsible for each type of number?

c-ohle commented 2 years ago

@Thaina but always ever each number type with INumberBase interface gets a function table for every static interface function for the own implementation and has completely implement this. One table for each INumber sub interface.
Therefore, to keep such interface constructions small as possible and extendable without further interface functions is a good idea. (table not like a real vtable but assembly inernal a table as set of static functions)

c-ohle commented 2 years ago

@tannergooding update:

The issue here is effectively the same that BigInteger or even say ImmutableArray has...

I found a solution using the or-operator in combination with an implicit conversion for the Nullable<T> variant. This allows for a logically non-incorrect syntax and effectively simulates an assign operator that is unfortunately unsupported in C#. Instead of:

var x = a * b + c * d + e * f;

it is possible to write:

var x = 0 | a * b + c * d + e * f; // 5x faster, allocs only for the final result

It must be the or operator, because of its priority in arithmetic expressions, whereby its Nullable<T> parameter is always evaluated first what builds a frame for the intermediates.

The effect is amazing, average measured for matrix-inverse - about 100x faster .
The consumer can rely on the operators and the language features and must not deal with builder solutions to achieve such a performance.
It is easy to implement and ways to protect since the operator used are not the main type operators.
BigRational.CPU could be private now since all INumber functions are member of BigRational .

That is, we will likely have some future class BigInteger.Builder...

It works for BigInteger as well, not necessary to expose a Builder class
Lots of possibilities as well where the INumber system would benefit.

How easy it is here

Of course, the best would be the compiler-devs would reconsidering the 20 years old decision regarding an assign-operator in C#. Only some lines and it would possible to automate such things perfectly. So long it remains that there are of course possebillities to construct problem cases, to provocate conflicts like anywhere else.

tannergooding commented 2 years ago

As per the framework design guidelines:

❌ DO NOT be cute when defining operator overloads.

That is, you should only use an operator where that mathematically makes sense so using / for "path concatenation" or << for "stream output" is a non-starter. In the same way we don't want to use | which stands for "logical or" as a stand in for the builder pattern that would otherwise reduce the computation costs.

c-ohle commented 2 years ago

The real or operator is not in touch. A better idea? Is there a chance to get an op_assign or something similar solution from the compiler team?

tannergooding commented 2 years ago

But I want to suggest some improvements - before it is to late.

We snapped "code complete" last Tuesday and any additional changes need a lot more scrutinization to be accepted. None of these suggestions look like they warrant such a change.

The set of 17 Is... functions, IsCanonical(TSelf)...IsZero(TSelf) sould be one function like: NumberInfoFlags GetNumberInfo(TSelf, NumberInfoFlags mask)

This will ultimately hurt codegen and optimizations that the JIT can do. It will also make the code overall harder to understand and cause it to do more work except for potentially rare edge cases where you want multiple checks simultaneously.

Easy extensible and compatible for future use by simply extend enum NumberInfoFlags and not by extend the interface. Faster and easier to get the necessary information in templates with one hit to select the best possible alg. based on the flags.

Extending this is no easier and is in a few ways much harder. While it indeed doesn't need new API surface, it does require existing code paths to update to handle it (risks failure for unrecognized flags) and still requires all existing types to version regardless.

T.GetNumberInfo(value, NumberInfoFlags.IsPositive | NumberInfoFlags.IsEvenInteger) is no faster than T.IsPositive(value) && T.IsEvenInteger(value) and in many ways it is likely slower due to having a larger more verbose method body that is unlikely to be inlined by the JIT. It also disallows "constant" paths for various calls (e.g. IsNegative can return constant false for unsigned types, GetNumberInfo could not).

INumberBase.Radix should be integrated here, as this can also be dynamic (and it's always annoying to see this obvious information in debug) and what is Radix for rational?

It is a consideration of the storage format, most generally this is 2. Some specific types that explicitly work with "declets" or base 10 exponents would be 10 instead.

For the statics TSelf One, TSelf Zero, There should be a function like: TSlef GetConst(NumberConst)

This is problematic both short and long term for many of the reason's I've listed above. It makes the code harder to read/reason about and hinders optimizations available to the JIT as it has to deal with inlining larger method bodies with more code.

The jump table in implementations really isn't the problem and since everything is constant the compiler in release can inline this perfectly.

Jump tables are problematic in the first place, ideally we little to no jumps in the common paths and the JIT is able to optimize these calls instead.

The ..Magnitude functions should be moved in another interface. The ..TryFormat functions should be moved in another interface.

The former are IEEE 754 required functions and important for allowing basic comparisons between Complex numbers. The latter are a core concept to numeric types in general and a basic expectation for developers working with number types in end user applications.

Consistently thought: the entire current INumber interface set could be implemented in one interface and everything would be more flexible, more type-specific, extensible for the future, less code, easier to read, better inline, more performance.

This would restrict versioning, restrict which types could implement the interfaces, and hurt the overall ability to improve these interfaces and their support over time.

tannergooding commented 2 years ago

A better idea? Is there a chance to get an op_assign or something similar solution from the compiler team?

I don't see this happening. The solution is to follow the normal and existing design guidelines and patterns. In this case, the most likely solution is to have and expose the builder pattern. We then have an immutable type for storage and a mutable type for complex computation. This is already known to behave well and is familiar to existing .NET users. We will be able to get the same or better perf with such a pattern.

c-ohle commented 2 years ago

ok, thanks, I need time to think.

c-ohle commented 2 years ago

@tannergooding Please feel free and let me implement a builder class that meets the needs for a net-core solution. The functionality is there to realize short term every imaginable model. I need only a direction, what exactly is meant with "builder pattern"? Are there concrete specs, papers, ideas, examples?

I guess there is agreement that the current BigIntegerCalculator is UI- and functional unsuitable as a prototype. The best approach remains, in my opinion, to use standard arithmetic expressions, so I wrote already a suggestion and mayby that this can help. Otherwise, I'm open to any approach.

tfenise commented 2 years ago

https://source.dot.net/#q=class%20builder

c-ohle commented 2 years ago

@tfenise thanks, StringBuilder, ok. Hard for me to get a picture. How should a typical calculation sequence look like? a b + c d, function calls etc. And, except for simple add, we can't use the operators because then we're back with the same problem.

tannergooding commented 2 years ago

I expect that BigInteger.Builder should be nearly identical to BigInteger the biggest difference is that the Builder is a class and mutable. This follows how ImmutableArray<T> and ImmutableArray<T>.Builder work, for example.

The easiest to use setup would be supporting static Builder operator +(Builder left, BigInteger right) where the return value is left. This would allow (a * b + c * d).MoveToBigInteger() and for it to reduce the overall amount of allocations.

It would be up to API review to determine if a + b returning a is potentially confusing or not and I could see it being a matter of some debate. It might also be debatable whether Builder is the right name in this particular scenario, but the general premise and API shape would be roughly the same regardless mostly just differing on type name and whether operator + is exposed or just a method named Add.

c-ohle commented 2 years ago

such a simple case and we have already two builder in use. lot of operator versions, possible side effects, limitations, what is with ((x = a) * b), double function set, and and and.

tannergooding commented 2 years ago

Yes, you have a builder per top level part of the expression tree. It is ultimately have no more allocations than BigInteger and in practice will reduce overall allocations by at least the number of operations performed. So rather than up to 7 allocations, you have between 1 (best case) and 4 (common case). The common case is 4 simply because the BigInteger itself (when the value is outside the range of Int32) requires an allocation to track the backing data and there is no way to reduce those.

Even in the case of your BigRational scenario above there is an allocation per BigRational to hold the uint[] p. The CPU type then fills the need of the "builder" type, but comes with at least two allocations (to represent uint[][] p) and will have the same overall limitations as a Builder type where it has to grow p if the backing data no longer fits and has to consider mutation across many operations. Utilizing pooling or other tricks are the best ways to remove these allocations for intermediate computations resulting in growth.

The most interesting thing is going to be looking at real world algorithms requiring complex expressions and finding the simplest way for users to achieve success. That is, the simplest way they can write their expression and achieve "optimal" reduction of allocations without needing to significantly refactor the code they write themselves.

c-ohle commented 2 years ago

cpu buffer allocs: new uint[1 << (BitOperations.Log2(n - 1) + 1)]; //4, 8, 16, ... since the buffers are only swapped, no copys, for subsequent push's it becomes a more and more rar case that allocs are necessary. You should see this in debug, I would really like make a presentation.

c-ohle commented 2 years ago

The calculation is simple: for every stack machine push op it would need one new Builder class. Every Bilder class represents finally one stack entry. But also with shared mem etc. it is class, it is always alloc. I cant see a way to recicle all the builder trash without explicit calls. But is this user friendly? And this is only BigInt. A simple Rational add is already (a b + c d) / (e * f).

tannergooding commented 2 years ago

since the buffers are only swapped, no copys, for subsequent push's it becomes a more and more rar case that allocs are necessary

Consider where you have a + b + c. In order for this to function given a BigInteger a, BigInteger b, and BigInteger c you have to create a new allocation d to hold the intermediate a + b and then e to hold d + c, you can't reuse the backing array from a, b, or c because that would be an illegal mutation.

Given a builder, the builder is mutable and so a BigInteger.Builder a, BigInteger b, and BigInteger c can be four, rather than five, allocations. This is because a += b then a += c can be done. In some cases you may still have to grow the backing array, but its still "no worse" and often better. You can also explicitly oversize to help reduce total allocations up front (in the same way you can pre-size a List<T>). The longer the expression chain, the more savings you get, but with the note that complex trees with two separate expression chains requires two builders.

Given the CPU stack based approach, you ultimately have the same limitations. You can reuse the backing arrays from a, b, and c but they must remain "readonly" (and so if you do reuse, you must track the usage). New arrays must be allocated to hold the intermediate results with the ability to reuse arrays for intermediates that are no longer in use.

In the same way, builders can be pooled as can the backing arrays for the builders to avoid repeat allocations. A builder could theoretically also be a struct and reduce allocations by one more, but that has potential considerations and downsides that make it harder to use (mutable structs are generally not recommended in .NET) and so having it be a class with some pooling is likely a better choice overall.

These two approaches are actually similar and run into the same limitations and scenarios around them with regards to maximum total allocations and copying done. The main difference is whether or not they follow an existing "standard" around how .NET exposes its APIs.

c-ohle commented 2 years ago

Maybe a misunderstanding with the stack machine, how it works. The stack machine gives never a buffer away or pushes a buffer from outside in the collection. It is only to have buffers to write number data from calculations to - or read from an to avoid copys it is allowed to swap buffer in the collection to bring in the right position. That after add() the result is on top of the stack. Forget allocs. This is absolutely not the problem with such approch. Happens at begin, sometimes later if bigger data comes. Because of the manny swaps the buffers always good mixed. Don't know how I can better explain. It would be better to show how it looks like in paxis.

c-ohle commented 2 years ago

And also the thing with final BigRationl allocs. This is in praxis also rar case. Most algorithms don't need BigRational as result. Example tesselation: Float vectors in - float vectors out - best case not only one alloc in subsequent calls. Mandelbrot, millions of stack calculations, but final, read a int32 form stack for the pixel color. Or vertex data, thousands ove vertices but readen from stack directly in a big array buffer.(VectorR) Nobody create BigRational for fun, ownly if it necessary to keep the data for long time, and then it is ok of corse. and so on.

c-ohle commented 2 years ago

Also string format and parsing, it runs all on the stack, there is not only one alloc, only for the final string of course.

c-ohle commented 2 years ago

It does not solve our problem. Only to make sure - the stackmachine buffers, this is only work buffer, isolated from the number data. But you can see, if I would emulate with BigRational any model like the model of a Builder behavior - there is no single problem with buffers and allocs. This is only if youd have only a BigIntergerBuilder.

And since I have no problem with allocs etc. the only question remains. How should such end usre Builder UI look like?

c-ohle commented 2 years ago

Ok we could make the Builder emulator as a struct type with field stack index and maybe a ref to the cpu for securiy. It would solve at least the broblem with the many Builder allocations... I skip some steps what all is possible... but also the security issues etc...
In the moment if we want to provide standard "a + b * c" - then we have exactly the original problem again.

c-ohle commented 2 years ago

This would work but is of course a little bit dangerouse.

  struct BigIntegerBuilder
  {
    public static explicit operator BigIntegerBuilder(BigInteger value)
    {
      BigRational.task_cpu.push((BigRational)value); return default;
    }
    public BigInteger ToBigInteger()
    { 
      return (BigInteger)BigRational.task_cpu.popr(); 
    }
    public static BigIntegerBuilder operator +(BigIntegerBuilder a, BigIntegerBuilder b)
    {
      BigRational.task_cpu.add(); return default;
    }
    public static BigIntegerBuilder operator -(BigIntegerBuilder a, BigIntegerBuilder b)
    {
      BigRational.task_cpu.sub(); return default;
    }
    public static BigIntegerBuilder operator *(BigIntegerBuilder a, BigIntegerBuilder b)
    {
      BigRational.task_cpu.mul(); return default;
    }
    public static BigIntegerBuilder operator /(BigIntegerBuilder a, BigIntegerBuilder b)
    {
      BigRational.task_cpu.div(); return default;
    }
  }

then we could write:

  static BigInteger muladd(BigInteger a, BigInteger b, BigInteger c)
  {
    return (((BigIntegerBuilder)a * (BigIntegerBuilder)b) + (BigIntegerBuilder)c).ToBigInteger();
  }

tannergooding commented 2 years ago

The stack machine gives never a buffer away or pushes a buffer from outside in the collection.

This means that you, by definition, have additional allocations beyond what the Builder pattern does. The T.Builder pattern is able to take in existing T without allocating and return new T by directly utilizing the backing buffer that the Builder had created.

That after add() the result is on top of the stack. Forget allocs.

In order for add to put something onto the "stack", there must be an allocation to hold the result. The stack must be likewise be grown as necessary, to fit the longest chain of operations.

Most algorithms don't need BigRational as result. Example tesselation: Float vectors in - float vectors out - best case not only one alloc in subsequent calls.

The standard .NET pattern is T in, T out. That is, if an algorithm takes in float it returns float. It can internally operate using some other type, but the result should stay correct and not unnecessarily lose precision. In the same way, BigInteger needs to take in BigInteger and return BigInteger. BigRational takes in BigRational and returns BigRational. A BigRational.Builder can take a BigRational as a secondary input, but one of the inputs should be a BigRational.Builder as should the result. Only the explicit ToT() or MoveToT() APIs convert from T.Builder to T, with the former being a copy and the latter being a transference of the backing array.

But you can see, if I would emulate with BigRational any model like the model of a Builder behavior - there is no single problem with buffers and allocs.

The stack based approach cannot avoid the need to allocate or reuse existing allocations. The T itself is immutable and so any new T or data to represent a T must be some allocation or reuse an existing allocation of sufficient size (such as via pooling). The exception is a MoveToT() which can safely, due to its design, transfer ownership to the T and make the backing array Empty again.

then we could write:

The default pattern for such an API should be:

static BigInteger MultiplyAdd(BigInteger left, BigInteger right, BigInteger addend)
{
    return (a * b) + c;
}

such a pattern is natural, intuitive, and just works. The downside is that it is equivalent to the following and so allocates an extra time:

static BigInteger MultiplyAdd(BigInteger left, BigInteger right, BigInteger addend)
{
    BigInteger temp = a * b; // Allocate a temporary
    BigInteger result = temp + c; // Allocate the temporary
    return result;
}

The builder pattern allows you to optimize this away such as:

static BigInteger MultiplyAdd(BigInteger left, BigInteger right, BigInteger addend)
{
    // Compute the maximum length up front
    int size = left.Length + right.Length;
    size = int.Max(size, addend.Length) + 1;

    // Allocate a temporary big enough to hold the result
    BigInteger.Builder result = new BigInteger.Builder(size);

    result += left;
    result *= right;
    result += addend;

    return result.MoveToBigInteger();
}

Now you have the minimum number of allocations required and allow for maximum reuse of the data. There are of course tweaks to the API surface that can be made to make this friendly and easier to use, but it follows an existing .NET pattern that is known to work.

A stack based system as an implementation detail isn't going to be viable for a general .NET implementation as its not going to play well with threading, with async code, or with the common and expected usage scenario by consumers of the type because its an entirely new pattern and one that is not as friendly to typical programmers.

c-ohle commented 2 years ago

Dont now where to start, at the end.

Threading, is solved. can be protected with AsyncLocal<CPU>? _async_cpu The fast stack base adress check works, the method Imentioned upstairs, zero performance lost in praxis. It is only not tested and there was no more reason becouse if the CPU is later private than we would never get in such situation.

 a situatuation would be:
 var cpu1 = rat.task_cpu;    
 await something().
 var cpu2 =rat.task_cpu;    //maybe not the same like cpu 1.
 but for what? and what is dangerouse? nothing. It is something what not works and what makes no sense

and there is always also the way: var myprivatecpu = new BigRational.CPU(); cost not memory, works perfect for iterator and so on. It is all much more flexible...

c-ohle commented 2 years ago

The default pattern for such

I wanted only to show the pricipl. that then it also possible with a bunch if implicit conversions and operators to make the Builder (nearly) invisible is self explainig. Self explaining like the security problems and the code necessary to protect. But this is exactly the same for a class IntegerBuilder. But there every instance memory.

c-ohle commented 2 years ago

The standard .NET pattern is T in, T out of course, and this is what BigRational is doing by default. But I gues you ment my bigIntegerBuilder. As I said, it was only empty draft.

c-ohle commented 2 years ago

This means that you, by definition, have additional allocations beyond what the Builder pattern does.

For nomal use, no big array processing etc. calculating for precesission and not PI 100000 digits. A stack frame of 16 entrys is enough and coming from double input, using Rational to calculate line and plane intersections without epsilon problems for example typical case for geometrical algorithms. Results are in the range from 4 - 10 uint digits. Max stack buffer allocated 32 .. 64 uint digits, all together never more then 4k. Only to give a picture about the memory neads behind. It is notthing in comparison to all the allocs what happens for the same slow calculation with BigInt. And if it is to much. task_cpu could also hold as WeakReference. But cpu has a also free instruction - is documented. This is more efficent.

c-ohle commented 2 years ago

(we can delete later all this ok?)

c-ohle commented 2 years ago

But for a Builder as class, what you wanted. I can realize this that it exactly does what you want and I could make it safe that there is no more discussion. The real question is, the functions, the names, a protectiv UI layer. I understand, we have so far the basics:

void Set(BigInterg v); //and conversions etc.... for all void Add(BigInterg v);
void Sub(BigInterg v); void Mul(BigInterg v); void Div(BigInterg v); finaly a: BigInteger ToBigInterg();

all not a problem so far but from my understanding, for ab + cd we should find a expression that does not need new Builder(). Otherwise it will always ends with more allocs then without any builder.

tannergooding commented 2 years ago

Threading, is solved. can be protected with AsyncLocal? _async_cpu

AsyncLocal<T> has its own costs/limitations and while it does handle async code correctly, it may not behave correctly for multi-threaded code. Additionally, its not typical to use ThreadLocal or AsyncLocal outside of rare scenarios where its actually needed. In this scenario, it will restrict and otherwise hinder the ability for devs to perform complex operations with multiple BigInteger or BigRational types. They will be restricted to purely serial processing one chain of operations at a time and will not be able to better optimize their state management as required -or- appropriate.

It is all much more flexible...

Flexibility of the API surface isn't all that matters nor is even perf for that matter. The APIs will ship as part of the "core" set of functionality included "in box" and therefore they need to be easy to use (from a variety of languages) and versionable over time. The Framework Design Guidelines lays down the basis for how such APIs are shaped, named, and considerations that should be taken into account. They are by no means perfect, but we have 20 years of "proof" showing that following them tends to lead towards success.

BigInteger is an example of where, even though perf isn't amazing, it is generally simple and easy to use. Devs are able to pick it up and have it "just work". There is a missing scenario where code involving more complex expressions becomes overly expensive and there have been requests to improve that. There are also a few possible solutions to covering this scenario, one of which is the builder pattern and another of which would be expression trees. Each of these have pros and cons and we'll need to determine the actual best approach longer term, but the builder pattern in particular tends to be a relatively 1-to-1 translation and simple for devs (or analyzers/code fixers) to transition over towards. The allocations here aren't even the expensive part really, its the large repeated allocations and copies that would otherwise be unnecessary. Allocating itself is fairly cheap as are the indirections to access the data and while not allocating, especially for a lot of short term data, can be beneficial; it is far from the largest current expense for these types.

But cpu has a also free instruction - is documented. This is more efficent.

The CPU class is not something we could expose to users and again has all the same allocation requirements and limitations that the Builder pattern would have. There is no way to remove or avoid certain allocations as the memory required to represent and store the intermediate results must be available. The purpose of the Builder pattern (and your CPU class) is to reduce allocations as much as possible. The Builder pattern is a setup that is much easier to work with and which is an existing pattern familiar to many .NET developers, so it already comes out ahead in terms of what such an API can or should expose.

The CPU class is an "entirely new" pattern and is effectively trying to model a way to emit "inline instructions" for a miniature VM dedicated to processing BigRational numbers. It adds and requires a significant number of new concepts to understand and different pattern/approach to writing the algorithms that will make it a pit of failure for many developers, particularly once you start getting into more complex programs that may require multi-threading or async code, code where devs are trying to manually optimize or fine tune the handling, etc.

c-ohle commented 2 years ago

AsyncLocal

I know, but again, the solution, what I found and cost nothing is a quick check that:

gives a hint that AsyncLocal shold be ask, (what is 3x times slower than ask [ThreadStatic]) to get the right cpu.
It is really hard to construct such case for any reasonable scenario and one reason that it is not alread in, and the second reason: I would avoid a second [ThreadStatic] to keep all small and compact as possible.
But the gut thing is: this quick check can also be used to break everything with an exception since it is a programmers bug like call a WinForms function from another thread.

c-ohle commented 2 years ago

Flexibility of the API surface

Yes, No question

BigInteger is an example of where, even...

But what I keep hearing all the ways is performance Net 7, performance Span's, intrinsics etc. so my I guess there is a interest for small performant solutions to make the system better.

c-ohle commented 2 years ago

The CPU class is not something...

But then it is realy absolute unvisible for the user and only a Rational number class that can much more than BigInteger. And also is currently 20% faster for pure integer arithmetic than BigInteger. Do you saw the last banchmark?

c-ohle commented 2 years ago

https://c-ohle.github.io/RationalNumerics/web/ban/BigRational-Benchmarks-NET7.html for a assembly internal size of 20k

c-ohle commented 2 years ago

The CPU class is an "entirely new" pattern

sure, but also with internal cpu, we can expose powerfule sulutions. Do you saw this? For the whole Numeric Vector set exist already VectorR version, compatible in both directions. I have so much more currently not in the Test project. There is a Sparse Matrix library ready, a Tensor library. All implementations are extremly compact in size. So much more.

c-ohle commented 2 years ago

Additional, there so many places in the framework with similiar things. System.Reflection.Emit the low lever verison is also open. And sorry, I see all the way in the code of the core ThreadStatics and that they keep her buffers.

dotnet / runtime