dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Very slow in those type of calcs #92184

Open johnnygiter opened 10 months ago

johnnygiter commented 10 months ago

.NET (all versions: 6, 7, and the coming 8) is very slow in these kinds of calculations compared to other languages. I do not expect it to be as fast as e.g. Rust, but being 10x slower than slow Python is a bit of a shame:

https://programming-language-benchmarks.vercel.app/problem/edigits

I'm not sure if this is the right place, but take it as a suggestion for possible optimisations. I hope version 9 will do something about this.

neon-sunset commented 10 months ago

Seems to be an issue with BigInteger performance.

Given n = 250001, VS profiler shows most time is spent at these two lines: https://github.com/hanabi1224/Programming-Language-Benchmarks/blob/main/bench/algorithm/edigits/1.cs#L18-L19
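
For reference, in the version of the benchmark reproduced later in this thread, those two lines boil down to building 10^(n-1) and then performing one very large multiply/divide. A minimal, self-contained repro of just that hot spot (stand-in values for `p`, `q`, and `n`; not the original benchmark code) might look like this:

```cs
using System;
using System.Numerics;

class HotSpot
{
    static void Main()
    {
        // Stand-in values: in the real benchmark, p and q come from the series
        // summation and n is the requested number of e digits.
        int n = 250001;
        BigInteger p = 3, q = 2;

        // The two lines the profiler points at: a huge power of ten, followed by
        // one enormous multiply and divide.
        var a = BigInteger.Pow(new BigInteger(10), n - 1);
        var answer = p * a / q;

        Console.WriteLine(answer.ToString().Length);
    }
}
```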

Suchiman commented 10 months ago

To be precise, most time is spent in BigInteger.op_division

tannergooding commented 10 months ago

These benchmark games are not necessarily representative of real-world workloads. Nor do they always implement the code idiomatically, or in the best way.

There are improvements that could be made to BigInteger to further improve its performance, but likewise one needs to account for whether that scenario is likely to be encountered in the real world and if there is a viable and obvious alternative that would better handle the scenario natively for the language in question.

Python largely has only 3 numeric types: int, float, and complex, and as such it is explicitly optimized to best handle the fact that these are the only numeric types it has. There are scenarios that you simply cannot handle well in Python because of this.

On the other hand, C#/.NET has many numeric types with different precisions and signedness: byte/sbyte, short/ushort, int/uint, long/ulong, nint/nuint, Int128/UInt128, float, double, Half, BigInteger, Complex, etc.

C#/.NET are correspondingly optimized around this fact and it is often expected that you select the right data type for your scenario, rather than solely relying on BigInteger for everything.

Much in the same way, since Java doesn't have value types, its JIT has a ton of optimizations to work around this limitation. C#/.NET explicitly have value types, so we have less need to provide the same, since it is expected you pick the right type for your scenario.

This allows C#/.NET users more control and often the ability to achieve overall better perf when you select the right tool for the job. Simple naive transpilation is often not sufficient nor a good representation of language performance.
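
As a rough illustration of that point (my own sketch, not code from this thread): when the values are known to fit, a fixed-size type avoids BigInteger's per-operation copies entirely.

```cs
using System;
using System.Numerics;

class RightTypeForTheJob
{
    static void Main()
    {
        // The sum of the first ten million integers fits easily in a 64-bit type.
        ulong sum64 = 0;
        for (uint i = 1; i <= 10_000_000; i++)
        {
            sum64 += i;
        }

        // The same loop over BigInteger gives the same answer, but each addition
        // produces a brand-new value and, once the total no longer fits in 32 bits,
        // a freshly allocated backing array.
        BigInteger sumBig = BigInteger.Zero;
        for (uint i = 1; i <= 10_000_000; i++)
        {
            sumBig += i;
        }

        Console.WriteLine($"{sum64} == {sumBig}");
    }
}
```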

davidfowl commented 10 months ago

While that's true @tannergooding, most of the benchmarks are using a big-integer implementation, so it looks pretty fair from that POV. I agree that it's unclear whether it's a real-world benchmark, but if there's low-hanging fruit in BigInteger, this might be the motivation to improve it.

tannergooding commented 10 months ago

Yes. BigInteger needs a full rewrite eventually, but it's much lower priority compared to other work.

The two key considerations are:

  1. BigInteger is readonly so every operation is a new allocation and copy
  2. BigInteger currently uses uint[] as its backing storage when it could benefit from using nuint[], so it can process twice as many bits per cycle on 64-bit and simplify its multiplication/division algorithms

Addressing the readonly factor would likely account for the largest share of the perf difference here.
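
A rough way to see the cost of the readonly factor above (my own sketch, not code from this thread): every compound assignment below produces a brand-new BigInteger and a fresh copy of its backing digits.

```cs
using System;
using System.Numerics;

class AllocationDemo
{
    static void Main()
    {
        // ~10,000 decimal digits, i.e. roughly 4 KB of backing storage.
        BigInteger value = BigInteger.Pow(10, 10_000);

        long before = GC.GetAllocatedBytesForCurrentThread();

        for (int i = 0; i < 1_000; i++)
        {
            value -= 1; // each subtraction copies the whole ~4 KB digit array
        }

        long after = GC.GetAllocatedBytesForCurrentThread();
        Console.WriteLine($"~{after - before:N0} bytes allocated for 1,000 subtractions.");
    }
}
```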

tannergooding commented 10 months ago

We have a higher-perf, mutable, inline-buffer-based BigInteger that we use internally for things like floating-point parsing and formatting.

davidfowl commented 10 months ago

If there's a work item on BigInteger performance, we can link to these benchmarks.

johnnygiter commented 10 months ago

Unfortunately, explanations like "it's not a real-world scenario" and similar are not fully true. Actually, this is a real-world scenario: someone wants to do some calculations, and other languages outperform .NET not by 10 or 30% but by 1000%. IMHO there is definitely a lot of room for improvement, even in those benchmark games, to be among the top results rather than at the bottom or "out of time", or at least not behind slow languages like Python.

A good example is this simple code I wrote. On my hardware, version 8 RC runs ~3x faster than versions 7 and 6 with the code below. Unfortunately, 8.0 is still horribly slow in the benchmark mentioned in my first post.

So there is some hope that the new version 9.0 brings BigInteger improvements.

using System.Numerics;
using BenchmarkDotNet.Attributes;

public class MyBenchmark
{
    private const int Iterations = 50000;

    [Benchmark]
    public void BenchmarkSubtraction()
    {
        // Start from a ~10,000-digit value and repeatedly subtract small integers;
        // every subtraction produces a brand-new BigInteger.
        BigInteger total = BigInteger.Pow(10, 10000);
        for (int i = 0; i < Iterations; i++)
        {
            total -= i;
        }
    }
}
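
To actually execute the benchmark above, BenchmarkDotNet needs an entry point; a typical one (not part of the original comment) looks like this:

```cs
using BenchmarkDotNet.Running;

public class Program
{
    public static void Main()
    {
        // Compile in Release and run; BenchmarkDotNet prints the timing
        // statistics for BenchmarkSubtraction.
        BenchmarkRunner.Run<MyBenchmark>();
    }
}
```
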
davidfowl commented 10 months ago

> Actually, this is a real-world scenario: someone wants to do some calculations, and other languages outperform .NET not by 10 or 30% but by 1000%.

Maybe, but it's also possible they would use a different number type when using C#. That's not an excuse, but this isn't the only way to do calculations in C#, as @tannergooding mentioned.

johnnygiter commented 10 months ago

> Actually, this is a real-world scenario: someone wants to do some calculations, and other languages outperform .NET not by 10 or 30% but by 1000%.
>
> Maybe, but it's also possible they would use a different number type when using C#. That's not an excuse, but this isn't the only way to do calculations in C#, as @tannergooding mentioned.

That's why I said "not fully true", because I can partially agree with tannergooding.

tannergooding commented 10 months ago

The intent wasn't to say that using BigInteger to perform calculations wasn't "real world". But rather that many micro-benchmarks aren't representative of how such calculations might be done for scenarios that require it.

Microbenchmarks are, as the name implies, "micro". They are often a very small piece of the overall picture and may have various constraints around the inputs/outputs that do not align with how you might use such code in a fully integrated production environment.

As I mentioned, Python primarily performs better in this particular case because they only have BigInteger and so it is always their hot path, for every application/library/framework. As such, they have done significant work to ensure it performs well across all scenarios.

However, in .NET, BigInteger is not typically part of the hot path. It is restricted to more specific domains and edge cases instead. As such, it does not get as much focus. Effort put into BigInteger helps a small percentage of apps, while effort put into things like Span<T> and LINQ, help almost all apps.

It is, of course, still important that we optimize it further. But its prioritization may differ from other languages accordingly, and when we do optimize, we take a look at what the microbenchmark is doing and whether it looks representative of the work a non-static scenario would need to compute.

For example, using BigInteger to do floating-point parsing/formatting is a much more representative example than e-digits. This is because the former is a core algorithm used by all languages/runtimes, in machine learning, graphics, etc. It is a very core operation. The latter is very domain-specific and doesn't have many real-world usages. The scenarios where e-digits is useful are largely just for knowledge purposes and are better handled by a highly tuned, scenario-specific algorithm that might get distributed to run on a supercomputer (much as algorithms such as pi digits are handled).


We are aware of some efforts that are needed to improve BigInteger and why it performs slower in this particular case.

For this scenario the biggest issue is that BigInteger is immutable and so a new allocation is required as part of every operation. This allocation isn't actually the expensive part, but rather having to work with and touch the additional memory as part of every operation.

The other factor is that our BigInteger implementation is still based on uint[] and so for larger values, it has to do more work than if it were backed by nuint[] instead. This is particularly impactful to multiplication and division operations.

The primary solution to the first factor would be to provide a `BigIntegerBuilder`-like class which is itself mutable.

The primary solution to the second factor would be to rewrite the implementation of BigInteger to better take advantage of 64-bit processors.

Neither of these is a trivial task, and both would require significant work. The first would require API design and review, while the second would simply require someone to do the implementation and review it.

There are notably some other, more minor improvements that could be done as well, but they would be less impactful overall outside targeted scenarios like this. As an example, we could provide a specialization for Pow(10, ...), or explicit Exp2 and Exp10 functions to match what single/double have. We could likewise provide specialized functions for scenarios such as (a * b) / c (much as many places provide ModPow).
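
To make that last point concrete: `BigInteger.ModPow` is already part of the public API today, while the `Exp10` and `MulDiv` helpers below are purely hypothetical user-level stand-ins for the kind of specializations described above, not APIs proposed in this thread.

```cs
using System;
using System.Numerics;

static class BigIntegerHelpers
{
    // Hypothetical stand-in for an Exp10 specialization; today it just defers to Pow.
    public static BigInteger Exp10(int exponent) => BigInteger.Pow(10, exponent);

    // Hypothetical fused helper for the (a * b) / c pattern mentioned above.
    public static BigInteger MulDiv(BigInteger a, BigInteger b, BigInteger c) => a * b / c;
}

class Demo
{
    static void Main()
    {
        // ModPow already exists and avoids ever materializing the full a^b value.
        Console.WriteLine(BigInteger.ModPow(7, 1_000_000, 1_000_003));

        // 3 * 10^50 / 7, computed through the hypothetical helpers.
        Console.WriteLine(BigIntegerHelpers.MulDiv(BigIntegerHelpers.Exp10(50), 3, 7));
    }
}
```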

ghost commented 10 months ago

Tagging subscribers to this area: @dotnet/area-system-runtime See info in area-owners.md if you want to be subscribed.

Issue Details
Author: johnnygiter
Assignees: -
Labels: `area-System.Runtime`, `tenet-performance`, `untriaged`, `needs-area-label`
Milestone: -
ghost commented 10 months ago

Tagging subscribers to this area: @dotnet/area-system-numerics See info in area-owners.md if you want to be subscribed.

Issue Details
Author: johnnygiter
Assignees: -
Labels: `area-System.Numerics`, `tenet-performance`
Milestone: -
mjsabby commented 2 months ago

BCryptEncrypt, which admittedly is super optimized for the case it cares about ... but still, it's about 7 times faster than equivalent math using BigInteger. I've pasted a gist that compares the two implementations:

https://gist.github.com/mjsabby/b87b629fa902cd641b955892b1ab9604

kzrnm commented 2 months ago

This issue may be fixed in #96895.


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3527/23H2/2023Update/SunValley3)
13th Gen Intel Core i5-13500, 1 CPU, 20 logical and 14 physical cores
.NET SDK 9.0.100-preview.3.24204.13
  [Host]     : .NET 9.0.0 (9.0.24.17209), X64 RyuJIT AVX2
  Job-HUDNJV : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-ZLKUHU : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
| Method | Job | Toolchain | n | Mean | Error | StdDev | Ratio | Code Size | Gen0 | Allocated | Alloc Ratio |
|--------|-----|-----------|---|------|-------|--------|-------|-----------|------|-----------|-------------|
| Run | Job-HUDNJV | \main\corerun.exe | 100000 | 562.1 ms | 5.64 ms | 5.00 ms | 1.00 | 12,619 B | - | 4.28 MB | 1.00 |
| Run | Job-ZLKUHU | \pr\corerun.exe | 100000 | 160.0 ms | 0.90 ms | 0.80 ms | 0.28 | 14,862 B | 250.0000 | 4.28 MB | 1.00 |
| Run | Job-HUDNJV | \main\corerun.exe | 250001 | 3,503.8 ms | 25.70 ms | 24.04 ms | 1.00 | 12,680 B | - | 11.27 MB | 1.00 |
| Run | Job-ZLKUHU | \pr\corerun.exe | 250001 | 936.8 ms | 9.63 ms | 9.01 ms | 0.27 | 12,624 B | - | 11.27 MB | 1.00 |
benchmark code

```cs
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Configs;
using System.Numerics;
using System.Runtime.InteropServices;

[DisassemblyDiagnoser]
[GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByMethod)]
public class PerformanceTest
{
    [Benchmark]
    [Arguments(100000)]
    [Arguments(250001)]
    public string Run(int n)
    {
        var result = new System.Text.StringBuilder();
        var k = BinarySearch(n);
        var (p, q) = SumTerms(0, k - 1);
        p += q;
        var a = BigInteger.Pow(new BigInteger(10), n - 1);
        var answer = p * a / q;
        var answerStr = answer.ToString();
        Span<char> sb = stackalloc char[10];
        for (var i = 0; i < n; i += 10)
        {
            var count = i + 10;
            if (count > n)
            {
                count = n;
            }
            for (var j = i; j < i + 10; j++)
            {
                if (j < n)
                {
                    sb[j - i] = answerStr[j];
                }
                else
                {
                    sb[j - i] = ' ';
                }
            }
            result.AppendLine($"{new String(sb)}\t:{count}");
        }
        return result.ToString();
    }

    static (BigInteger, BigInteger) SumTerms(int a, int b)
    {
        if (b == a + 1)
        {
            return (new BigInteger(1), new BigInteger(b));
        }
        var mid = (a + b) / 2;
        var (pLeft, qLeft) = SumTerms(a, mid);
        var (pRight, qRight) = SumTerms(mid, b);
        return (pLeft * qRight + pRight, qLeft * qRight);
    }

    static int BinarySearch(int n)
    {
        var a = 0;
        var b = 1;
        while (!TestK(n, b))
        {
            a = b;
            b *= 2;
        }
        while (b - a > 1)
        {
            var m = (a + b) / 2;
            if (TestK(n, m))
            {
                b = m;
            }
            else
            {
                a = m;
            }
        }
        return b;
    }

    static bool TestK(int n, int k)
    {
        if (k < 0)
        {
            return false;
        }
        var lnKFactorial = k * (Math.Log((double)k) - 1) + 0.5 * Math.Log(Math.PI * 2);
        var log10KFactorial = lnKFactorial / Math.Log(10);
        return log10KFactorial >= (double)(n + 50);
    }
}
```
johnnygiter commented 3 weeks ago

> This issue may be fixed in #96895.

Is that already in .NET 9 Preview 5?