johnnygiter opened 1 year ago
There seems to be an issue with BigInteger performance. Given n = 250001, the VS profiler shows most time is spent at these two lines: https://github.com/hanabi1224/Programming-Language-Benchmarks/blob/main/bench/algorithm/edigits/1.cs#L18-L19
To be precise, most time is spent in BigInteger.op_division
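For isolation, here is a minimal sketch that reproduces the pattern (a single division of two very large values). This is not the benchmark's actual code, and the magnitudes are placeholders chosen to roughly match n = 250001:

```csharp
using System;
using System.Diagnostics;
using System.Numerics;

// Isolated illustration of where BigInteger.op_Division time goes at
// these magnitudes; not the edigits benchmark itself.
class DivisionRepro
{
    static void Main()
    {
        BigInteger p = BigInteger.Pow(10, 250_000); // ~250k decimal digits
        BigInteger q = BigInteger.Pow(7, 50_000);   // ~42k decimal digits

        var sw = Stopwatch.StartNew();
        BigInteger quotient = p / q; // the big/big division dominates
        sw.Stop();

        Console.WriteLine($"Division took {sw.ElapsedMilliseconds} ms; " +
                          $"quotient is {quotient.GetByteCount()} bytes");
    }
}
```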
These benchmark games are not necessarily representative of real-world workloads, nor do they always implement the code idiomatically or in the best way.

There are improvements that could be made to BigInteger to further improve its performance, but likewise one needs to account for whether that scenario is likely to be encountered in the real world and whether there is a viable and obvious alternative that would better handle the scenario natively for the language in question.
Python largely only has 3 numeric types: int, float, and complex, and as such it is explicitly optimized to best handle the fact that these are the only numeric types it has. There are scenarios that you simply cannot handle well in Python because of this.
On the other hand, C#/.NET has many numeric types with different precisions and signedness: byte/sbyte, short/ushort, int/uint, long/ulong, nint/nuint, Int128/UInt128, float, double, Half, BigInteger, Complex, etc.
C#/.NET are correspondingly optimized around this fact, and it is often expected that you select the right data type for your scenario rather than relying solely on BigInteger for everything.
Much in the same way, since Java doesn't have value types, their JIT has a ton of optimizations to work around this limitation. C#/.NET explicitly have value types, and so we have less need to provide the same, since it is expected that you pick the right type for your scenario.
This gives C#/.NET users more control and often the ability to achieve better overall perf when they select the right tool for the job. Simple naive transpilation is often not sufficient, nor a good representation of language performance.
While that's true, @tannergooding, most of the benchmarks are using a big-integer implementation, so it looks pretty fair from that POV. I agree that it's unclear whether it's a real-world benchmark, but if there is low-hanging fruit in big integer, this might be the motivation to improve it.
Yes. BigInteger needs a full rewrite eventually, but it's much lower priority compared to other work.
The two key considerations are:

- BigInteger uses uint[] as its backing storage when it could benefit from using nuint[], so it could process twice as many bits per cycle on 64-bit and simplify its multiplication/division algorithms.
- Accounting for the readonly factor would likely be the largest factor in the perf here. We have a higher-perf, mutable, inline-buffer-based BigInteger that we use internally for things like floating-point parsing and formatting.
If there's a work item on BigInteger performance, we can link to these benchmarks.
Unfortunately, explanations like "it's not a real-world scenario" and similar are not fully true. Actually, this is a real-world scenario: someone wants to do some calculations, and other languages are outperforming not by 10 or 30% but by 1000%. IMHO there is definitely a lot of room for improvement, even in those benchmark games, to be among the top results rather than at the bottom or "out of time", or at least not behind slow languages like Python.
A good example is this simple piece of code I wrote. On my hardware, version 8 RC runs it ~3x faster than versions 7 and 6. Unfortunately, 8.0 is still horribly slow in the benchmark mentioned in my first post.

So there is some hope that the new version 9.0 brings BigInteger improvements.
```csharp
using System.Numerics;
using BenchmarkDotNet.Attributes;

public class MyBenchmark
{
    private const int Iterations = 50000;

    [Benchmark]
    public void BenchmarkSubtraction()
    {
        // Start from a ~10,000-digit value; each subtraction allocates
        // a brand-new BigInteger.
        BigInteger total = BigInteger.Pow(10, 10000);
        for (int i = 0; i < Iterations; i++)
        {
            total -= i;
        }
    }
}
```
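To reproduce this, a minimal entry point (assuming the BenchmarkDotNet package is referenced and a Release build, which BenchmarkDotNet requires by default) would be:

```csharp
using BenchmarkDotNet.Running;

// Executes the benchmark class above with BenchmarkDotNet's defaults.
class Program
{
    static void Main() => BenchmarkRunner.Run<MyBenchmark>();
}
```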
> Actually, this is a real-world scenario: someone wants to do some calculations, and other languages are outperforming not by 10 or 30% but by 1000%.

Maybe, but it's also possible they would use a different number type when using C#. That's not an excuse, but this isn't the only way to do calculations in C#, as @tannergooding mentioned.
That's why I said "fully", because partially I can agree with @tannergooding.
The intent wasn't to say that using BigInteger to perform calculations wasn't "real world", but rather that many micro-benchmarks aren't representative of how such calculations might be done for scenarios that require it.
Microbenchmarks are, as the name implies, "micro". They are often a very small piece of the overall picture and may often have various constraints around the inputs/outputs that do not align with how you might use such code in a fully integrated production environment.
As I mentioned, Python primarily performs better in this particular case because they only have BigInteger, and so it is always their hot path, for every application/library/framework. As such, they have done significant work to ensure it performs well across all scenarios.
However, in .NET, BigInteger is not typically part of the hot path. It is restricted to more specific domains and edge cases instead. As such, it does not get as much focus. Effort put into BigInteger helps a small percentage of apps, while effort put into things like Span<T> and LINQ helps almost all apps.
It is, of course, still important that we optimize it further. But its prioritization may differ from other languages accordingly. And when we do optimize, we take a look at what the microbenchmark is doing and whether it looks representative of the work a non-static scenario would need to compute. For example, using BigInteger to do floating-point parsing/formatting is a much more representative example than e-digits. The former is a core algorithm used by all languages/runtimes, in machine learning, graphics, etc. It is a very core operation. The latter is very domain-specific and doesn't have many real-world usages. The scenarios where e-digits is useful are largely just for knowledge purposes, and it is better handled by a highly tuned, scenario-specific algorithm that might get distributed to run on a supercomputer (much as algorithms such as pi-digits are).
We are aware of some efforts that are needed to improve BigInteger and why it performs slower in this particular case.
For this scenario, the biggest issue is that BigInteger is immutable, so a new allocation is required as part of every operation. The allocation itself isn't actually the expensive part; rather, it is having to work with and touch the additional memory as part of every operation.
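To illustrate (my sketch, not runtime internals): in a loop like the benchmark above, every `-= i` writes out a brand-new multi-kilobyte magnitude array. Batching the small operands into a plain long first touches the big value only once:

```csharp
using System.Numerics;

// Both methods compute the same value; the second avoids ~50,000
// intermediate BigInteger allocations and copies.
static class AllocationSketch
{
    const int Iterations = 50_000;

    // One new BigInteger (and a full copy of the ~10,000-digit
    // magnitude) per iteration.
    public static BigInteger Naive()
    {
        BigInteger total = BigInteger.Pow(10, 10_000);
        for (int i = 0; i < Iterations; i++)
            total -= i;
        return total;
    }

    // Accumulate the small operands first; 0 + 1 + ... + 49999 fits
    // easily in a long, so the big value is touched exactly once.
    public static BigInteger Batched()
    {
        long sum = 0;
        for (int i = 0; i < Iterations; i++)
            sum += i;
        return BigInteger.Pow(10, 10_000) - sum;
    }
}
```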
The other factor is that our BigInteger implementation is still based on uint[], and so for larger values it has to do more work than if it were backed by nuint[] instead. This is particularly impactful for multiplication and division operations.
The primary solution to the first factor would be to provide a BigIntegerBuilder-like class which is itself mutable.
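No such type exists today; purely as a hypothetical sketch of the shape it might take (every name here is invented):

```csharp
using System.Numerics;

// HYPOTHETICAL API shape only; nothing like this ships in .NET today.
// The idea: operations mutate a reusable internal buffer instead of
// allocating a fresh BigInteger per operation.
public sealed class BigIntegerBuilder
{
    private ulong[] _limbs = new ulong[1]; // reused backing storage

    public BigIntegerBuilder(BigInteger initialValue)
    {
        // copy the magnitude into _limbs (omitted in this sketch)
    }

    public void Add(BigInteger value)      { /* in-place update of _limbs */ }
    public void Subtract(BigInteger value) { /* in-place update of _limbs */ }

    public BigInteger ToBigInteger() => default; // one final allocation

    // Usage sketch: the 50,000-iteration loop above would then allocate
    // roughly once instead of once per iteration:
    //   var b = new BigIntegerBuilder(BigInteger.Pow(10, 10_000));
    //   for (int i = 0; i < 50_000; i++) b.Subtract(i);
    //   BigInteger total = b.ToBigInteger();
}
```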
The primary solution to the second factor would be to rewrite the implementation of BigInteger to better take advantage of 64-bit processors.
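As a rough sketch of why limb width matters (schoolbook addition over little-endian limb arrays, assuming equal-length inputs; not the actual BigInteger internals): with 64-bit limbs, each loop iteration retires twice as many bits as a uint[]-backed loop does.

```csharp
// Each iteration handles 64 bits of the operands; an equivalent
// uint[]-based loop would need twice as many iterations.
static ulong[] AddLimbs(ulong[] left, ulong[] right)
{
    var result = new ulong[left.Length + 1];
    ulong carry = 0;

    for (int i = 0; i < left.Length; i++)
    {
        ulong sum = left[i] + carry;
        carry = sum < carry ? 1UL : 0UL; // overflow from adding the carry
        sum += right[i];
        if (sum < right[i]) carry++;     // overflow from adding right[i]
        result[i] = sum;
    }

    result[left.Length] = carry;
    return result;
}
```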
Neither of these are trivial tasks and would require significant work. The first would require API design and review, while the latter would simply require someone to do the implementation and review it.
There are notably some other, more minor improvements that could be made as well, but they would be less impactful overall outside targeted scenarios like this. As examples, we could provide a specialization for Pow(10, ...), or explicit Exp2 and Exp10 functions to match what single/double have. We could likewise provide specialized functions for scenarios such as (a * b) / c (much as many places provide ModPow).
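As a user-side illustration of the Pow(10, ...) idea (my sketch; Exp10 is a hypothetical helper name borrowed from the double.Exp10 pattern, not an existing BigInteger API): repeated exponents can be memoized and built by squaring.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

static class BigIntegerExtras
{
    private static readonly Dictionary<int, BigInteger> s_cache = new();

    // Computes 10^exponent by repeated squaring, memoizing intermediate
    // powers so repeated calls (common in digit-generation loops) stay cheap.
    public static BigInteger Exp10(int exponent)
    {
        if (exponent < 0) throw new ArgumentOutOfRangeException(nameof(exponent));
        if (exponent == 0) return BigInteger.One;
        if (s_cache.TryGetValue(exponent, out var cached)) return cached;

        BigInteger half = Exp10(exponent / 2);
        BigInteger result = half * half;
        if ((exponent & 1) != 0) result *= 10; // odd exponent: one extra factor

        return s_cache[exponent] = result;
    }
}
```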
Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Author: johnnygiter
Assignees: -
Labels: `area-System.Runtime`, `tenet-performance`, `untriaged`, `needs-area-label`
Milestone: -
Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Author: johnnygiter
Assignees: -
Labels: `area-System.Numerics`, `tenet-performance`
Milestone: -
BCryptEncrypt, which admittedly is super optimized for the case it cares about ... but still, it's about 7 times faster than the equivalent math using BigInteger. I've pasted a gist that compares the two implementations:
https://gist.github.com/mjsabby/b87b629fa902cd641b955892b1ab9604
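For context, the BigInteger side of an RSA-style comparison like that presumably centers on modular exponentiation. A minimal sketch (toy parameters, and only the managed half; the BCryptEncrypt side is not shown here):

```csharp
using System;
using System.Numerics;
using System.Security.Cryptography;

// Raw modular exponentiation, the core operation RSA-style encryption
// spends its time in. All values are placeholder-sized for illustration.
class ModPowSketch
{
    static void Main()
    {
        byte[] modulusBytes = RandomNumberGenerator.GetBytes(256); // ~2048 bits
        var modulus = new BigInteger(modulusBytes, isUnsigned: true, isBigEndian: true);
        var message = new BigInteger(12345);
        var exponent = new BigInteger(65537); // common RSA public exponent

        BigInteger cipher = BigInteger.ModPow(message, exponent, modulus);
        Console.WriteLine(cipher);
    }
}
```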
This issue may be fixed in #96895.
BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3527/23H2/2023Update/SunValley3)
13th Gen Intel Core i5-13500, 1 CPU, 20 logical and 14 physical cores
.NET SDK 9.0.100-preview.3.24204.13
[Host] : .NET 9.0.0 (9.0.24.17209), X64 RyuJIT AVX2
Job-HUDNJV : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
Job-ZLKUHU : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
Method | Job | Toolchain | n | Mean | Error | StdDev | Ratio | Code Size | Gen0 | Allocated | Alloc Ratio
---|---|---|---|---|---|---|---|---|---|---|---
Run | Job-HUDNJV | \main\corerun.exe | 100000 | 562.1 ms | 5.64 ms | 5.00 ms | 1.00 | 12,619 B | - | 4.28 MB | 1.00
Run | Job-ZLKUHU | \pr\corerun.exe | 100000 | 160.0 ms | 0.90 ms | 0.80 ms | 0.28 | 14,862 B | 250.0000 | 4.28 MB | 1.00
Run | Job-HUDNJV | \main\corerun.exe | 250001 | 3,503.8 ms | 25.70 ms | 24.04 ms | 1.00 | 12,680 B | - | 11.27 MB | 1.00
Run | Job-ZLKUHU | \pr\corerun.exe | 250001 | 936.8 ms | 9.63 ms | 9.01 ms | 0.27 | 12,624 B | - | 11.27 MB | 1.00
Is that already in .NET 9 Preview 5?
Only a ~5% improvement over .NET 8.0 in .NET 9.0 RC2, a joke. It is still at least 300-500% too slow.
.NET (all versions: 6, 7, and the coming 8) is very slow compared to other languages. I do not expect it to be as fast as e.g. Rust, but being 10x slower than slow Python is a bit of a shame:
https://programming-language-benchmarks.vercel.app/problem/edigits

I'm not sure if this will work there, but this is a suggestion for possible optimizations. I hope that version 9 will do something with that.