johnnygiter opened this issue 7 months ago
Benchmarks like these have to be taken with a grain of salt.
Both e-digits and pi-digits are not representative of real-world work; they are the kind of thing one does purely for fun. Not only does any non-specialized use of pi need no more than around 39 digits (the number needed to compute the circumference of the observable universe to within 1 atom of precision), but there are better ways to get the first n digits of these irrational numbers: highly tuned applications written for actual supercomputers have already computed 105 trillion digits, and that code looks nothing like any of the given implementations.
Additionally, the initial code for a platform is often authored in the most naive way possible. It isn't accounting for things like whether the functionality is built in, whether it is actually executing in the language in question, whether it may be deferring down to another language where the actual core logic runs, etc. The community is then responsible for coming in and providing alternative implementations that may do better and more optimal things, with different languages getting different levels of interest. This means that the actual implementations for two languages could be very different and not exactly comparable.
The way the code is measured also has to be accounted for, as the measurements are not always taken in identical scenarios. Some of the languages are compiled AOT with all possible optimizations enabled, and some of those compilers have built-in optimizations that recognize these patterns and effectively compile the code down to a near no-op (which is only possible in scenarios like this where everything is constant). Others are compiled JIT, and the timings include that overhead while not correctly accounting for other features like Tiered Compilation.
Now, the main reason the C# implementation for these two benchmarks is much slower effectively comes down to the `BigInteger` implementation and how the code is using it.
In Python, every number is a "bignum", so much more time has been spent fine-tuning this handling to ensure that the common cases remain performant. Python's big-integer type is designed to work with any code, to account for people writing the most naive possible algorithms, and has specialized compiler support to do optimizations and other logic to make it as efficient as possible.
In C#, we instead have both fixed-sized types (`int`, `long`, `Int128`, etc.) and the arbitrarily sized `System.Numerics.BigInteger`. We expect the fixed-sized types to be the typical ones used and `BigInteger` to be used only in specialized scenarios. We likewise have reasonable assumptions in `BigInteger` around the maximum size we'll see; while we do support larger sizes, those scenarios have not received as much care or optimization work to make them as efficient as possible.
Correspondingly, this means `BigInteger` has some design decisions around it, such as being immutable. Therefore `(a + b) / 2` involves allocating twice and traversing the memory of the inputs 4 times (once for `a`, once for `b`, once to create the temporary that `(a + b)` produced, and then once more for that temporary as part of the `tmp / 2` operation). The naive implementation is therefore not very efficient and there are often better ways to handle this.
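To make that concrete, here is a minimal sketch (not code from the benchmark) annotating where those allocations and traversals come from:

```csharp
using System.Numerics;

static class MidpointSketch
{
    static BigInteger Midpoint(BigInteger a, BigInteger b)
    {
        // Because BigInteger is immutable, this single expression:
        //   - reads a and b in full to produce the temporary (a + b),
        //     allocating a new BigInteger for it,
        //   - then reads that temporary again as part of the division by 2,
        //     allocating a second BigInteger for the final result.
        // That's two allocations and four passes over big-number memory
        // for one "midpoint" computation.
        return (a + b) / 2;
    }
}
```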
There's also some nuance in that, because we have many different integer types, `BigInteger` doesn't get as much focus since it's primarily intended for specialized scenarios and not for "everyday" scenarios. This does mean we have some known optimization opportunities to make it perform better on modern hardware and even to allow users to more easily write naive logic (like that done in the benchmark) without seeing terrible performance. There hasn't been an opportunity to do this optimization work yet, namely because there is higher-priority, more impactful work for real-world workloads to be completed in other areas first.
The other factor is that the code isn't actually taking into account how C# operates, what functionality is built into the BCL, etc. It's missing opportunities to use constants like `double.Tau` (while the Python and Rust versions do use them), it's missing opportunities to do more efficient string formatting using spans/slices, which would avoid many copies and allocations, and so on.
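As a rough illustration (a sketch, not the benchmark code), using the built-in constant and span slicing looks like this:

```csharp
using System;
using System.Numerics;

class SpanFormattingSketch
{
    static void Main()
    {
        // Built-in constant instead of computing 2 * Math.PI by hand.
        double tau = double.Tau;
        Console.WriteLine(tau);

        // Slice the digit string with spans rather than Substring/new string(...),
        // avoiding an extra string allocation per 10-digit chunk.
        BigInteger value = BigInteger.Pow(10, 100) / 7;
        ReadOnlySpan<char> digits = value.ToString().AsSpan();

        for (int i = 0; i + 10 <= digits.Length; i += 10)
        {
            // Interpolating a ReadOnlySpan<char> writes it directly into the
            // interpolation handler's buffer without an intermediate string.
            Console.WriteLine($"{digits.Slice(i, 10)}\t:{i + 10}");
        }
    }
}
```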
Simple pow on BigIntegers is horribly slow compared to other languages too (even Python is much faster, by over 300%)... Checked on .NET 6 and .NET 8, almost the same.
```csharp
using System;
using System.Numerics;

class Program
{
    static void Main()
    {
        string baseStr = "1000";
        string exponentStr = "50000";

        BigInteger baseNum = BigInteger.Parse(baseStr);
        BigInteger exponentNum = BigInteger.Parse(exponentStr);

        BigInteger result = BigInteger.Pow(baseNum, (int)exponentNum);

        Console.WriteLine($"Result: {result}");
    }
}
```
That program is, as explained above, hitting known cases where the .NET `BigInteger` code could be improved. It was also explained that there's a lot of nuance to benchmarking, real-world perf considerations, etc. It is generally not sufficient to take a single simple program and use it as some metric of proof that `X` is better or faster than `Y`.
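For what it's worth, if the goal is to measure just the `BigInteger.Pow` call, isolating it under a proper harness is more informative than timing a one-shot console program. A minimal BenchmarkDotNet sketch (hypothetical, not code from this thread) would look something like:

```csharp
using System.Numerics;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class BigIntegerPowBenchmark
{
    // 1000^50000, the same computation as the console program above.
    [Benchmark]
    [Arguments(1000, 50000)]
    public BigInteger Pow(int value, int exponent)
        => BigInteger.Pow(value, exponent);

    public static void Main(string[] args)
        => BenchmarkRunner.Run<BigIntegerPowBenchmark>();
}
```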
I've chosen this benchmark for a reason: it's very poor. .NET 9 Preview 4 makes it even 12% worse.
```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Configs;
using System;
using System.Numerics;
using BenchmarkDotNet.Jobs;

[DisassemblyDiagnoser]
[SimpleJob(RuntimeMoniker.Net80, baseline: true)]
[SimpleJob(RuntimeMoniker.Net90)]
[GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByMethod)]
public class PerformanceTest
{
    [Benchmark]
    [Arguments(80000)]
    public string Run(int n)
    {
        var result = new System.Text.StringBuilder();
        var k = BinarySearch(n);
        var (p, q) = SumTerms(0, k - 1);
        p += q;
        var a = BigInteger.Pow(new BigInteger(10), n - 1);
        var answer = p * a / q;
        var answerStr = answer.ToString();

        Span<char> sb = stackalloc char[10];

        for (var i = 0; i < n; i += 10)
        {
            var count = i + 10;
            if (count > n)
            {
                count = n;
            }

            for (var j = i; j < i + 10; j++)
            {
                if (j < n)
                {
                    sb[j - i] = answerStr[j];
                }
                else
                {
                    sb[j - i] = ' ';
                }
            }

            result.AppendLine($"{new string(sb)}\t:{count}");
        }

        return result.ToString();
    }

    static (BigInteger, BigInteger) SumTerms(int a, int b)
    {
        if (b == a + 1)
        {
            return (new BigInteger(1), new BigInteger(b));
        }

        var mid = (a + b) / 2;
        var (pLeft, qLeft) = SumTerms(a, mid);
        var (pRight, qRight) = SumTerms(mid, b);
        return (pLeft * qRight + pRight, qLeft * qRight);
    }

    static int BinarySearch(int n)
    {
        var a = 0;
        var b = 1;

        while (!TestK(n, b))
        {
            a = b;
            b *= 2;
        }

        while (b - a > 1)
        {
            var m = (a + b) / 2;
            if (TestK(n, m))
            {
                b = m;
            }
            else
            {
                a = m;
            }
        }

        return b;
    }

    static bool TestK(int n, int k)
    {
        if (k < 0)
        {
            return false;
        }

        var lnKFactorial = k * (Math.Log((double)k) - 1) + 0.5 * Math.Log(Math.PI * 2);
        var log10KFactorial = lnKFactorial / Math.Log(10);
        return log10KFactorial >= (double)(n + 50);
    }

    public static void Main(string[] args)
    {
        var summary = BenchmarkRunner.Run<PerformanceTest>();
        Console.WriteLine(summary);
    }
}
```
```
.NET SDK 9.0.100-preview.4.24267.66
  [Host]   : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
  .NET 8.0 : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
  .NET 9.0 : .NET 9.0.0 (9.0.24.26619), X64 RyuJIT AVX2
```
| Method | Job | Runtime | n | Mean | Error | StdDev | Ratio | Code Size |
|---|---|---|---|---|---|---|---|---|
| Run | .NET 8.0 | .NET 8.0 | 80000 | 326.4 ms | 0.55 ms | 0.49 ms | 1.00 | 9,827 B |
| Run | .NET 9.0 | .NET 9.0 | 80000 | 364.4 ms | 0.56 ms | 0.49 ms | 1.12 | 10,435 B |
> I've chosen this benchmark for a reason: it's very poor.
There is a certain level of responsibility on the code author to write good code when they want performance.
> .NET 9 Preview 4 makes it even 12% worse.
This regression is from `ToString`, which as detailed above has known issues. It is being worked on, but it isn't a huge priority compared to all the other work that could be done, and a benchmark like this isn't going to change that since it isn't representative of a real-world scenario someone would be likely to use in production. The primary fix for such code would be to not write things in such a naive/unoptimized fashion if performance is a consideration.
On my box, the exact code you've given (+ `[MemoryDiagnoser]`) reports:
| Method | Job | Runtime | n | Mean | Error | StdDev | Median | Ratio | RatioSD | Code Size | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Run | .NET 8.0 | .NET 8.0 | 80000 | 272.8 ms | 3.10 ms | 6.53 ms | 269.7 ms | 1.00 | 0.00 | 11,489 B | 3.41 MB | 1.00 |
| Run | .NET 9.0 | .NET 9.0 | 80000 | 314.9 ms | 6.12 ms | 6.28 ms | 315.9 ms | 1.14 | 0.03 | 12,257 B | 3.23 MB | 0.94 |
It's then worth noting that you're skewing the benchmark (a little bit of overhead, code size, and total bytes allocated) by using the string builder in a fairly naive fashion. Changing it like this fixes that and makes it closer to the Python implementation by utilizing string interpolation, formatting, and slicing:
```csharp
public IEnumerable<object[]> RunArgs()
{
    const int n = 80000;
    var lineCount = (n / 10) + 1;

    var output = new StringBuilder(
        n                                          // Number of digits
        + lineCount                                // Number of tabs
        + lineCount                                // Number of colons
        + (int)(lineCount * float.Log10(n))        // Number of markers (approx)
        + (lineCount * Environment.NewLine.Length) // Number of newlines
    );

    yield return [n, output];
}

[Benchmark]
[ArgumentsSource(nameof(RunArgs))]
public void Run(int n, StringBuilder output)
{
    var k = BinarySearch(n);
    var (p, q) = SumTerms(0, k - 1);
    p += q;
    var a = BigInteger.Pow(new BigInteger(10), n - 1);
    var answer = p * a / q;

    var s = answer.ToString().AsSpan();

    for (var i = 0; i < n; i += 10)
    {
        if ((i + 10) <= n)
        {
            output.AppendLine($"{s[i..(i + 10)]}\t:{i + 10}");
        }
        else
        {
            output.AppendLine($"{s[i..],-10}\t:{n}");
        }
    }
}
```
Which gives us:

| Method | Job | Runtime | n | output | Mean | Error | StdDev | Median | Ratio | RatioSD | Code Size | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Run | .NET 8.0 | .NET 8.0 | 80000 | | 273.5 ms | 4.36 ms | 10.69 ms | 269.7 ms | 1.00 | 0.00 | 16,196 B | 2.76 MB | 1.00 |
| Run | .NET 9.0 | .NET 9.0 | 80000 | | 313.6 ms | 5.20 ms | 4.61 ms | 313.6 ms | 1.10 | 0.07 | 6,419 B | 2.57 MB | 0.93 |
Now, it reduced allocations and code size for .NET 9, but it didn't really change the execution time because the StringBuilder itself is very fast. So we can experiment a bit to find out where the cost is coming from:
```csharp
[Benchmark]
[ArgumentsSource(nameof(RunArgs))]
public string Run(int n, StringBuilder output)
{
    var k = BinarySearch(n);
    var (p, q) = SumTerms(0, k - 1);
    p += q;
    var a = BigInteger.Pow(new BigInteger(10), n - 1);
    var answer = p * a / q;
    return answer.ToString();
}
```
Gives us:

| Method | Job | Runtime | n | output | Mean | Error | StdDev | Ratio | RatioSD | Code Size | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Run | .NET 8.0 | .NET 8.0 | 80000 | | 272.4 ms | 4.73 ms | 9.46 ms | 1.00 | 0.00 | 10,090 B | 2.47 MB | 1.00 |
| Run | .NET 9.0 | .NET 9.0 | 80000 | | 312.7 ms | 6.24 ms | 6.13 ms | 1.13 | 0.05 | 4,707 B | 2.28 MB | 0.92 |
While:
```csharp
[Benchmark]
[ArgumentsSource(nameof(RunArgs))]
public BigInteger Run(int n, StringBuilder output)
{
    var k = BinarySearch(n);
    var (p, q) = SumTerms(0, k - 1);
    p += q;
    var a = BigInteger.Pow(new BigInteger(10), n - 1);
    return p * a / q;
}
```
Gives us:

| Method | Job | Runtime | n | output | Mean | Error | StdDev | Ratio | RatioSD | Code Size | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Run | .NET 8.0 | .NET 8.0 | 80000 | | 233.2 ms | 3.85 ms | 3.60 ms | 1.00 | 0.00 | 4,627 B | 2.13 MB | 1.00 |
| Run | .NET 9.0 | .NET 9.0 | 80000 | | 205.6 ms | 1.20 ms | 1.93 ms | 0.88 | 0.02 | 9,144 B | 2.13 MB | 1.00 |
We can then break apart the last few bits as well, to find the following:
`BinarySearch(n)`

| Method | Job | Runtime | n | output | Mean | Error | StdDev | Ratio | RatioSD | Code Size | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Run | .NET 8.0 | .NET 8.0 | 80000 | | 125.1 ns | 1.79 ns | 1.67 ns | 1.00 | 0.00 | 254 B | - | NA |
| Run | .NET 9.0 | .NET 9.0 | 80000 | | 125.6 ns | 1.21 ns | 1.13 ns | 1.00 | 0.02 | 244 B | - | NA |
`SumTerms(0, k - 1)`

| Method | Job | Runtime | k | output | Mean | Error | StdDev | Ratio | Code Size | Gen0 | Gen1 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Run | .NET 8.0 | .NET 8.0 | 20631 | | 14.09 ms | 0.230 ms | 0.215 ms | 1.00 | 1,010 B | 109.3750 | 15.6250 | 1.97 MB | 1.00 |
| Run | .NET 9.0 | .NET 9.0 | 20631 | | 10.01 ms | 0.092 ms | 0.086 ms | 0.71 | 995 B | 109.3750 | 15.6250 | 1.97 MB | 1.00 |
`p += q`

| Method | Job | Runtime | p | q | output | Mean | Error | StdDev | Ratio | RatioSD | Code Size | Gen0 | Gen1 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Run | .NET 8.0 | .NET 8.0 | 260(...)701 [80052] | 151(...)000 [80052] | | 5.631 us | 0.0991 us | 0.0927 us | 1.00 | 0.00 | 310 B | 1.9836 | 0.2441 | 32.49 KB | 1.00 |
| Run | .NET 9.0 | .NET 9.0 | 260(...)701 [80052] | 151(...)000 [80052] | | 5.518 us | 0.0532 us | 0.0472 us | 0.98 | 0.02 | 306 B | 1.9836 | 0.2441 | 32.49 KB | 1.00 |
`BigInteger.Pow(new BigInteger(10), n - 1)`

| Method | Job | Runtime | n | output | Mean | Error | StdDev | Ratio | Code Size | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Run | .NET 8.0 | .NET 8.0 | 80000 | | 7.274 ms | 0.1437 ms | 0.1476 ms | 1.00 | 1,771 B | 32.47 KB | 1.00 |
| Run | .NET 9.0 | .NET 9.0 | 80000 | | 2.903 ms | 0.0358 ms | 0.0335 ms | 0.40 | 2,653 B | 32.47 KB | 1.00 |
`var pa = p * a`

| Method | Job | Runtime | p | a | output | Mean | Error | StdDev | Ratio | RatioSD | Code Size | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Run | .NET 8.0 | .NET 8.0 | 143(...)400 [80005] | 100(...)000 [80000] | | 4.594 ms | 0.0898 ms | 0.0840 ms | 1.00 | 0.00 | 2,374 B | 64.91 KB | 1.00 |
| Run | .NET 9.0 | .NET 9.0 | 143(...)400 [80005] | 100(...)000 [80000] | | 4.607 ms | 0.0586 ms | 0.0548 ms | 1.00 | 0.02 | 3,575 B | 64.91 KB | 1.00 |
`return pa / q`

| Method | Job | Runtime | pa | q | output | Mean | Error | StdDev | Ratio | RatioSD | Code Size | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Run | .NET 8.0 | .NET 8.0 | 143(...)000 [160004] | 527(...)000 [80004] | | 184.8 ms | 1.62 ms | 2.88 ms | 1.00 | 0.00 | 1,616 B | 32.71 KB | 1.00 |
| Run | .NET 9.0 | .NET 9.0 | 143(...)000 [160004] | 527(...)000 [80004] | | 192.0 ms | 2.70 ms | 3.88 ms | 1.04 | 0.02 | 2,565 B | 32.6 KB | 1.00 |
So this gives us a breakdown of where most of the time is being spent (division and `ToString`). There are some optimization opportunities in other areas too, but they don't represent the bulk of the work.
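For reference, one way such per-operation numbers can be obtained is to isolate each step in its own benchmark with the inputs prepared up front. Here is a hypothetical sketch for the division step; the operand values are illustrative, not the exact `p * a` and `q` from the runs above:

```csharp
using System.Numerics;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class DivideOnlyBenchmark
{
    private BigInteger _pa;
    private BigInteger _q;

    [GlobalSetup]
    public void Setup()
    {
        // Build inputs of roughly the same magnitude as in the breakdown above
        // (~160,000-digit numerator, ~80,000-digit denominator) outside the
        // measured region, so the benchmark body times only the division.
        _pa = BigInteger.Pow(10, 160_000) + 12345;
        _q = BigInteger.Pow(10, 80_000) + 6789;
    }

    [Benchmark]
    public BigInteger Divide() => _pa / _q;

    public static void Main(string[] args)
        => BenchmarkRunner.Run<DivideOnlyBenchmark>();
}
```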
URL(s)
https://github.com/dotnet/core/tree/main
Description
Almost last place, concerning all versions. It's a shame that a compiled language is slower than Python regius.
https://programming-language-benchmarks.vercel.app/problem/edigits
A few other benchmarks are not impressive either:
https://programming-language-benchmarks.vercel.app/problem/pidigits