Open sebastienros opened 6 years ago
@stephentoub
Trying to find our perf tests for these..
The tests are under Tests.System.Tests...
@sebastienros do you have evidence that this ratio is any different to 2.0? From the benchmarks we have, I do see that string comparison on Linux is significantly slower than Windows, but much faster than 2.0, and the ratio has improved.
[edit: that's comparing Linux with Windows for the same comparer, which was not what you flagged. Nevertheless, the question remains has this changed since 2.0?]
I could find some culture tests on benchview, but none for string comparison. I ran the same application on .NET 2.0 too, and I can see some regression on an ASP.NET app so maybe a micro-benchmark would show different results.
Description | RPS - 2.0 | RPS - 2.1 | Delta |
---|---|---|---|
Linux - CompareTo | 167931 | 147252 | -12% |
Linux - CompareOrdinal | 319786 | 317025 | -1% |
Windows - CompareTo | 371181 | 293785 | -21% |
Windows - CompareOrdinal | 471728 | 364015 | -23% |
So you are right that the gap is smaller in 2.1 than in 2.0, but not for a good reason: it shrank because both comparers regressed, not because CompareTo got faster.
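For reference, the Delta column above is the relative change in requests per second between 2.0 and 2.1. A minimal sketch of that arithmetic follows; the class and method names are hypothetical, and only the RPS figures come from the table.

```csharp
using System;

static class DeltaSketch
{
    // Relative change from `before` to `after`, in percent.
    // Negative values mean throughput dropped.
    public static double PercentDelta(double before, double after)
        => (after - before) / before * 100.0;

    static void Main()
    {
        // RPS figures copied from the table above (2.0 first, then 2.1).
        Console.WriteLine($"Linux   CompareTo:      {PercentDelta(167931, 147252):F1}%");
        Console.WriteLine($"Linux   CompareOrdinal: {PercentDelta(319786, 317025):F1}%");
        Console.WriteLine($"Windows CompareTo:      {PercentDelta(371181, 293785):F1}%");
        Console.WriteLine($"Windows CompareOrdinal: {PercentDelta(471728, 364015):F1}%");
    }
}
```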
@sebastienros the link above shows CoreFX perf results. Of course the tests may be poor (they certainly run too few iterations) but they show improvements over 2.0 for Linux.
Can you repeat your benchmark, without ASP.NET in the picture -- just a console app?
Ideally with Benchmark.NET
With BenchmarkDotNet we can also see regressions on CompareTo but not on CompareOrdinal.
BenchmarkDotNet=v0.10.14, OS=debian 8
Intel Xeon CPU E5-1650 v3 3.50GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=2.1.300-rc1-008673
[Host] : .NET Core 2.0.6 (CoreCLR 4.6.0.0, CoreFX 4.6.26212.01), 64bit RyuJIT
DefaultJob : .NET Core 2.0.6 (CoreCLR 4.6.0.0, CoreFX 4.6.26212.01), 64bit RyuJIT
Method | Mean | Error | StdDev |
--------------- |-----------:|----------:|----------:|
CompareTo | 4,320.8 ns | 59.219 ns | 55.394 ns |
CompareOrdinal | 468.5 ns | 2.515 ns | 2.229 ns |
BenchmarkDotNet=v0.10.14, OS=debian 8
Intel Xeon CPU E5-1650 v3 3.50GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=2.1.300-rc1-008673
[Host] : .NET Core 2.1.0-rc1-26428-03 (CoreCLR 4.6.26428.03, CoreFX 4.6.26428.03), 64bit RyuJIT
DefaultJob : .NET Core 2.1.0-rc1-26428-03 (CoreCLR 4.6.26428.03, CoreFX 4.6.26428.03), 64bit RyuJIT
Method | Mean | Error | StdDev |
--------------- |-----------:|----------:|----------:|
CompareTo | 4,910.9 ns | 18.723 ns | 14.617 ns |
CompareOrdinal | 443.1 ns | 3.340 ns | 2.961 ns |
BenchmarkDotNet=v0.10.14, OS=Windows 10.0.14393.2189 (1607/AnniversaryUpdate/Redstone1)
Intel Xeon CPU E5-1650 v3 3.50GHz, 1 CPU, 12 logical and 6 physical cores
Frequency=3410079 Hz, Resolution=293.2483 ns, Timer=TSC
.NET Core SDK=2.1.300-rc1-008673
[Host] : .NET Core 2.0.6 (CoreCLR 4.6.26212.01, CoreFX 4.6.26212.01), 64bit RyuJIT
DefaultJob : .NET Core 2.0.6 (CoreCLR 4.6.26212.01, CoreFX 4.6.26212.01), 64bit RyuJIT
Method | Mean | Error | StdDev |
--------------- |-----------:|-----------:|-----------:|
CompareTo | 3,277.2 ns | 12.3949 ns | 11.5942 ns |
CompareOrdinal | 467.8 ns | 0.5474 ns | 0.5120 ns |
BenchmarkDotNet=v0.10.14, OS=Windows 10.0.14393.2189 (1607/AnniversaryUpdate/Redstone1)
Intel Xeon CPU E5-1650 v3 3.50GHz, 1 CPU, 12 logical and 6 physical cores
Frequency=3410079 Hz, Resolution=293.2483 ns, Timer=TSC
.NET Core SDK=2.1.300-rc1-008673
[Host] : .NET Core 2.1.0-rc1-26428-03 (CoreCLR 4.6.26428.03, CoreFX 4.6.26428.03), 64bit RyuJIT
DefaultJob : .NET Core 2.1.0-rc1-26428-03 (CoreCLR 4.6.26428.03, CoreFX 4.6.26428.03), 64bit RyuJIT
Method | Mean | Error | StdDev |
--------------- |-----------:|----------:|----------:|
CompareTo | 3,495.5 ns | 1.2315 ns | 1.1520 ns |
CompareOrdinal | 402.1 ns | 0.2601 ns | 0.2433 ns |
Can you please share the benchmark?
https://github.com/sebastienros/stringbenchmarks, branch console. If you want to adapt it to measure other things (it's doing a Sort, for instance), I can easily re-run everything on all environments; that might save you some time.
Snap, I pushed a change to remove the Sort just before your comment. One commit before in the 'console' branch then.
@sebastienros, thanks for sharing.
First I'm surprised by some of the absolute values your benchmark shows. That's saying it took almost 5 microseconds to do that comparison on Linux? That must be a very slow machine, or something else is going on, as it's at least an order of magnitude more than I'd expect. I just tried your [Benchmarks]:
[Benchmark]
public int CompareTo() => Fortune1.CompareTo(Fortune2);
[Benchmark]
public int CompareOrdinal() => String.CompareOrdinal(Fortune1, Fortune2);
private const string Fortune1 = "fortune: No such file or directory";
private const string Fortune2 = "A computer scientist is someone who fixes things that aren''t broken.";
by plugging them into the harness I described in https://blogs.msdn.microsoft.com/dotnet/2018/04/18/performance-improvements-in-net-core-2-1/, and I get these results on my Ubuntu 16.04 VM:
Method | Toolchain | Mean | Allocated |
--------------- |-------------- |-----------:|----------:|
CompareTo | .NET Core 2.0 | 157.709 ns | 0 B |
CompareTo | .NET Core 2.1 | 175.951 ns | 0 B |
CompareOrdinal | .NET Core 2.0 | 3.876 ns | 0 B |
CompareOrdinal | .NET Core 2.1 | 3.123 ns | 0 B |
and this on my Windows 10 machine:
Method | Toolchain | Mean | Allocated |
--------------- |-------------- |----------:|----------:|
CompareTo | .NET Core 2.0 | 70.046 ns | 0 B |
CompareTo | .NET Core 2.1 | 76.217 ns | 0 B |
CompareOrdinal | .NET Core 2.0 | 2.403 ns | 0 B |
CompareOrdinal | .NET Core 2.1 | 2.108 ns | 0 B |
so significantly smaller numbers in magnitude than what you got.
Second, extrapolating from a single micro-benchmark can be misleading. The particular micro-benchmark you've chosen has the strings entirely different, which means it's really just testing the overhead involved in setting up the comparison, that'll end up failing on the very first character examined. For culture-based comparisons, there is a tiny bit more overhead there in 2.1 due to spans being used internally, converting from strings to spans, etc. Once the comparison gets going, though, 2.1's implementation is better, e.g. try making the beginning of your two strings equal, and you should see 2.1 outshine 2.0. For example, I just changed the above to the following that has some differences in the middle of the short strings being compared:
private const string Fortune1 = "fortune: No such file or directory";
private const string Fortune2 = "fortune: No such file is directory";
On Linux I got:
Method | Toolchain | Mean | Allocated |
--------------- |-------------- |-----------:|----------:|
CompareTo | .NET Core 2.0 | 197.154 ns | 0 B |
CompareTo | .NET Core 2.1 | 181.274 ns | 0 B |
CompareOrdinal | .NET Core 2.0 | 10.242 ns | 0 B |
CompareOrdinal | .NET Core 2.1 | 3.088 ns | 0 B |
and on Windows:
Method | Toolchain | Mean | Allocated |
--------------- |-------------- |-----------:|----------:|
CompareTo | .NET Core 2.0 | 109.057 ns | 0 B |
CompareTo | .NET Core 2.1 | 79.227 ns | 0 B |
CompareOrdinal | .NET Core 2.0 | 6.173 ns | 0 B |
CompareOrdinal | .NET Core 2.1 | 2.018 ns | 0 B |
showing 2.1 beating out 2.0. Then I further changed it to be:
private const string Fortune1 = "A computer scientist is someone who fixes things that aren''t broken!";
private const string Fortune2 = "A computer scientist is someone who fixes things that aren''t broken.";
so that the strings differ only by the last character and are slightly longer; on Linux I got:
Method | Toolchain | Mean | Allocated |
--------------- |-------------- |-----------:|----------:|
CompareTo | .NET Core 2.0 | 262.782 ns | 0 B |
CompareTo | .NET Core 2.1 | 178.342 ns | 0 B |
CompareOrdinal | .NET Core 2.0 | 15.097 ns | 0 B |
CompareOrdinal | .NET Core 2.1 | 3.117 ns | 0 B |
and on Windows:
Method | Toolchain | Mean | Allocated |
--------------- |-------------- |-----------:|----------:|
CompareTo | .NET Core 2.0 | 149.206 ns | 0 B |
CompareTo | .NET Core 2.1 | 85.108 ns | 0 B |
CompareOrdinal | .NET Core 2.0 | 10.366 ns | 0 B |
CompareOrdinal | .NET Core 2.1 | 1.911 ns | 0 B |
showing 2.1 being significantly better on these inputs than 2.0.
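To try the three input shapes above without setting up a benchmark project, a rough Stopwatch-based sketch could look like the following. This is an illustration only: it is not the harness from the blog post, the class and helper names are hypothetical, and a hand-rolled timing loop is far less reliable than BenchmarkDotNet, so treat its numbers as indicative at best.

```csharp
using System;
using System.Diagnostics;

static class ComparePrefixSketch
{
    // Hypothetical helper: average nanoseconds per call of `compare`.
    public static double MeasureNs(Func<int> compare, int iterations)
    {
        int sink = 0;
        // Short warm-up so the delegate is jitted before timing starts.
        for (int i = 0; i < iterations / 10; i++) sink += compare();

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) sink += compare();
        sw.Stop();

        GC.KeepAlive(sink); // keep the results observable
        return sw.Elapsed.TotalMilliseconds * 1_000_000.0 / iterations;
    }

    static void Main()
    {
        // The three string pairs used in the comment above.
        var pairs = new (string Name, string A, string B)[]
        {
            ("differ at start", "fortune: No such file or directory",
                "A computer scientist is someone who fixes things that aren''t broken."),
            ("differ in middle", "fortune: No such file or directory",
                "fortune: No such file is directory"),
            ("differ at end", "A computer scientist is someone who fixes things that aren''t broken!",
                "A computer scientist is someone who fixes things that aren''t broken."),
        };

        foreach (var (name, a, b) in pairs)
        {
            double culture = MeasureNs(() => string.Compare(a, b, StringComparison.CurrentCulture), 1_000_000);
            double ordinal = MeasureNs(() => string.CompareOrdinal(a, b), 1_000_000);
            Console.WriteLine($"{name,-18} culture: {culture,8:F1} ns  ordinal: {ordinal,8:F1} ns");
        }
    }
}
```

Running the same build under both runtimes gives a crude per-input comparison; the first pair mostly measures setup overhead, the other two exercise the comparison loop itself.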
cc: @tarekgh, @jkotas
I mentioned in my previous comment that the results I pasted are not from the HEAD commit of the repository but from the commit before it. That commit had a List.Sort call on a list of strings using the two comparers, hence the ~5 µs. This was just to do exactly the same thing as Fortunes is doing. Then I realized that it would make more sense to compare only a single thing, so I changed it; I should have created another branch. For what it's worth, I just created it under sort.
I ran the same tests as you then, but with a different outcome. That's problematic, but I can run it on a different set of machines to get numbers we can be confident with.
Compare
private const string Fortune1 = "fortune: No such file or directory";
private const string Fortune2 = "A computer scientist is someone who fixes things that aren''t broken.";
CompareSame
private const string Fortune1 = "fortune: No such file or directory";
private const string Fortune2 = "fortune: No such file is directory";
CompareSimilar
private const string Fortune1 = "A computer scientist is someone who fixes things that aren''t broken!";
private const string Fortune2 = "A computer scientist is someone who fixes things that aren''t broken.";
Linux 2.0
Method | Mean | Error | StdDev |
---------------------- |------------:|----------:|----------:|
CompareTo | 94.5751 ns | 0.4293 ns | 0.3585 ns |
CompareToSame | 1.1872 ns | 0.0187 ns | 0.0175 ns |
CompareToSimilar | 174.3142 ns | 1.0688 ns | 0.8925 ns |
CompareOrdinal | 2.5018 ns | 0.0353 ns | 0.0313 ns |
CompareOrdinalSame | 0.8260 ns | 0.0164 ns | 0.0128 ns |
CompareOrdinalSimilar | 9.8285 ns | 0.2129 ns | 0.1887 ns |
Linux 2.1
Method | Mean | Error | StdDev |
---------------------- |------------:|----------:|----------:|
CompareTo | 108.3260 ns | 0.0919 ns | 0.0717 ns |
CompareToSame | 8.5684 ns | 0.0065 ns | 0.0050 ns |
CompareToSimilar | 196.7931 ns | 3.3594 ns | 2.9781 ns |
CompareOrdinal | 2.0004 ns | 0.0293 ns | 0.0274 ns |
CompareOrdinalSame | 0.7804 ns | 0.0310 ns | 0.0290 ns |
CompareOrdinalSimilar | 9.7766 ns | 0.0813 ns | 0.0721 ns |
Windows 2.0
Method | Mean | Error | StdDev |
---------------------- |-----------:|----------:|----------:|
CompareTo | 71.282 ns | 0.0295 ns | 0.0262 ns |
CompareToSame | 1.379 ns | 0.0104 ns | 0.0097 ns |
CompareToSimilar | 142.187 ns | 0.0927 ns | 0.0867 ns |
CompareOrdinal | 2.623 ns | 0.0052 ns | 0.0048 ns |
CompareOrdinalSame | 1.096 ns | 0.0025 ns | 0.0022 ns |
CompareOrdinalSimilar | 8.868 ns | 0.0019 ns | 0.0017 ns |
Windows 2.1
Method | Mean | Error | StdDev |
---------------------- |------------:|----------:|----------:|
CompareTo | 85.3825 ns | 0.0098 ns | 0.0082 ns |
CompareToSame | 8.4219 ns | 0.0150 ns | 0.0133 ns |
CompareToSimilar | 143.2853 ns | 0.0322 ns | 0.0301 ns |
CompareOrdinal | 2.0223 ns | 0.0067 ns | 0.0059 ns |
CompareOrdinalSame | 0.7535 ns | 0.0020 ns | 0.0018 ns |
CompareOrdinalSimilar | 10.8721 ns | 0.0816 ns | 0.0724 ns |
Note that I am running Linux and Windows on two identical physical machines, without a VM (docker in the case of Linux) so the comparisons between Linux and Windows are fair.
Sample BenchmarkDotNet framework summary, to show the framework versions are correct:
BenchmarkDotNet=v0.10.14, OS=debian 8
Intel Xeon CPU E5-1650 v3 3.50GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=2.1.300-rc1-008673
[Host] : .NET Core 2.0.6 (CoreCLR 4.6.0.0, CoreFX 4.6.26212.01), 64bit RyuJIT
DefaultJob : .NET Core 2.0.6 (CoreCLR 4.6.0.0, CoreFX 4.6.26212.01), 64bit RyuJIT
and
BenchmarkDotNet=v0.10.14, OS=debian 8
Intel Xeon CPU E5-1650 v3 3.50GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=2.1.300-rc1-008673
[Host] : .NET Core 2.1.0-rc1-26428-03 (CoreCLR 4.6.26428.03, CoreFX 4.6.26428.03), 64bit RyuJIT
DefaultJob : .NET Core 2.1.0-rc1-26428-03 (CoreCLR 4.6.26428.03, CoreFX 4.6.26428.03), 64bit RyuJIT
Also Windows is faster in the noise margin even on my results so I assume we can forget about it and focus on the Linux case. Not quite for CompareTo
I will update this thread with results from Azure VMs to exclude any environment specificity.
More data:
Linux - Azure - 2.0
BenchmarkDotNet=v0.10.14, OS=debian 8
Intel Xeon CPU E5-2673 v3 2.40GHz, 1 CPU, 4 logical and 4 physical cores
.NET Core SDK=2.1.300-rc1-008673
[Host] : .NET Core 2.0.6 (CoreCLR 4.6.0.0, CoreFX 4.6.26212.01), 64bit RyuJIT
DefaultJob : .NET Core 2.0.6 (CoreCLR 4.6.0.0, CoreFX 4.6.26212.01), 64bit RyuJIT
Method | Mean | Error | StdDev |
---------------------- |-----------:|----------:|-----------:|
CompareTo | 132.176 ns | 2.7644 ns | 8.1510 ns |
CompareToSame | 1.637 ns | 0.0748 ns | 0.1931 ns |
CompareToSimilar | 237.060 ns | 5.2528 ns | 15.4879 ns |
CompareOrdinal | 3.508 ns | 0.1128 ns | 0.3182 ns |
CompareOrdinalSame | 1.204 ns | 0.0671 ns | 0.1732 ns |
CompareOrdinalSimilar | 14.108 ns | 0.4070 ns | 1.1999 ns |
Linux - Azure - 2.1
BenchmarkDotNet=v0.10.14, OS=debian 8
Intel Xeon CPU E5-2673 v3 2.40GHz, 1 CPU, 4 logical and 4 physical cores
.NET Core SDK=2.1.300-rc1-008673
[Host] : .NET Core 2.1.0-rc1-26428-03 (CoreCLR 4.6.26428.03, CoreFX 4.6.26428.03), 64bit RyuJIT
DefaultJob : .NET Core 2.1.0-rc1-26428-03 (CoreCLR 4.6.26428.03, CoreFX 4.6.26428.03), 64bit RyuJIT
Method | Mean | Error | StdDev |
---------------------- |-----------:|----------:|-----------:|
CompareTo | 157.820 ns | 3.4567 ns | 10.1922 ns |
CompareToSame | 11.910 ns | 0.2834 ns | 0.6101 ns |
CompareToSimilar | 281.970 ns | 6.2934 ns | 18.5564 ns |
CompareOrdinal | 2.708 ns | 0.1058 ns | 0.3119 ns |
CompareOrdinalSame | 1.127 ns | 0.0656 ns | 0.1622 ns |
CompareOrdinalSimilar | 14.563 ns | 0.4333 ns | 1.2776 ns |
Linux - "Citrine (same hardware as TechEmpower)" - 2.0
BenchmarkDotNet=v0.10.14, OS=debian 8
Intel Xeon Gold 5120 CPU 2.20GHz, 1 CPU, 28 logical and 14 physical cores
.NET Core SDK=2.1.300-rc1-008673
[Host] : .NET Core 2.0.6 (CoreCLR 4.6.0.0, CoreFX 4.6.26212.01), 64bit RyuJIT
DefaultJob : .NET Core 2.0.6 (CoreCLR 4.6.0.0, CoreFX 4.6.26212.01), 64bit RyuJIT
Method | Mean | Error | StdDev |
---------------------- |-----------:|----------:|----------:|
CompareTo | 133.469 ns | 0.0141 ns | 0.0132 ns |
CompareToSame | 1.577 ns | 0.0013 ns | 0.0011 ns |
CompareToSimilar | 234.991 ns | 0.0242 ns | 0.0202 ns |
CompareOrdinal | 3.029 ns | 0.0115 ns | 0.0096 ns |
CompareOrdinalSame | 1.248 ns | 0.0040 ns | 0.0033 ns |
CompareOrdinalSimilar | 13.876 ns | 0.1108 ns | 0.0925 ns |
Linux - "Citrine (same hardware as TechEmpower)" - 2.1
BenchmarkDotNet=v0.10.14, OS=debian 8
Intel Xeon Gold 5120 CPU 2.20GHz, 1 CPU, 28 logical and 14 physical cores
.NET Core SDK=2.1.300-rc1-008673
[Host] : .NET Core 2.1.0-rc1-26428-03 (CoreCLR 4.6.26428.03, CoreFX 4.6.26428.03), 64bit RyuJIT
DefaultJob : .NET Core 2.1.0-rc1-26428-03 (CoreCLR 4.6.26428.03, CoreFX 4.6.26428.03), 64bit RyuJIT
Method | Mean | Error | StdDev |
---------------------- |------------:|----------:|----------:|
CompareTo | 158.5255 ns | 0.2813 ns | 0.2493 ns |
CompareToSame | 12.5199 ns | 0.0007 ns | 0.0006 ns |
CompareToSimilar | 278.3703 ns | 0.0449 ns | 0.0375 ns |
CompareOrdinal | 2.5301 ns | 0.0141 ns | 0.0125 ns |
CompareOrdinalSame | 0.7521 ns | 0.0016 ns | 0.0013 ns |
CompareOrdinalSimilar | 13.9922 ns | 0.0529 ns | 0.0495 ns |
Interestingly, the Citrine machines, which are much more powerful, have the same results as the Azure VMs we are using. Note that the Citrine servers don't have Page Table Isolation disabled (Meltdown security vulnerability).
The CompareToSame lines appear to be highlighting something else. Note that what you wrote above for the "CompareSame" case is not actually what your benchmark is testing: for "CompareSame" you copied what I had, which was almost the same text but swapping the word "or" for "is" so that there was a difference in the middle. But your benchmark CompareToSame is actually doing what its name says and is comparing not only identical strings, but identical references. As such, it should be hitting the same fast path in both 2.0: https://github.com/dotnet/coreclr/blob/19b74c1ea20102b4882a7e034e0ba8cd2ab88b82/src/mscorlib/src/System/String.Comparison.cs#L387-L396 and 2.1: https://github.com/dotnet/coreclr/blob/ad0f22cd2f011e7112588415b13d615238b5acb4/src/mscorlib/shared/System/String.Comparison.cs#L270-L274 That there's a difference of ~2ns vs ~13ns is strange. That said, it's a fixed difference: this test only measures the case where the strings are the same reference.
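To make the distinction concrete, here is a minimal sketch of what a reference-equality fast path looks like and why a benchmark that passes the same constant twice never reaches the culture-aware comparison. The helper below is hypothetical and is not the CoreCLR source linked above; it only illustrates the shape of the check.

```csharp
using System;

static class FastPathSketch
{
    // Hypothetical helper, not the CoreCLR implementation.
    public static int CompareWithFastPath(string a, string b)
    {
        // Same reference (including both null): equal, no character is examined.
        if (object.ReferenceEquals(a, b)) return 0;
        if (a is null) return -1;
        if (b is null) return 1;

        // Only distinct instances pay for the culture-aware comparison.
        return string.Compare(a, b, StringComparison.CurrentCulture);
    }

    static void Main()
    {
        const string fortune = "fortune: No such file or directory";
        // Same characters, but a separate string instance.
        string copy = new string(fortune.ToCharArray());

        Console.WriteLine(object.ReferenceEquals(fortune, fortune)); // same reference: fast path
        Console.WriteLine(object.ReferenceEquals(fortune, copy));    // different reference: full comparison
        Console.WriteLine(CompareWithFastPath(fortune, copy));       // equal content, so 0
    }
}
```

A benchmark meant to measure equal-content comparison would need the second kind of input, a separate instance with the same characters.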
Correct, I didn't see this difference; I will add it. I won't add the results here unless you want me to, as I think I have already drowned this thread in too much data.
@sebastienros
@adamsitnik has just merged a fix which likely fixes this one too. Could you try it with your scenario and see if there is any improvement?
Ok, I have forked the example provided by @sebastienros in https://github.com/dotnet/corefx/issues/37691 and extended it with the benchmarks provided here (I could not use https://github.com/sebastienros/stringbenchmarks/blob/master/Startup.cs because it targets 2.0):
sample command:
dotnet run -- --server http://asp-perf-lin:5001 --client http://asp-perf-load:5002 --repository https://github.com/adamsitnik/invariantcultureperf --project-file InvariantCulture.csproj --path /api/values/CompareOrdinal --warmup 1 --duration 5 --runtime 3.0.0-*
The results are RPS for asp-perf-lin and asp-perf-win machines:
Path | Windows | Linux with 2 fixes | Linux with 3 fixes |
---|---|---|---|
CompareTo | 166 | 144 | 188 |
CompareOrdinal | 183 | 189 | 192 |
CompareOrdinalIgnoreCase | 171 | 188 | 208 |
The results are RPS for the Citrix machines:
Path | Windows | Linux with 2 fixes | Linux with 3 fixes |
---|---|---|---|
CompareTo | 516 | 105 | 377 |
CompareOrdinal | 580 | 390 | 390 |
CompareOrdinalIgnoreCase | 575 | 400 | 400 |
I am going to take a look at the traces from Citrix machines
Citrine (not citrix)
I have run the StringComparer benchmarks from the performance repo using latest CoreCLR bits with my 3 fixes. (https://github.com/dotnet/performance/blob/master/src/benchmarks/micro/corefx/System.Runtime/Perf.StringComparer.cs)
OS=Windows 10.0.17763.107 (1809/October2018Update/Redstone5) OS=debian 10
Intel Xeon CPU E5-1650 v3 3.50GHz, 1 CPU, 12 logical and 6 physical cores .NET Core SDK=3.0.100-preview7-012507
Method | Count | Comparison | Mean Windows | Mean Debian | Ratio |
---|---|---|---|---|---|
CompareSame | 128 | CurrentCulture | 268.61 ns | 127.63 ns | 2.10 |
CompareSame | 128 | CurrentCultureIgnoreCase | 265.39 ns | 127.87 ns | 2.08 |
CompareSame | 128 | InvariantCulture | 266.37 ns | 127.47 ns | 2.09 |
CompareSame | 128 | InvariantCultureIgnoreCase | 264.94 ns | 127.91 ns | 2.07 |
CompareSame | 128 | Ordinal | 15.50 ns | 15.04 ns | 1.03 |
CompareSame | 128 | OrdinalIgnoreCase | 53.60 ns | 242.72 ns | 0.22 |
CompareSame | 262144 | CurrentCulture | 428,407.75 ns | 140,836.14 ns | 3.04 |
CompareSame | 262144 | CurrentCultureIgnoreCase | 427,232.45 ns | 140,973.42 ns | 3.03 |
CompareSame | 262144 | InvariantCulture | 427,156.27 ns | 139,343.72 ns | 3.07 |
CompareSame | 262144 | InvariantCultureIgnoreCase | 425,779.88 ns | 147,502.32 ns | 2.89 |
CompareSame | 262144 | Ordinal | 36,734.94 ns | 33,936.02 ns | 1.08 |
CompareSame | 262144 | OrdinalIgnoreCase | 89,270.31 ns | 420,274.81 ns | 0.21 |
Linux is on par for Ordinal, two to three times faster for CurrentCulture, CurrentCultureIgnoreCase, InvariantCulture and InvariantCultureIgnoreCase, and about five times slower for OrdinalIgnoreCase.
I am going to do some research and remove the gap for OrdinalIgnoreCase.
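The six modes in the table map to the values of the StringComparison enum. As a small illustration of how they differ semantically (not of their cost), the sketch below compares one pair of strings under each mode; the class and method names are hypothetical, and the culture-sensitive results depend on the current culture and the platform's globalization library.

```csharp
using System;

static class ComparisonModesSketch
{
    // Sign of the comparison: -1, 0 or 1.
    public static int Sign(string a, string b, StringComparison mode)
        => Math.Sign(string.Compare(a, b, mode));

    static void Main()
    {
        // Ordinal compares UTF-16 code units, so 'F' (0x46) sorts before 'f' (0x66);
        // OrdinalIgnoreCase treats the two strings as equal.
        foreach (StringComparison mode in Enum.GetValues(typeof(StringComparison)))
        {
            Console.WriteLine($"{mode,-28} {Sign("Fortune", "fortune", mode)}");
        }
    }
}
```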
The PR https://github.com/dotnet/runtime/pull/40910 is addressing the ordinal cases.
After noticing a very important impact on string comparison algorithms while sorting a list of business objects, I decided to run a benchmark to analyze the differences between Linux and Windows. The code is here: https://github.com/sebastienros/stringbenchmarks/blob/master/Startup.cs
Result:
CompareTo is expected to be slower than CompareOrdinal and I am not questioning that, but on Linux the ratio is 46% while on Windows it's 86%. This could have a significant impact on ASP.NET, which uses it extensively. In the TechEmpower Fortunes scenario, on our 12-core machine, we saw throughput improve by a factor of 3 when sorting the results using ordinal comparison (70K RPS to 216K RPS), so the impact seems to be even bigger than these micro-benchmark differences suggest.
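For readers unfamiliar with the change described here, switching a sort from the default culture-sensitive comparison to ordinal comparison is a one-argument change. The sketch below is an assumption about what such a change looks like, not the actual Fortunes code; the class and method names are hypothetical. Note that the two orderings are not interchangeable: ordinal sorts by UTF-16 code unit, so uppercase ASCII letters come before lowercase ones.

```csharp
using System;
using System.Collections.Generic;

static class SortSketch
{
    // Culture-sensitive sort, pinned to the invariant culture for determinism.
    public static List<string> SortCulture(IEnumerable<string> messages)
    {
        var sorted = new List<string>(messages);
        sorted.Sort(StringComparer.InvariantCulture);
        return sorted;
    }

    // Ordinal sort: compares raw UTF-16 code units, no collation rules.
    public static List<string> SortOrdinal(IEnumerable<string> messages)
    {
        var sorted = new List<string>(messages);
        sorted.Sort(StringComparer.Ordinal);
        return sorted;
    }

    static void Main()
    {
        var messages = new[] { "apple", "Zebra", "fortune: No such file or directory" };

        Console.WriteLine(string.Join(" | ", SortCulture(messages)));
        Console.WriteLine(string.Join(" | ", SortOrdinal(messages)));
    }
}
```

Whether ordinal ordering is acceptable depends on what the sorted output is used for; it is cheaper, but it is only appropriate when linguistic ordering is not required.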