Simple number parsing performance (NumberStyles.None)

discostu105 commented 5 years ago

Triggered by this SO question, I've tried to parse numbers (longs) as fast as possible. Assume these numbers are IDs (always positive, decimal, no fancy characters). I was puzzled to see that long.TryParse performs far worse than a hand-rolled implementation like this:

public static long LongParseFast(ReadOnlySpan<char> value)
{
    long result = 0;
    for (int i = 0; i < value.Length; i++)
    {
        result = 10 * result + (value[i] - 48);
    }
    return result;
}

Results:


BenchmarkDotNet=v0.11.4, OS=Windows 10.0.17763.253 (1809/October2018Update/Redstone5)
Intel Core i9-8950HK CPU 2.90GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.0.100-preview-009812
  [Host]     : .NET Core 3.0.0-preview-27122-01 (CoreCLR 4.6.27121.03, CoreFX 4.7.18.57103), 64bit RyuJIT
  DefaultJob : .NET Core 3.0.0-preview-27122-01 (CoreCLR 4.6.27121.03, CoreFX 4.7.18.57103), 64bit RyuJIT

Method	Mean	Error	StdDev	Median
Long_Parse	42.535 ns	0.8830 ns	1.6585 ns	43.389 ns
Long_TryParse	39.385 ns	0.4472 ns	0.3734 ns	39.529 ns
Long_TryParse_NumberStyles_None	27.609 ns	0.2951 ns	0.2464 ns	27.633 ns
Convert_ToInt64	49.489 ns	1.0129 ns	1.5468 ns	49.931 ns
LongParseFast	9.071 ns	0.2158 ns	0.4505 ns	9.203 ns

Full benchmark code: https://gist.github.com/discostu105/4d208e20619295bddf98afdbe8543ae0

I've found that number parsing in .NET deals with a lot of stuff (currency symbols, whitespace, parentheses, exponents, etc.). https://github.com/dotnet/corefx/blob/5710b6d09441a0a2d3cb9778ae927da14b5087cd/src/Common/src/CoreLib/System/Number.Parsing.cs#L245

But in this case, I need none of that stuff. I have plain IDs (e.g. "1234567890").

Anyway, I thought that when specifying NumberStyles.None and CultureInfo.InvariantCulture, I should come pretty close to the performance of my hand-rolled method.

long.TryParse(str, NumberStyles.None, CultureInfo.InvariantCulture, out result);

But there is still roughly a 3x difference (27.6 ns vs. 9.1 ns in the table above).

Is there potential for improvement? Maybe a fast-path for NumberStyles.None?
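
One caveat on the comparison: LongParseFast above does no validation and no overflow detection, while long.TryParse with NumberStyles.None rejects anything that is not a plain run of digits. A closer apples-to-apples variant might look like the sketch below. IdParser and TryParseDigits are hypothetical names chosen for illustration, and this is not what the framework does internally.

```csharp
using System;

public static class IdParser
{
    // Hypothetical helper: digits-only parse with validation and overflow detection.
    public static bool TryParseDigits(ReadOnlySpan<char> value, out long result)
    {
        result = 0;
        if (value.IsEmpty)
        {
            return false; // an empty input is not a number
        }

        long acc = 0;
        for (int i = 0; i < value.Length; i++)
        {
            // Characters below '0' wrap around to a large uint, so a single
            // comparison rejects everything outside '0'..'9'.
            uint digit = (uint)(value[i] - '0');
            if (digit > 9)
            {
                return false;
            }

            // Reject inputs where acc * 10 + digit would exceed long.MaxValue.
            if (acc > (long.MaxValue - digit) / 10)
            {
                return false;
            }

            acc = acc * 10 + digit;
        }

        result = acc;
        return true;
    }
}
```

The extra checks cost something, so a variant like this would be expected to land somewhere between LongParseFast and long.TryParse. I have not measured it.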

discostu105 commented 5 years ago

As a side-note, I see that there has been a big improvement over .NET Core 2.2 already (yay!). Same benchmark in 2.2:


BenchmarkDotNet=v0.11.4, OS=Windows 10.0.17763.253 (1809/October2018Update/Redstone5)
Intel Core i9-8950HK CPU 2.90GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.0.100-preview-009812
  [Host]     : .NET Core 2.2.1 (CoreCLR 4.6.27207.03, CoreFX 4.6.27207.03), 64bit RyuJIT
  DefaultJob : .NET Core 2.2.1 (CoreCLR 4.6.27207.03, CoreFX 4.6.27207.03), 64bit RyuJIT

Method	Mean	Error	StdDev	Median
Long_Parse	90.583 ns	1.8210 ns	3.9586 ns	91.002 ns
Long_TryParse	93.792 ns	1.7241 ns	1.5284 ns	93.677 ns
Long_TryParse_NumberStyles_None	80.889 ns	1.6359 ns	4.1042 ns	82.686 ns
Convert_ToInt64	98.078 ns	1.8924 ns	2.1793 ns	99.171 ns
LongParseFast	8.910 ns	0.1046 ns	0.0927 ns	8.882 ns

Still. Can we do better? :) Especially in the NumberStyles.None case?

vcsjones commented 5 years ago

Might be worth looking at the discussions in dotnet/runtime#19208

adamsitnik commented 5 years ago

As I thought, the optimizations brought by @tannergooding to .NET Core 3.0 are already visible in our benchmarks.

git clone https://github.com/dotnet/performance.git
dotnet run -f netcoreapp2.1 -p .\performance\src\benchmarks\micro\MicroBenchmarks.csproj \
  --filter System.Tests.Perf_Int64.Parse \
  --runtimes netcoreapp2.1 netcoreapp2.2 netcoreapp3.0

Method	Toolchain	value	Mean
Parse	netcoreapp2.1	-9223372036854775808	139.16 ns
Parse	netcoreapp2.2	-9223372036854775808	137.49 ns
Parse	netcoreapp3.0	-9223372036854775808	52.50 ns

Parse	netcoreapp2.1	12345	79.16 ns
Parse	netcoreapp2.2	12345	81.39 ns
Parse	netcoreapp3.0	12345	39.36 ns

Parse	netcoreapp2.1	9223372036854775807	150.60 ns
Parse	netcoreapp2.2	9223372036854775807	137.71 ns
Parse	netcoreapp3.0	9223372036854775807	50.35 ns

@tannergooding do you think there is still room for improvement?

tannergooding commented 5 years ago

Yes, I think there is still room for improvement. However, I'm not sure we can expose those improvements directly via mechanisms like long.Parse.

The reasoning is that the default parsing logic needs to be able to handle various cultures, styles, etc. We are also culture aware by default. This is not very efficient when it comes to wanting a "fast path" that just spits out invariant culture data as fast as possible.

We might be able to get some gains, but it will never quite be there, as you will always have a check like if ((formatProvider == CultureInfo.InvariantCulture) && (numberStyles == NumberStyles.None)). There are also other NumberStyles cases (like hex) that we can and sometimes do fast-path, and then you have more checks to get to those fast paths as well (each further check generally being a mispredicted branch).
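
To illustrate the shape of that dispatch, here is a rough sketch. It is not the actual code in Number.Parsing.cs; DispatchSketch, TryParseInt64 and TryParseDigitsOnly are hypothetical names. The point is only that the style/provider check runs on every call before the fast path is reached.

```csharp
using System;
using System.Globalization;

public static class DispatchSketch
{
    // Hypothetical entry point: checks for the "plain digits" case first,
    // otherwise defers to the general culture- and style-aware parser.
    public static bool TryParseInt64(ReadOnlySpan<char> s, NumberStyles style,
                                     IFormatProvider provider, out long result)
    {
        // This check is paid on every call, fast path or not.
        if (style == NumberStyles.None &&
            ReferenceEquals(provider, CultureInfo.InvariantCulture))
        {
            return TryParseDigitsOnly(s, out result);
        }

        // General path: the existing public API handles all other combinations.
        return long.TryParse(s, style, provider, out result);
    }

    // Hypothetical fast path: ASCII digits only, with overflow detection.
    private static bool TryParseDigitsOnly(ReadOnlySpan<char> s, out long result)
    {
        result = 0;
        if (s.IsEmpty)
        {
            return false;
        }

        long acc = 0;
        foreach (char c in s)
        {
            uint digit = (uint)(c - '0');
            if (digit > 9 || acc > (long.MaxValue - digit) / 10)
            {
                return false;
            }
            acc = acc * 10 + digit;
        }

        result = acc;
        return true;
    }
}
```

Each additional special case (hex, for example) would add another branch of the same kind in front of its own fast path.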

That being said, currently all of our parsing/formatting code is fairly centralized in the internal Number class. This class is shared between our UTF8 and UTF16 parsers (where applicable) and contains all the fast and slow path code. It might be worth considering if we can "productize" these APIs and make them available for public use as this would allow users to directly call something like NumberFormatter.ParseInt64Invariant(), bypassing the surrounding checks and just getting the direct "fast-path" (while still allowing the default case to be culture and format aware).
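
For UTF8 input specifically, a culture-invariant entry point already exists publicly: System.Buffers.Text.Utf8Parser. This is not the API suggested above (NumberFormatter.ParseInt64Invariant is only an idea at this point), but it shows what calling such a direct, invariant parser looks like. Utf8IdParser and TryParseId are hypothetical names for this sketch.

```csharp
using System;
using System.Buffers.Text;

public static class Utf8IdParser
{
    // Hypothetical wrapper around the existing Utf8Parser.TryParse overload for long.
    // Utf8Parser stops at the first byte it cannot parse and reports how many bytes
    // it consumed, so we additionally require that the whole input was used.
    // Note: on a partial match (e.g. "12x") id still holds the parsed prefix.
    public static bool TryParseId(ReadOnlySpan<byte> utf8, out long id)
    {
        return Utf8Parser.TryParse(utf8, out id, out int bytesConsumed)
            && bytesConsumed == utf8.Length;
    }
}
```

Note that the default format of Utf8Parser also accepts a leading sign, so it is not strictly equivalent to NumberStyles.None.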

dotnet / runtime

Simple number parsing performance (NumberStyles.None) #28885