dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Port performance improvements to UTF16 parsers and formatters #26586

Status: Closed (Tornhoof closed this 4 years ago)

Tornhoof commented 6 years ago

I noticed a large performance difference between Utf8Parser and the UTF-16 parsers for Guid and TimeSpan. The benchmark code is in the details at the end.

TimeSpan

For TimeSpan format 'c', the UTF-16 parser is roughly on par with the UTF-8 one (113.62 ns vs. 96.62 ns). For the 'G' and 'g' formats it is roughly 10 to 16 times slower.

Guid

Guid suffers from similar problems. Looking into Guid.cs, I see that the checks for invalid symbols are fairly inefficient:

https://github.com/dotnet/corefx/blob/39e96cd8e3b97f8b1a5fce86211cf8fec7ea478a/src/Common/src/CoreLib/System/Guid.cs#L430

https://github.com/dotnet/corefx/blob/39e96cd8e3b97f8b1a5fce86211cf8fec7ea478a/src/Common/src/CoreLib/System/Guid.cs#L452

https://github.com/dotnet/corefx/blob/39e96cd8e3b97f8b1a5fce86211cf8fec7ea478a/src/Common/src/CoreLib/System/Guid.cs#L474

I guess that's visible in the benchmark below too, but it does not appear to be the bulk of the difference (as 'B' and 'P' are still twice as slow as the UTF8 version).
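To illustrate what a cheaper invalid-symbol check could look like, here is a minimal sketch that validates each character while decoding it, so no separate scan over the string is needed. It is not the CoreLib or Utf8Parser implementation; `GuidParseSketch`, `HexValue`, and `TryParseN` are hypothetical names, and only the 32-digit 'N' format is handled.

```csharp
using System;

static class GuidParseSketch
{
    // Hypothetical helper: maps one hex digit to 0..15, anything else to -1.
    private static int HexValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        c = (char)(c | 0x20); // fold 'A'..'F' onto 'a'..'f'
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    }

    // Parses the 32-digit "N" format, rejecting invalid characters as they are decoded.
    public static bool TryParseN(ReadOnlySpan<char> text, out Guid result)
    {
        result = default;
        if (text.Length != 32) return false;

        Span<byte> b = stackalloc byte[16];
        for (int i = 0; i < b.Length; i++)
        {
            int hi = HexValue(text[2 * i]);
            int lo = HexValue(text[2 * i + 1]);
            if ((hi | lo) < 0) return false; // either digit was invalid
            b[i] = (byte)((hi << 4) | lo);
        }

        // The text form prints the first three fields big-endian.
        int a = (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3];
        short s1 = unchecked((short)((b[4] << 8) | b[5]));
        short s2 = unchecked((short)((b[6] << 8) | b[7]));
        result = new Guid(a, s1, s2, b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
        return true;
    }
}
```

The 'D', 'B', and 'P' formats would additionally need to check the dashes and the enclosing braces or parentheses at fixed positions, which this sketch leaves out.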

If I didn't mess up the benchmarks too much (again, both @ahsonkhan and @stephentoub fixed my bad benchmark last time), it might be useful to port the utf8 code to the utf16 code base.


BenchmarkDotNet=v0.10.14, OS=Windows 10.0.17134
Intel Core i7-4790K CPU 4.00GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
Frequency=3906246 Hz, Resolution=256.0003 ns, Timer=TSC
.NET Core SDK=2.1.300
  [Host]     : .NET Core 2.1.0 (CoreCLR 4.6.26515.07, CoreFX 4.6.26515.06), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.0 (CoreCLR 4.6.26515.07, CoreFX 4.6.26515.06), 64bit RyuJIT

| Method | Mean | Error | StdDev |
|---|---:|---:|---:|
| UTF16_Guid_TryParse_Format_D | 215.37 ns | 1.5212 ns | 1.2702 ns |
| UTF8_Guid_TryParse_Format_D | 86.62 ns | 0.3426 ns | 0.3205 ns |
| UTF16_Guid_TryParse_Format_N | 281.59 ns | 4.3998 ns | 4.1156 ns |
| UTF8_Guid_TryParse_Format_N | 69.79 ns | 0.1989 ns | 0.1763 ns |
| UTF16_Guid_TryParse_Format_B | 202.41 ns | 1.8355 ns | 1.6271 ns |
| UTF8_Guid_TryParse_Format_B | 90.40 ns | 0.1929 ns | 0.1710 ns |
| UTF16_Guid_TryParse_Format_P | 203.18 ns | 0.1306 ns | 0.1158 ns |
| UTF8_Guid_TryParse_Format_P | 90.31 ns | 0.4293 ns | 0.4015 ns |
| UTF16_TimeSpan_TryParse_Format_c | 113.62 ns | 0.0920 ns | 0.0816 ns |
| UTF8_TimeSpan_TryParse_Format_c | 96.62 ns | 0.1512 ns | 0.1415 ns |
| UTF16_TimeSpan_TryParse_Format_G | 934.01 ns | 8.0425 ns | 7.5230 ns |
| UTF8_TimeSpan_TryParse_Format_G | 57.75 ns | 0.0955 ns | 0.0893 ns |
| UTF16_TimeSpan_TryParse_Format_g | 923.88 ns | 0.9195 ns | 0.7678 ns |
| UTF8_TimeSpan_TryParse_Format_g | 95.00 ns | 0.1921 ns | 0.1797 ns |

```csharp
using System;
using System.Buffers.Text;
using System.Globalization;
using System.Text;
using BenchmarkDotNet.Attributes;

public class ParserBenchmark
{
    // Guid inputs: one random value rendered in each format, as UTF-16 strings and UTF-8 bytes.
    private static readonly Guid Guid = Guid.NewGuid();
    private static readonly string GuidStringD = Guid.ToString("D");
    private static readonly string GuidStringN = Guid.ToString("N");
    private static readonly string GuidStringB = Guid.ToString("B");
    private static readonly string GuidStringP = Guid.ToString("P");
    private static readonly byte[] GuidBytesD = Encoding.UTF8.GetBytes(GuidStringD);
    private static readonly byte[] GuidBytesN = Encoding.UTF8.GetBytes(GuidStringN);
    private static readonly byte[] GuidBytesB = Encoding.UTF8.GetBytes(GuidStringB);
    private static readonly byte[] GuidBytesP = Encoding.UTF8.GetBytes(GuidStringP);

    // TimeSpan inputs: TimeSpan.MinValue rendered with the invariant culture.
    private static readonly TimeSpan TimeSpan = TimeSpan.MinValue;
    private static readonly string TimeSpanStringc = TimeSpan.ToString("c", CultureInfo.InvariantCulture);
    private static readonly string TimeSpanStringG = TimeSpan.ToString("G", CultureInfo.InvariantCulture);
    private static readonly string TimeSpanStringg = TimeSpan.ToString("g", CultureInfo.InvariantCulture);
    private static readonly byte[] TimeSpanBytesc = Encoding.UTF8.GetBytes(TimeSpanStringc);
    private static readonly byte[] TimeSpanBytesG = Encoding.UTF8.GetBytes(TimeSpanStringG);
    private static readonly byte[] TimeSpanBytesg = Encoding.UTF8.GetBytes(TimeSpanStringg);

    [Benchmark]
    public Guid UTF16_Guid_TryParse_Format_D()
    {
        Guid.TryParseExact(GuidStringD, "D", out var result);
        return result;
    }

    [Benchmark]
    public Guid UTF8_Guid_TryParse_Format_D()
    {
        Utf8Parser.TryParse(GuidBytesD, out Guid result, out _, 'D');
        return result;
    }

    [Benchmark]
    public Guid UTF16_Guid_TryParse_Format_N()
    {
        Guid.TryParseExact(GuidStringN, "N", out var result);
        return result;
    }

    [Benchmark]
    public Guid UTF8_Guid_TryParse_Format_N()
    {
        Utf8Parser.TryParse(GuidBytesN, out Guid result, out _, 'N');
        return result;
    }

    [Benchmark]
    public Guid UTF16_Guid_TryParse_Format_B()
    {
        Guid.TryParseExact(GuidStringB, "B", out var result);
        return result;
    }

    [Benchmark]
    public Guid UTF8_Guid_TryParse_Format_B()
    {
        Utf8Parser.TryParse(GuidBytesB, out Guid result, out _, 'B');
        return result;
    }

    [Benchmark]
    public Guid UTF16_Guid_TryParse_Format_P()
    {
        Guid.TryParseExact(GuidStringP, "P", out var result);
        return result;
    }

    [Benchmark]
    public Guid UTF8_Guid_TryParse_Format_P()
    {
        Utf8Parser.TryParse(GuidBytesP, out Guid result, out _, 'P');
        return result;
    }

    [Benchmark]
    public TimeSpan UTF16_TimeSpan_TryParse_Format_c()
    {
        TimeSpan.TryParseExact(TimeSpanStringc, "c", CultureInfo.InvariantCulture, out var result);
        return result;
    }

    [Benchmark]
    public TimeSpan UTF8_TimeSpan_TryParse_Format_c()
    {
        Utf8Parser.TryParse(TimeSpanBytesc, out TimeSpan result, out _, 'c');
        return result;
    }

    [Benchmark]
    public TimeSpan UTF16_TimeSpan_TryParse_Format_G()
    {
        TimeSpan.TryParseExact(TimeSpanStringG, "G", CultureInfo.InvariantCulture, out var result);
        return result;
    }

    [Benchmark]
    public TimeSpan UTF8_TimeSpan_TryParse_Format_G()
    {
        Utf8Parser.TryParse(TimeSpanBytesG, out TimeSpan result, out _, 'G');
        return result;
    }

    [Benchmark]
    public TimeSpan UTF16_TimeSpan_TryParse_Format_g()
    {
        TimeSpan.TryParseExact(TimeSpanStringg, "g", CultureInfo.InvariantCulture, out var result);
        return result;
    }

    [Benchmark]
    public TimeSpan UTF8_TimeSpan_TryParse_Format_g()
    {
        Utf8Parser.TryParse(TimeSpanBytesg, out TimeSpan result, out _, 'g');
        return result;
    }
}
```

stephentoub commented 6 years ago

Thanks. I've not yet reviewed your benchmark, but in general a lot of attention was paid to the performance of the new parsers/formatters, and not all of that work was ported back to the original parsers/formatters... but should be.

@joshfree, @ahsonkhan, it'd be great if the remaining work there could be catalogued and either done or issues opened so that others can tackle it.

jkotas commented 6 years ago

Here is the list of all UTF8 formatters and parsers. We should go through each of them and port any applicable optimizations to the CoreLib UTF16 ones:

Formatters:

Parsers:

Misc:

Zhentar commented 6 years ago

To add some support for this, in my test benchmarks I found it to be twice as fast to copy ASCII bytes from a string to a byte array and then call Utf8Parser.TryParse instead of just calling int.TryParse directly on the strings.

stephentoub commented 6 years ago

> To add some support for this, in my test benchmarks I found it to be twice as fast to copy ASCII bytes from a string to a byte array and then call Utf8Parser.TryParse instead of just calling int.TryParse directly on the strings.

Can you share those benchmarks? What inputs? What version of .NET Core? (There's definitely significant room for improvement; seeing your benchmarks will help whoever works on improving it.)

Thanks.

Zhentar commented 6 years ago

Oops, I left out the link: https://gist.github.com/Zhentar/07b92a52c619641ab61aab50b1e5ec91
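For readers who do not follow the link, the approach described above (narrow the ASCII chars of a string into a byte buffer, then hand that buffer to Utf8Parser.TryParse) might look roughly like the sketch below. This is not the code from the gist; `AsciiParseSketch` and `TryParseInt32ViaUtf8` are hypothetical names, and the 64-byte stack threshold is an arbitrary assumption.

```csharp
using System;
using System.Buffers.Text;

static class AsciiParseSketch
{
    // Copies the chars of 'text' into a byte buffer (ASCII only) and parses the bytes.
    public static bool TryParseInt32ViaUtf8(string text, out int value)
    {
        value = 0;
        if (text == null) return false;

        // Small inputs stay on the stack; larger ones fall back to a heap array.
        Span<byte> buffer = text.Length <= 64 ? stackalloc byte[text.Length] : new byte[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c > 0x7F) return false; // non-ASCII input cannot be narrowed safely
            buffer[i] = (byte)c;
        }

        // Require the whole input to be consumed, since Utf8Parser stops at the first invalid byte.
        return Utf8Parser.TryParse(buffer, out value, out int consumed) && consumed == buffer.Length;
    }
}
```

Note that this is not a drop-in replacement for int.TryParse: Utf8Parser does not consult the current culture, and this sketch does not skip the surrounding white space that int.TryParse accepts by default.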

stephentoub commented 6 years ago

> Timespan. For Format 'c' of Timespan the performance of Utf16 is pretty much equal to Utf8, for the other two formats it's quite different.

Those other formats are culture-sensitive with TimeSpan.ToString/TryFormat, but Utf8Formatter ignores culture. While I'm sure there are improvements that can and should be made to TimeSpan.ToString/TryFormat for those formats, they need to continue to respect the current culture, which incurs a cost.
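The difference in API shape can be seen in a small sketch: the UTF-16 path takes a culture for 'G', while the Utf8Formatter overload has no culture parameter at all. The class and method names here are hypothetical, and the 32-byte buffer is an assumption chosen to be larger than the longest 'G' rendering.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Globalization;
using System.Text;

static class TimeSpanCultureSketch
{
    // UTF-16 path: the "G" format consults the supplied culture.
    public static string FormatUtf16(TimeSpan value, CultureInfo culture)
        => value.ToString("G", culture);

    // UTF-8 path: there is no culture argument to pass.
    public static string FormatUtf8(TimeSpan value)
    {
        Span<byte> buffer = stackalloc byte[32];
        return Utf8Formatter.TryFormat(value, buffer, out int written, new StandardFormat('G'))
            ? Encoding.ASCII.GetString(buffer.Slice(0, written))
            : string.Empty;
    }
}
```

With CultureInfo.InvariantCulture the two paths should produce the same text. With a culture whose decimal separator is not '.', the UTF-16 result is expected to differ in the fractional-seconds separator, which is the behavior the UTF-16 implementation has to preserve.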

stephentoub commented 5 years ago

@pentp, is there anything relevant from the decimal Utf8Parser/Formatter support to port over to coreclr, or should we check those off as well?

pentp commented 5 years ago

I don't think there's anything special about decimal in Utf8Parser; it uses the general TryParseNumber method, which is in Utf8Parser.Number.cs.

The Utf8Formatter part is more involved. I don't know whether it's faster or not, so it probably needs at least some investigation.

stephentoub commented 5 years ago

Ok, thanks.

danmoseley commented 5 years ago

Will not make 3.0

stephentoub commented 5 years ago

> Will not make 3.0

Most of them did. I'd be ok closing this at this point.