Closed. Tornhoof closed this issue 4 years ago.
Thanks. I've not yet reviewed your benchmark, but in general a lot of attention was paid to the performance of the new parsers/formatters, and not all of that work was ported back to the original parsers/formatters... but should be.
@joshfree, @ahsonkhan, it'd be great if the remaining work there could be catalogued and either done or issues opened so that others can tackle it.
Here is the list of all UTF-8 formatters and parsers. We should go through each of them and port any applicable optimizations to the CoreLib UTF-16 ones:
Formatters:
Parsers:
Misc:
To add some support for this, in my test benchmarks I found it to be twice as fast to copy ASCII bytes from a string to a byte array and then call Utf8Parser.TryParse instead of just calling int.TryParse directly on the strings.
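For readers who want to see the shape of the two code paths being compared, here is a minimal sketch. It is not the benchmark from this thread: `TryParseViaUtf8` is a hypothetical helper name, and the 16-byte buffer and the ASCII-only check are assumptions made for illustration.

```csharp
using System;
using System.Buffers.Text;

// Hypothetical helper (not from the thread): narrow an ASCII-only string
// to bytes on the stack, then parse it with Utf8Parser.
// Assumption: inputs longer than 16 chars or containing non-ASCII are rejected.
static bool TryParseViaUtf8(string text, out int value)
{
    Span<byte> buffer = stackalloc byte[16];
    if (text.Length > buffer.Length)
    {
        value = 0;
        return false;
    }

    for (int i = 0; i < text.Length; i++)
    {
        char c = text[i];
        if (c > 0x7F)
        {
            value = 0;
            return false;
        }
        buffer[i] = (byte)c;
    }

    // Utf8Parser reports how many bytes it consumed; require the whole input.
    return Utf8Parser.TryParse(buffer.Slice(0, text.Length), out value, out int consumed)
        && consumed == text.Length;
}

// The two paths from the comment above, side by side.
bool viaUtf8 = TryParseViaUtf8("12345", out int a);
bool viaString = int.TryParse("12345", out int b);
Console.WriteLine($"{viaUtf8} {a} / {viaString} {b}");
```

Note that the two paths are not strictly equivalent: `int.TryParse(string, out int)` accepts leading and trailing whitespace and consults the current culture, while `Utf8Parser.TryParse` does neither, so part of any measured gap may come from that extra work.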
Can you share those benchmarks? What inputs? What version of .NET Core? (There's definitely significant room for improvement; seeing your benchmarks will help whoever works on improving it.)
Thanks.
Oops, I left out the link: https://gist.github.com/Zhentar/07b92a52c619641ab61aab50b1e5ec91
Timespan. For Format 'c' of Timespan the performance of Utf16 is pretty much equal to Utf8, for the other two formats it's quite different.
Those other formats are culture-sensitive with TimeSpan.ToString/TryFormat, but Utf8Formatter ignores culture; while I'm sure there are improvements that can/should be made to TimeSpan.ToString/TryFormat for those formats, it needs to continue to respect the current culture, which incurs cost.
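A small sketch of that API difference, assuming the 'g' format and an arbitrary 1.5 second value chosen only for illustration: `TimeSpan.ToString` accepts a format provider, whereas `Utf8Formatter.TryFormat` has no culture parameter at all.

```csharp
using System;
using System.Buffers.Text;
using System.Globalization;
using System.Text;

TimeSpan span = TimeSpan.FromMilliseconds(1500);

// UTF-16 path: 'g' is culture-sensitive, so a format provider is consulted.
string invariant = span.ToString("g", CultureInfo.InvariantCulture);

// With no provider, the current culture is used. Under a culture such as fr-FR
// the fractional separator is ',' rather than '.'.
string current = span.ToString("g");

// UTF-8 path: no culture parameter exists on this API.
Span<byte> buffer = stackalloc byte[32];
bool ok = Utf8Formatter.TryFormat(span, buffer, out int written, new StandardFormat('g'));
string utf8 = Encoding.ASCII.GetString(buffer.Slice(0, written));

Console.WriteLine($"invariant: {invariant}");
Console.WriteLine($"current ({CultureInfo.CurrentCulture.Name}): {current}");
Console.WriteLine($"utf8: {utf8} (ok={ok})");
```

Any port of the UTF-8 fast path to `TimeSpan.ToString`/`TryFormat` would still need the culture lookup for 'g' and 'G', so some gap for those formats is expected.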
@pentp, is there anything relevant from the decimal Utf8Parser/Formatter support to port over to coreclr, or should we check those off as well?
I don't think there's anything special about decimal in Utf8Parser; it uses the general TryParseNumber method, which is in Utf8Parser.Number.cs.
The Utf8Formatter part is more involved. I don't know if it's faster or not, so it probably needs some investigation at least.
Ok, thanks.
Will not make 3.0
Most of them did. I'd be ok closing this at this point.
I noticed that there is a large difference between the parser performance in Utf8Parser and the UTF-16 parser for Guid and TimeSpan. Code is in the details at the end.
Timespan
For Format 'c' of Timespan the performance of Utf16 is pretty much equal to Utf8, for the other two formats it's quite different.
Guid
Guid suffers from similar problems. Looking into Guid.cs, I see that the checks for invalid symbols are fairly inefficient:
https://github.com/dotnet/corefx/blob/39e96cd8e3b97f8b1a5fce86211cf8fec7ea478a/src/Common/src/CoreLib/System/Guid.cs#L430
https://github.com/dotnet/corefx/blob/39e96cd8e3b97f8b1a5fce86211cf8fec7ea478a/src/Common/src/CoreLib/System/Guid.cs#L452
https://github.com/dotnet/corefx/blob/39e96cd8e3b97f8b1a5fce86211cf8fec7ea478a/src/Common/src/CoreLib/System/Guid.cs#L474
I guess that's visible in the benchmark below too, but it does not appear to be the bulk of the difference (as 'B' and 'P' are still twice as slow as the UTF8 version).
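For reference, the two Guid parsing paths under comparison look roughly like this. This is a sketch rather than the benchmark code itself: `RoundTripsBothWays` is a hypothetical helper name, and the format list is simply the set of formats that `Utf8Parser` accepts for Guid ('D', 'B', 'P', 'N').

```csharp
using System;
using System.Buffers.Text;
using System.Text;

// Hypothetical helper (not from the thread): format a Guid, then parse it back
// through both the UTF-16 and the UTF-8 parser and check that both agree.
static bool RoundTripsBothWays(Guid original, char format)
{
    string text = original.ToString(format.ToString());

    // UTF-16 path: parse the string directly.
    bool utf16Ok = Guid.TryParseExact(text, format.ToString(), out Guid fromUtf16);

    // UTF-8 path: the same characters as ASCII bytes.
    byte[] bytes = Encoding.ASCII.GetBytes(text);
    bool utf8Ok = Utf8Parser.TryParse(bytes, out Guid fromUtf8, out int consumed, format);

    return utf16Ok && utf8Ok
        && fromUtf16 == original
        && fromUtf8 == original
        && consumed == bytes.Length;
}

Guid sample = Guid.NewGuid();
foreach (char format in new[] { 'D', 'B', 'P', 'N' })
{
    Console.WriteLine($"{format}: {RoundTripsBothWays(sample, format)}");
}
```

Wrapping each of the two `TryParse` calls in a benchmark method per format would give the per-format comparison described here.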
If I didn't mess up the benchmarks too much (again, both @ahsonkhan and @stephentoub fixed my bad benchmark last time), it might be useful to port the utf8 code to the utf16 code base.