dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

[API Proposal]: Ascii.ToUtf16 overload that treats `\0` as invalid #80366

Closed gfoidl closed 3 days ago

gfoidl commented 1 year ago

Background and motivation

For ASP.NET Core's StringUtilities the ASCII values of the range (0x00, 0x80) are considered valid, whilst Ascii.ToUtf16 treats the whole ASCII range [0x00, 0x80) as valid. In order to base StringUtilities on the Ascii-APIs and avoid custom vectorized code in ASP.NET Core internals \0 should be allowed to be treated as invalid. See https://github.com/dotnet/aspnetcore/issues/45962 for further info.
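To make the difference between the two ranges concrete, here is a minimal sketch. It assumes the three-argument shape `System.Text.Ascii.ToUtf16(source, destination, out charsWritten)` that appears in the assembly listing later in this thread, not the signature proposed below; the variable names are illustrative only.

```csharp
using System;
using System.Buffers;
using System.Text;

// "a\0b": every byte is below 0x80, so the whole input counts as ASCII.
byte[] withNull = { (byte)'a', 0x00, (byte)'b' };
char[] chars = new char[withNull.Length];

// \0 is accepted like any other ASCII byte.
OperationStatus status = Ascii.ToUtf16(withNull, chars, out int charsWritten);
Console.WriteLine($"{status}, {charsWritten}"); // Done, 3

// A byte >= 0x80 is outside ASCII and stops the conversion at that byte.
byte[] nonAscii = { (byte)'a', 0x80 };
OperationStatus status2 = Ascii.ToUtf16(nonAscii, chars, out int written2);
Console.WriteLine($"{status2}, {written2}"); // InvalidData, 1
```

A caller that wants the `(0x00, 0x80)` behaviour therefore has to reject `\0` itself, which is the gap this proposal is about.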

API Proposal

namespace System.Buffers.Text
{
    public static class Ascii
    {
        // existing methods
+       public static OperationStatus ToUtf16(ReadOnlySpan<byte> source, Span<char> destination, out int bytesConsumed, out int charsWritten, bool treatNullAsInvalid = false);
    }
}

Since the new ASCII APIs will first ship in .NET 8, an optional argument could instead be added to the existing method without it being a breaking change:

namespace System.Buffers.Text
{
    public static class Ascii
    {
        // existing methods
-       public static OperationStatus ToUtf16(ReadOnlySpan<byte> source, Span<char> destination, out int bytesConsumed, out int charsWritten);
+       public static OperationStatus ToUtf16(ReadOnlySpan<byte> source, Span<char> destination, out int bytesConsumed, out int charsWritten, bool treatNullAsInvalid = false);
    }
}

API Usage

    private static unsafe void GetHeaderName(ReadOnlySpan<byte> source, Span<char> buffer)
    {
        OperationStatus status = Ascii.ToUtf16(source, buffer, out _, out _, treatNullAsInvalid: true);

        if (status != OperationStatus.Done)
        {
            KestrelBadHttpRequestException.Throw(RequestRejectionReason.InvalidCharactersInHeaderName);
        }
    }

Alternative Designs

No response

Risks

The value for treatNullAsInvalid will be given as constant, so the JIT should be able to dead-code eliminate any code needed for "default case" (whole ASCII-range incl. \0), so no perf-regression should be expected.

Besides treating \0 as a special value that is optionally treated as invalid, I don't expect any other value to be considered special enough for optional exclusion.

dotnet-issue-labeler[bot] commented 1 year ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

stephentoub commented 1 year ago

Does the ASP.NET code measurably regress if the newly-added ToUtf16 is used plus a call to IndexOf((byte)'\0') to validate there wasn't a null?

    private static unsafe void GetHeaderName(ReadOnlySpan<byte> source, Span<char> buffer)
    {
        OperationStatus status = Ascii.ToUtf16(source, buffer, out _, out _);
        if (status != OperationStatus.Done || source.IndexOf((byte)'\0') >= 0)
        {
            KestrelBadHttpRequestException.Throw(RequestRejectionReason.InvalidCharactersInHeaderName);
        }
    }

@GrabYourPitchforks had some fairly strong opinions about special-casing '\0'.

gfoidl commented 1 year ago

In local micro-benchmarks yes, mainly due to the $O(2n)$-nature. And of course it depends on the input (length and position of \0).

It would be more interesting to see real-usage benchmarks, like how that would impact Techempower, etc. But unfortunately I don't know how to run such benchmarks (at the moment).

fairly strong opinions about special-casing \0

I'm looking forward to reading them.

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-buffers See info in area-owners.md if you want to be subscribed.

Issue Details
Author: gfoidl
Assignees: -
Labels: `api-suggestion`, `area-System.Buffers`, `untriaged`
Milestone: -

benaadams commented 1 year ago

Does the ASP.NET code measurably regress if the newly-added ToUtf16 is used plus a call to IndexOf((byte)'\0') to validate there wasn't a null?

    private static unsafe void GetHeaderName(ReadOnlySpan<byte> source, Span<char> buffer)
    {
        OperationStatus status = Ascii.ToUtf16(source, buffer, out _, out _);
        if (status != OperationStatus.Done || source.IndexOf((byte)'\0') >= 0)
        {
            KestrelBadHttpRequestException.Throw(RequestRejectionReason.InvalidCharactersInHeaderName);
        }
    }

@GrabYourPitchforks had some fairly strong opinions about special-casing '\0'.

99% of the time there won't be any nulls but headers average 800 bytes to 2kB with cookies; so scanning the headers an additional time to check for nulls can be significant

gfoidl commented 1 year ago

headers average 800 bytes to 2kB with cookies

Thanks for these numbers! Out of interest: how / where did you get these from?

benaadams commented 1 year ago

headers average 800 bytes to 2kB with cookies

Thanks for these numbers! Out of interest: how / where did you get these from?

Googled average headers size 😅

As an anecdote going to Google homepage logged in my headers are 2158 bytes and it makes 27 requests to that domain (26 to other domains); so in total 58kB for that page and one domain

svick commented 1 year ago

Wouldn't it be better to treat \0 as special at the highest level, instead of at the lowest level?

For example, I think that when parsing HTTP 1 headers, you could look for \r, \n or \0 as the first step, instead of just \r and \n, and deal with \0 at that point. That would then mean you could safely use the current version of Ascii.ToUtf16 to convert the header bytes to UTF-16.
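A minimal sketch of that idea, under stated assumptions: `ScanLine` is a hypothetical helper rather than Kestrel code, and the conversion uses the three-argument `System.Text.Ascii.ToUtf16` shape that shipped in .NET 8. The line terminator and `\0` are found in one `IndexOfAny` scan, so the later conversion needs no null check of its own.

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical helper: find the end of the line and detect a \0 in the same scan.
static OperationStatus ScanLine(ReadOnlySpan<byte> buffer, out ReadOnlySpan<byte> line)
{
    line = default;
    int index = buffer.IndexOfAny((byte)'\r', (byte)'\n', (byte)0);
    if (index < 0)
    {
        return OperationStatus.NeedMoreData; // no line terminator yet
    }
    if (buffer[index] == 0)
    {
        return OperationStatus.InvalidData;  // \0 appears before the line end
    }
    line = buffer.Slice(0, index);
    return OperationStatus.Done;
}

byte[] good = Encoding.ASCII.GetBytes("Host: example.com\r\n");
OperationStatus goodStatus = ScanLine(good, out ReadOnlySpan<byte> goodLine);

// The line is known to be free of \0, so the stock API can be used as-is.
char[] chars = new char[goodLine.Length];
OperationStatus convertStatus = Ascii.ToUtf16(goodLine, chars, out int written);
string header = new string(chars, 0, written);

byte[] bad = Encoding.ASCII.GetBytes("Ho\0st: example.com\r\n");
OperationStatus badStatus = ScanLine(bad, out _);
```

The trade-off is that the null check moves out of the conversion routine, so every caller has to guarantee that its input has already been scanned.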

tannergooding commented 1 year ago

Assigning to @GrabYourPitchforks for now until the necessary input can be given.

GrabYourPitchforks commented 1 year ago

I am strongly against this proposal. ASCII is defined as characters in the range 0x00 .. 0x7F, inclusive. Sometimes a protocol will exclude certain characters (0x00, or the entire control character range 0x00 .. 0x1F and 0x7F), but at that point you're making something tied to a particular protocol rather than something that is a general-purpose ASCII API. It's similar to the reason we don't support WTF-8 within any of our UTF-8 APIs: certain protocols may utilize it, but it doesn't belong in a general-purpose UTF-8 processing API.

Since this is protocol-specific for aspnet, I recommend the code remain in that project.

GrabYourPitchforks commented 1 year ago

It would be more interesting to see real-usage benchmarks, like how that would impact Techempower, etc.

@sebastienros Is there a way to measure real-world impact for this API? I've laid out above my arguments against doing this - namely, that protocol-level concerns don't belong in a general-purpose API. But if there is strongly compelling evidence that this is a real perf bottleneck and the runtime layer is the only layer that can provide this functionality properly, that should be weighed in favor of this API, even against my concerns.

sebastienros commented 1 year ago

I will need to check what can be impacted in ASP.NET and see what benchmarks would exercise this code path. If someone knows which scenarios are useful here then I can start it.

stephentoub commented 1 year ago

Do we know why ASP.NET special-cases \0 here? What happens if we just stop doing that? If ASP.NET needs to do that, is it likely that others will similarly need to special-case certain values?

I'd really like to be able to encapsulate this in a core library provided helper, for ASP.NET to use and for others to use. Vectorizing such a thing is very non-trivial. Is there a shape of an API we could come up with that would enable efficiently doing this, e.g. a default overload that is for [0, 127] but another overload that lets you opt-out one or more values or ranges, or some such thing?

Note that one of the primary uses of IndexOfAnyValues is for use in protocols, where protocols need to search for or exempt certain things. Could/should we incorporate that somehow?
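As a rough illustration of what an allowed-set check could look like, the sketch below assumes the type that shipped in .NET 8 as `SearchValues<T>` (the released name of `IndexOfAnyValues`). `IndexOfFirstInvalid` and the chosen set (`0x01..0x7F`) are hypothetical. Note that this is a validation pass only, so it would still run separately from the widening itself.

```csharp
using System;
using System.Buffers;

// Allowed set for this hypothetical protocol: ASCII without \0, i.e. 0x01..0x7F.
byte[] allowedBytes = new byte[0x7F];
for (int i = 0; i < allowedBytes.Length; i++)
{
    allowedBytes[i] = (byte)(i + 1);
}
SearchValues<byte> allowed = SearchValues.Create(allowedBytes);

// Returns the index of the first byte outside the allowed set, or -1 if there is none.
static int IndexOfFirstInvalid(ReadOnlySpan<byte> source, SearchValues<byte> allowedSet)
    => source.IndexOfAnyExcept(allowedSet);

int clean = IndexOfFirstInvalid("abc"u8, allowed);
int withNull = IndexOfFirstInvalid(new byte[] { (byte)'a', 0, (byte)'b' }, allowed);
int nonAscii = IndexOfFirstInvalid(new byte[] { (byte)'a', (byte)'b', 0x80 }, allowed);

Console.WriteLine($"{clean}, {withNull}, {nonAscii}");
```

Other protocols could build a different set (for example, one that also excludes control characters) without any change to the conversion API.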

benaadams commented 1 year ago

Do we know why ASP.NET special-cases \0 here?

Spec-wise it shouldn't; however, if it's a front-end server that passes requests to another server, and that server uses null-terminated strings, then the request can change in the second layer and reach URLs which weren't mapped to the internet, which could be a security risk (along the lines of https://en.wikipedia.org/wiki/HTTP_request_smuggling, though different).
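A small sketch of the kind of mismatch described here. Both the front-end rule and the C-style back end are hypothetical and only simulated in C#; the paths are made up for illustration.

```csharp
using System;
using System.Text;

// Request target as received on the wire; the \0 is part of the bytes.
byte[] target = Encoding.ASCII.GetBytes("/internal/admin\0/public/index.html");

// Front end (hypothetical rule): forwards anything that ends with a public path.
string frontEndView = Encoding.ASCII.GetString(target);
bool forwarded = frontEndView.EndsWith("/public/index.html", StringComparison.Ordinal);

// Back end (hypothetical, C-style): the string stops at the first \0.
int terminator = Array.IndexOf(target, (byte)0);
string backEndView = Encoding.ASCII.GetString(target, 0, terminator);

Console.WriteLine($"front end forwards: {forwarded}, back end serves: {backEndView}");
```

The two layers disagree about what was requested, which is why rejecting `\0` at the front end is a defensive measure even though the spec does not require it.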

benaadams commented 1 year ago

Note that one of the primary uses of IndexOfAnyValues is for use in protocols, where protocols need to search for or exempt certain things. Could/should we incorporate that somehow?

As @svick says, it just needs to check for 3 rather than 2, then throw if it's \0; there is already an API for it

For example, I think that when parsing HTTP 1 headers, you could look for \r, \n or \0 as the first step, instead of just \r and \n, and deal with \0 at that point. That would then mean you could safely use the current version of Ascii.ToUtf16 to convert the header bytes to UTF-16.

stephentoub commented 1 year ago

As @svick says, it just needs to check for 3 rather than 2, then throw if it's \0; there is already an API for it

I hadn't seen @svick's comment:

For example, I think that when parsing HTTP 1 headers, you could look for \r, \n or \0 as the first step, instead of just \r and \n, and deal with \0 at that point. That would then mean you could safely use the current version of Ascii.ToUtf16 to convert the header bytes to UTF-16.

Is that viable? Can all of the places that call into this shared routine be updated trivially to ensure the data passed in doesn't contain a \0? If so, let's do that, add the Ascii.ToUtf16 that's for the whole [0, 127] range, update ASP.NET to use that, and call it a good day.

stephentoub commented 1 year ago

@BrennanConroy, do you have a suggestion for how we could make forward progress on this?

BrennanConroy commented 1 year ago

For example, I think that when parsing HTTP 1 headers, you could look for \r, \n or \0 as the first step, instead of just \r and \n, and deal with \0 at that point. That would then mean you could safely use the current version of Ascii.ToUtf16 to convert the header bytes to UTF-16.

Is that viable? Can all of the places that call into this shared routine be updated trivially to ensure the data passed in doesn't contain a \0? If so, let's do that, add the Ascii.ToUtf16 that's for the whole [0, 127] range, update ASP.NET to use that, and call it a good day.

Looks like it would be "easy" to do this. If we updated https://github.com/dotnet/aspnetcore/blob/f62f12357c49c4f1cca502e8f4cf57353f0b320f/src/Servers/Kestrel/Core/src/Internal/Http/HttpParser.cs#LL64C37-L64C37 to instead be

private static ReadOnlySpan<byte> Delimiters => new byte[] { ByteLF, 0 };

if (reader.TryReadToAny(out ReadOnlySpan<byte> requestLine, Delimiters, advancePastDelimiter: true))
{
    if (requestLine.Length == 0 || (reader.TryPeek(out var next) && next == 0))
    {
        RejectRequestLine(requestLine);
    }
    ParseRequestLine(handler, requestLine);
    return true;
}

This is where we start parsing the HTTP/1 request so we're already going through the entire request line looking for \n, this just updates to TryReadToAny to search for \0 at the same time, which I hope is optimized for single pass 😃.

The concerns are:

  1. We're removing the \0 check from the GetAsciiStringNonNullCharacters method which means new callers could accidentally allow null
  2. The errors returned if \0 is found are lacking detail (although a lot faster 😆)
  3. There is some HTTP/2 and HTTP/3 code calling GetAsciiStringNonNullCharacters although it looked like they were supposed to already be null character checking before calling this method

stephentoub commented 1 year ago

Looks like it would be "easy" to do this

Thanks, I sketched it out in: https://github.com/dotnet/aspnetcore/compare/main...stephentoub:aspnetcore:asciitoutf16 Not exactly what you suggested, but similar.

A bunch of tests failed, and I haven't gone through to see which would be expected (the tests have internals access) and which would be real problems.

Any interest in picking it up and seeing how far we can run with it?

BrennanConroy commented 1 year ago

Yeah, I'll pick it up and see what the team thinks

BrennanConroy commented 1 year ago

It looks like Ascii.ToUtf16 is still slower than the custom code in aspnetcore. https://github.com/dotnet/aspnetcore/issues/45962#issuecomment-1402255853

https://github.com/dotnet/runtime/issues/80245 is open which might be indirectly tracking part of the work to improve the performance. And there is a recent PR that might improve performance https://github.com/dotnet/runtime/pull/85266.

stephentoub commented 1 year ago

It looks like Ascii.ToUtf16 is still slower than the custom code in aspnetcore. https://github.com/dotnet/aspnetcore/issues/45962#issuecomment-1402255853

I'm not aware of any fundamental reason that should be the case. We should fix anything in the core routine that might be contributing those few additional cycles. I'd hope it's not just the difference between returning a bool and returning an OperationStatus.

cc: @adamsitnik, @GrabYourPitchforks

BrennanConroy commented 9 months ago

Grabbed the assembly of Ascii.ToUtf16 and Kestrel's TryGetAsciiString

One very obvious difference is that the core processing is not inlined in the Ascii.ToUtf16 case. That's the first thing I would try when comparing perf, but it always takes me a couple hours to figure out how to get a custom runtime again, so I haven't tried yet 😆

But if someone wants to take a look at the assembly in the meantime and see if there is anything obviously worse in the Ascii.ToUtf16 case please do!

Ascii.ToUtf16

```
; Total bytes of code 171
; Assembly listing for method System.Text.Ascii:ToUtf16(System.ReadOnlySpan`1[ubyte],System.Span`1[ushort],byref):int (Tier1)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; Tier1 code
; optimized code
; optimized using Dynamic PGO
; rsp based frame
; partially interruptible
; with Dynamic PGO: edge weights are valid, and fgCalledCount is 17696
; 0 inlinees with PGO data; 2 single block inlinees; 0 inlinees without PGO data

G_M000_IG01:                ;; offset=0x0000
       push     r15
       push     r14
       push     rdi
       push     rsi
       push     rbp
       push     rbx
       sub      rsp, 56
       xor      eax, eax
       mov      qword ptr [rsp+0x30], rax
       mov      qword ptr [rsp+0x28], rax
       mov      rbx, r8

G_M000_IG02:                ;; offset=0x001B
       mov      rsi, bword ptr [rdx]
       mov      edi, dword ptr [rdx+0x08]
       mov      rbp, bword ptr [rcx]
       mov      ecx, dword ptr [rcx+0x08]
       cmp      ecx, edi
       jg       SHORT G_M000_IG05
       mov      r14d, ecx
       xor      r15d, r15d

G_M000_IG03:                ;; offset=0x0031
       mov      bword ptr [rsp+0x30], rbp
       mov      rcx, rbp
       mov      bword ptr [rsp+0x28], rsi
       mov      rdx, rsi
       mov      r8, r14
       call     [System.Text.Ascii:WidenAsciiToUtf16(ulong,ulong,ulong):ulong]
       mov      dword ptr [rbx], eax
       mov      ecx, 3
       cmp      r14, rax
       mov      eax, ecx
       cmove    eax, r15d

G_M000_IG04:                ;; offset=0x005A
       add      rsp, 56
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       pop      r14
       pop      r15
       ret

G_M000_IG05:                ;; offset=0x0067
       mov      r14d, edi
       mov      r15d, 1
       jmp      SHORT G_M000_IG03

; Total bytes of code 114
-----------------------------------------------------------------------------------------------------------------------------------------
; Assembly listing for method System.Text.Ascii:WidenAsciiToUtf16(ulong,ulong,ulong):ulong (Tier1)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; Tier1 code
; optimized code
; optimized using Dynamic PGO
; rsp based frame
; fully interruptible
; with Dynamic PGO: edge weights are valid, and fgCalledCount is 12930
; 0 inlinees with PGO data; 6 single block inlinees; 2 inlinees without PGO data

G_M000_IG01:                ;; offset=0x0000
       vzeroupper

G_M000_IG02:                ;; offset=0x0003
       xor      eax, eax
       cmp      r8, 16
       jb       SHORT G_M000_IG04
       mov      r10, rdx
       cmp      r8, 32
       jb       G_M000_IG11
       lea      r9, [r8-0x20]

G_M000_IG03:                ;; offset=0x001C
       vmovups  ymm0, ymmword ptr [rcx+rax]
       vpmovmskb r11d, ymm0
       test     r11d, r11d
       jne      SHORT G_M000_IG04
       vmovaps  ymm1, ymm0
       vpmovzxbw ymm1, ymm1
       vextracti128 xmm0, ymm0, 1
       vpmovzxbw ymm0, ymm0
       vmovups  ymmword ptr [r10], ymm1
       vmovups  ymmword ptr [r10+0x20], ymm0
       add      rax, 32
       add      r10, 64
       cmp      rax, r9
       jbe      SHORT G_M000_IG03

G_M000_IG04:                ;; offset=0x0056
       sub      r8, rax
       cmp      r8, 4
       jb       SHORT G_M000_IG07

G_M000_IG05:                ;; offset=0x005F
       lea      r10, [rax+r8-0x04]
       align    [0 bytes for IG06]

G_M000_IG06:                ;; offset=0x0064
       mov      r9d, dword ptr [rcx+rax]
       test     r9d, 0xFFFFFFFF80808080
       jne      SHORT G_M000_IG10
       vmovd    xmm0, r9
       vpmovzxbw xmm0, xmm0
       vmovd    qword ptr [rdx+2*rax], xmm0
       add      rax, 4
       cmp      rax, r10
       jbe      SHORT G_M000_IG06

G_M000_IG07:                ;; offset=0x008A
       test     r8b, 2
       jne      SHORT G_M000_IG13
       test     r8b, 1
       jne      G_M000_IG14

G_M000_IG08:                ;; offset=0x009A
       vzeroupper
       ret

G_M000_IG09:                ;; offset=0x009E
       movzx    rcx, r9b
       mov      word ptr [rdx+2*rax], cx
       inc      rax
       shr      r9d, 8

G_M000_IG10:                ;; offset=0x00AD
       movzx    rcx, r9b
       test     cl, 128
       je       SHORT G_M000_IG09
       jmp      SHORT G_M000_IG08

G_M000_IG11:                ;; offset=0x00B8
       lea      r9, [r8-0x10]

G_M000_IG12:                ;; offset=0x00BC
       vmovups  xmm0, xmmword ptr [rcx+rax]
       vptest   xmm0, xmmword ptr [reloc @RWD00]
       jne      SHORT G_M000_IG04
       vpmovzxbw xmm1, xmm0
       vpsrldq  xmm0, xmm0, 8
       vpmovzxbw xmm0, xmm0
       vmovups  xmmword ptr [r10], xmm1
       vmovups  xmmword ptr [r10+0x10], xmm0
       add      rax, 16
       add      r10, 32
       cmp      rax, r9
       jbe      SHORT G_M000_IG12
       jmp      G_M000_IG04

G_M000_IG13:                ;; offset=0x00F8
       movzx    r9, word ptr [rcx+rax]
       test     r9d, 0xFFFFFFFF80808080
       jne      SHORT G_M000_IG10
       movzx    r10, r9b
       mov      word ptr [rdx+2*rax], r10w
       shr      r9d, 8
       mov      word ptr [rdx+2*rax+0x02], r9w
       add      rax, 2
       test     r8b, 1
       je       G_M000_IG08

G_M000_IG14:                ;; offset=0x0127
       movzx    r9, byte ptr [rcx+rax]
       test     r9b, 128
       jne      G_M000_IG08
       mov      word ptr [rdx+2*rax], r9w
       inc      rax
       jmp      G_M000_IG08

RWD00   dq      8080808080808080h, 8080808080808080h

; Total bytes of code 323
```
TryGetAsciiString

```
; Assembly listing for method StringUtilities:TryGetAsciiString(ulong,ulong,int):bool (Tier1)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; Tier1 code
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; 0 inlinees with PGO data; 8 single block inlinees; 2 inlinees without PGO data

G_M000_IG01:                ;; offset=0x0000
       vzeroupper

G_M000_IG02:                ;; offset=0x0003
       movsxd   rax, r8d
       add      rax, rcx
       lea      r8, [rax-0x20]
       cmp      rcx, r8
       ja       SHORT G_M000_IG05
       align    [0 bytes for IG03]

G_M000_IG03:                ;; offset=0x0012
       vmovups  ymm0, ymmword ptr [rcx]
       vxorps   ymm1, ymm1, ymm1
       vpcmpgtb ymm1, ymm0, ymm1
       vpmovmskb r10d, ymm1
       cmp      r10d, -1
       jne      G_M000_IG15
       vxorps   ymm1, ymm1, ymm1
       vpunpcklbw ymm1, ymm0, ymm1
       vxorps   ymm2, ymm2, ymm2
       vpunpckhbw ymm0, ymm0, ymm2
       vperm2i128 ymm2, ymm1, ymm0, 32
       vperm2i128 ymm0, ymm1, ymm0, 49
       vmovups  ymmword ptr [rdx], ymm2
       vmovups  ymmword ptr [rdx+0x20], ymm0
       add      rcx, 32
       add      rdx, 64
       cmp      rcx, r8
       jbe      SHORT G_M000_IG03

G_M000_IG04:                ;; offset=0x005E
       cmp      rcx, rax
       je       G_M000_IG13

G_M000_IG05:                ;; offset=0x0067
       lea      r8, [rax-0x10]
       cmp      rcx, r8
       ja       SHORT G_M000_IG08
       align    [0 bytes for IG06]

G_M000_IG06:                ;; offset=0x0070
       vmovups  xmm0, xmmword ptr [rcx]
       vxorps   xmm1, xmm1, xmm1
       vpcmpgtb xmm1, xmm0, xmm1
       vpmovmskb r10d, xmm1
       cmp      r10d, 0xFFFF
       jne      G_M000_IG15
       vxorps   xmm1, xmm1, xmm1
       vpunpcklbw xmm1, xmm0, xmm1
       vxorps   xmm2, xmm2, xmm2
       vpunpckhbw xmm0, xmm0, xmm2
       vmovups  xmmword ptr [rdx], xmm1
       vmovups  xmmword ptr [rdx+0x10], xmm0
       add      rcx, 16
       add      rdx, 32
       cmp      rcx, r8
       jbe      SHORT G_M000_IG06

G_M000_IG07:                ;; offset=0x00B3
       cmp      rcx, rax
       je       G_M000_IG13

G_M000_IG08:                ;; offset=0x00BC
       lea      r8, [rax-0x08]
       cmp      rcx, r8
       ja       SHORT G_M000_IG10
       align    [0 bytes for IG09]

G_M000_IG09:                ;; offset=0x00C5
       mov      r10, qword ptr [rcx]
       mov      r9, 0xFEFEFEFEFEFEFEFF
       add      r9, r10
       or       r9, r10
       mov      r11, 0x8080808080808080
       test     r9, r11
       jne      G_M000_IG15
       vmovd    xmm0, r10
       vxorps   xmm1, xmm1, xmm1
       vpunpcklbw xmm0, xmm0, xmm1
       vmovups  xmmword ptr [rdx], xmm0
       add      rcx, 8
       add      rdx, 16
       cmp      rcx, r8
       jbe      SHORT G_M000_IG09

G_M000_IG10:                ;; offset=0x0109
       lea      r8, [rax-0x04]
       cmp      rcx, r8
       ja       SHORT G_M000_IG11
       mov      r8d, dword ptr [rcx]
       lea      r10d, [r8+0xFEFEFEFF]
       or       r10d, r8d
       test     r10d, 0xFFFFFFFF80808080
       jne      SHORT G_M000_IG15
       vmovd    xmm0, r8
       vxorps   xmm1, xmm1, xmm1
       vpunpcklbw xmm0, xmm0, xmm1
       vmovd    qword ptr [rdx], xmm0
       add      rcx, 4
       add      rdx, 8

G_M000_IG11:                ;; offset=0x0142
       lea      r8, [rax-0x02]
       cmp      rcx, r8
       ja       SHORT G_M000_IG12
       movsx    r8, word ptr [rcx]
       lea      r10d, [r8-0x101]
       movsx    r10, r10w
       or       r8d, r10d
       test     r8d, -0x7F80
       jne      SHORT G_M000_IG15
       movzx    r8, byte ptr [rcx]
       mov      word ptr [rdx], r8w
       movzx    r8, byte ptr [rcx+0x01]
       mov      word ptr [rdx+0x02], r8w
       add      rcx, 2
       add      rdx, 4

G_M000_IG12:                ;; offset=0x0180
       cmp      rcx, rax
       jae      SHORT G_M000_IG13
       cmp      byte ptr [rcx], 0
       jle      SHORT G_M000_IG15
       movzx    rax, byte ptr [rcx]
       mov      word ptr [rdx], ax

G_M000_IG13:                ;; offset=0x0190
       mov      eax, 1

G_M000_IG14:                ;; offset=0x0195
       vzeroupper
       ret

G_M000_IG15:                ;; offset=0x0199
       xor      eax, eax

G_M000_IG16:                ;; offset=0x019B
       vzeroupper
       ret

; Total bytes of code 415
```

GrabYourPitchforks commented 9 months ago

Below are the results I'm getting on my machine. This seems very much within the range of noise.

@BrennanConroy Are you seeing different results than below?


BenchmarkDotNet v0.13.8, Windows 11 (10.0.22621.2283/22H2/2022Update/SunValley2) (Hyper-V)
Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores
.NET SDK 8.0.100-rc.1.23455.8
  [Host]   : .NET 8.0.0 (8.0.23.41904), X64 RyuJIT AVX2
  .NET 8.0 : .NET 8.0.0 (8.0.23.41904), X64 RyuJIT AVX2

Job=.NET 8.0  Runtime=.NET 8.0  

| Method                      | StringLength | Mean      | Error     | StdDev    | Ratio |
|-----------------------------|-------------:|----------:|----------:|----------:|------:|
| Ascii_ToUtf16               | 4            | 4.763 ns  | 0.0336 ns | 0.0314 ns | 1.00  |
| StringUtilities_TryGetAscii | 4            | 4.123 ns  | 0.0335 ns | 0.0313 ns | 0.87  |
| Ascii_ToUtf16               | 8            | 4.968 ns  | 0.0201 ns | 0.0168 ns | 1.00  |
| StringUtilities_TryGetAscii | 8            | 4.714 ns  | 0.0307 ns | 0.0287 ns | 0.95  |
| Ascii_ToUtf16               | 16           | 5.159 ns  | 0.0405 ns | 0.0359 ns | 1.00  |
| StringUtilities_TryGetAscii | 16           | 3.625 ns  | 0.0387 ns | 0.0362 ns | 0.70  |
| Ascii_ToUtf16               | 24           | 6.823 ns  | 0.0681 ns | 0.0637 ns | 1.00  |
| StringUtilities_TryGetAscii | 24           | 5.212 ns  | 0.0336 ns | 0.0298 ns | 0.76  |
| Ascii_ToUtf16               | 128          | 8.177 ns  | 0.0519 ns | 0.0485 ns | 1.00  |
| StringUtilities_TryGetAscii | 128          | 9.791 ns  | 0.0186 ns | 0.0155 ns | 1.20  |
| Ascii_ToUtf16               | 256          | 10.232 ns | 0.0517 ns | 0.0458 ns | 1.00  |
| StringUtilities_TryGetAscii | 256          | 11.014 ns | 0.0491 ns | 0.0459 ns | 1.08  |
| Ascii_ToUtf16               | 1024         | 28.757 ns | 0.1900 ns | 0.1777 ns | 1.00  |
| StringUtilities_TryGetAscii | 1024         | 33.437 ns | 0.2068 ns | 0.1727 ns | 1.16  |

stephentoub commented 6 months ago

@BrennanConroy, can you comment on the above?

stephentoub commented 3 days ago

Closing given https://github.com/dotnet/aspnetcore/pull/56578