StringUtilities based on vectorized helpers provided by core libraries

gfoidl commented 1 year ago

StringUtilities currently use a vectorized approach based on hw-intrinsics and doesn't include Arm. In https://github.com/dotnet/aspnetcore/pull/44040 there's an attempt to use the xplat-intrinsics, but in the meantime https://github.com/dotnet/runtime/issues/28230 got done thus it would be the best option to base StringUtilities on these new APIs so that the custom vectorized code can go away (cf. https://github.com/dotnet/aspnetcore/pull/44040#issuecomment-1272062476).

With the ASCII-APIs it could look like https://github.com/dotnet/aspnetcore/compare/d2a1c23d90d4e69df47616b59b7215b3da8cc9f6...25c56209d2e3e8b9f8922b260aff5d2271f5d021, but there are some pieces missing. Copied from in https://github.com/dotnet/aspnetcore/pull/44040#issuecomment-1289517635:

What's missing to achieve this?

ASCII

In StringUtilities for ASCII values of the range (0x00, 0x80) are considered valid. Ascii.ToUtf16 treats the whole ASCII range [0x00, 0x80) as valid.

Thus something like

namespace System.Buffers.Text
{
    public static class Ascii
    {
        // existing methods
-       public static OperationStatus ToUtf16(ReadOnlySpan<byte> source, Span<char> destination, out int bytesConsumed, out int charsWritten);
+       public static OperationStatus ToUtf16(ReadOnlySpan<byte> source, Span<char> destination, out int bytesConsumed, out int charsWritten, bool treatNullAsInvalid = false);
    }
}

is needed.

Latin1

I don't know how hot Latin1 is here, but as it's special cased in https://github.com/dotnet/aspnetcore/blob/d3259f92851e4772d3230177be5b71be20d3ff6d/src/Servers/Kestrel/Core/src/Internal/Infrastructure/HttpUtilities.cs#L140-L143 I think it's hot enough to be optimized. Besided that standard Encoding.Latin1 can't be used solely, as 0x00 is considered invalid.

Thus basically the same as for ASCII above applies, i.e.

namespace System.Buffers.Text
{
+   public static class Latin1
+   {
+       // other methods similar as Ascii?
+       public static OperationStatus ToUtf16(ReadOnlySpan<byte> source, Span<char> destination, out int bytesConsumed, out int charsWritten, bool treatNullAsInvalid = false);
+   }
}

If the type Latin1 seems too heavy, too niche, whatever, as alternative one could use something like https://github.com/dotnet/aspnetcore/compare/25c56209d2e3e8b9f8922b260aff5d2271f5d021...e3afae2dcf436dfe357b5a961f0184e51681325e where Latin1 bytes are expanded to UTF-16 via Asii.ToUtf16 and if non-ASCII is met, then the remainder is done in scalar way. Though this is a naive approach, which should be perf-tested -- I don't have numbers on how likeley Latin1 inputs with ranges [0x80, 0xFF] are, but if they are rare then that (simple) approach should be good enough.

gfoidl commented 1 year ago

For the ASCII-part --> https://github.com/dotnet/runtime/issues/80366

For Latin1 I hope that anyone has information on how hot it is or if the "fallback-solution" is good enough.

mgravell commented 1 year ago

Did some benchmarks to get an idea of what we're looking at here with different lengths - focusing on the header-name path, but ultimately the same decode route is used from the other scenarios too; tl,dr; this would purely be for code reduction - even in the no-check path (i.e. without having to consider nul), this looks marginally slower if we make this change. Since we're hyper-conscious of performance in these paths, I'm tempted to say "leave alone" for now? benchmark code here

BenchmarkDotNet=v0.13.4, OS=Windows 11 (10.0.22621.1105)
Intel Core i9-9900K CPU 3.60GHz (Coffee Lake), 1 CPU, 16 logical and 8 physical cores
.NET SDK=8.0.100-alpha.1.23073.1
  [Host]   : .NET 8.0.0 (8.0.23.7101), X64 RyuJIT AVX2
  .NET 8.0 : .NET 8.0.0 (8.0.23.7101), X64 RyuJIT AVX2

Job=.NET 8.0  Runtime=.NET 8.0

|            Method | Length |       Mean |     Error |    StdDev |     Median | Ratio | RatioSD |
|------------------ |------- |-----------:|----------:|----------:|-----------:|------:|--------:|
| TryGetAsciiString |      0 |  0.9079 ns | 0.0158 ns | 0.0132 ns |  0.9071 ns |  1.00 |    0.00 |
| ToUtf16_WithCheck |      0 |  0.9080 ns | 0.0096 ns | 0.0085 ns |  0.9079 ns |  1.00 |    0.02 |
|   ToUtf16_NoCheck |      0 |  0.9076 ns | 0.0173 ns | 0.0199 ns |  0.9058 ns |  1.00 |    0.03 |
|                   |        |            |           |           |            |       |         |
| TryGetAsciiString |      5 | 11.0452 ns | 0.1648 ns | 0.1376 ns | 11.0280 ns |  1.00 |    0.00 |
| ToUtf16_WithCheck |      5 | 14.9856 ns | 0.1071 ns | 0.0949 ns | 15.0108 ns |  1.36 |    0.02 |
|   ToUtf16_NoCheck |      5 | 12.5450 ns | 0.2485 ns | 0.3942 ns | 12.4213 ns |  1.16 |    0.04 |
|                   |        |            |           |           |            |       |         |
| TryGetAsciiString |     10 | 12.9616 ns | 0.2560 ns | 0.3752 ns | 12.9021 ns |  1.00 |    0.00 |
| ToUtf16_WithCheck |     10 | 19.3689 ns | 0.2962 ns | 0.2473 ns | 19.3027 ns |  1.48 |    0.04 |
|   ToUtf16_NoCheck |     10 | 15.0387 ns | 0.2627 ns | 0.2457 ns | 15.0060 ns |  1.14 |    0.04 |
|                   |        |            |           |           |            |       |         |
| TryGetAsciiString |     20 | 13.9427 ns | 0.2654 ns | 0.5300 ns | 13.8382 ns |  1.00 |    0.00 |
| ToUtf16_WithCheck |     20 | 17.9072 ns | 0.2683 ns | 0.2379 ns | 17.9686 ns |  1.25 |    0.04 |
|   ToUtf16_NoCheck |     20 | 14.8805 ns | 0.2764 ns | 0.4221 ns | 14.8519 ns |  1.06 |    0.04 |
|                   |        |            |           |           |            |       |         |
| TryGetAsciiString |     40 | 15.3223 ns | 0.3028 ns | 0.7371 ns | 15.0601 ns |  1.00 |    0.00 |
| ToUtf16_WithCheck |     40 | 22.2503 ns | 0.4411 ns | 0.8285 ns | 22.0874 ns |  1.44 |    0.06 |
|   ToUtf16_NoCheck |     40 | 17.5770 ns | 0.3427 ns | 0.8470 ns | 17.2736 ns |  1.15 |    0.06 |
|                   |        |            |           |           |            |       |         |
| TryGetAsciiString |     80 | 19.4507 ns | 0.3867 ns | 1.0254 ns | 19.0594 ns |  1.00 |    0.00 |
| ToUtf16_WithCheck |     80 | 27.5822 ns | 0.5413 ns | 1.1180 ns | 27.1287 ns |  1.41 |    0.07 |
|   ToUtf16_NoCheck |     80 | 23.3071 ns | 0.4649 ns | 1.1139 ns | 22.9619 ns |  1.20 |    0.07 |
|                   |        |            |           |           |            |       |         |
| TryGetAsciiString |    128 | 25.0076 ns | 0.5693 ns | 1.6425 ns | 24.2883 ns |  1.00 |    0.00 |
| ToUtf16_WithCheck |    128 | 31.0631 ns | 0.6206 ns | 1.4384 ns | 30.7101 ns |  1.25 |    0.08 |
|   ToUtf16_NoCheck |    128 | 27.0650 ns | 0.5338 ns | 1.2581 ns | 26.5618 ns |  1.09 |    0.07 |
|                   |        |            |           |           |            |       |         |
| TryGetAsciiString |    256 | 40.6382 ns | 0.9963 ns | 2.8904 ns | 39.3476 ns |  1.00 |    0.00 |
| ToUtf16_WithCheck |    256 | 46.9761 ns | 0.9321 ns | 2.6593 ns | 45.9430 ns |  1.16 |    0.08 |
|   ToUtf16_NoCheck |    256 | 40.5977 ns | 1.5492 ns | 4.5435 ns | 38.9056 ns |  1.00 |    0.10 |

danmoseley commented 1 year ago

cc @MihaZupan does this surprise you? (I think you worked on these API?)

We'd love to take a change if it could make us faster.

mgravell commented 1 year ago

Note: I haven't considered the arm aspect; that may make the change compelling, if we can resolve the nul problem. Would welcome your thoughts on that aspect, @MihaZupan

Now all I need is an arm CPU 🤔

MihaZupan commented 1 year ago

(I think you worked on these API?)

That was @adamsitnik for these ones :)

Now all I need is an arm CPU 🤔

I've found using crank with aspnet-citrine-arm-lin and similar arm VMs to be very handy when running microbenchmarks.

E.g. something like this benchmarks.yml and running

crank --config .\benchmarks.yml --scenario custom --profile aspnet-citrine-arm-lin

gfoidl commented 1 year ago

If it's slower, then a goal should be to improve the runtime-part so that it is at least on-par with the custom solution (https://github.com/dotnet/aspnetcore/pull/44040#issuecomment-1272062476 is in the same vein).

(at the moment I'm not so much in the office, but I'll do some investigations when I have time again)

mgravell commented 1 year ago

ARM numbers look very similar:

BenchmarkDotNet=v0.13.4, OS=debian 11 (container)
Unknown processor
.NET SDK=7.0.102 [/usr/share/dotnet/sdk]
  [Host] : .NET 8.0.0 (8.0.23.7601), Arm64 RyuJIT AdvSIMD

Toolchain=InProcessEmitToolchain

            Method | Length |      Mean |     Error |    StdDev | Ratio |   Gen0 | Allocated | Alloc Ratio |
------------------ |------- |----------:|----------:|----------:|------:|-------:|----------:|------------:|
 TryGetAsciiString |      0 |  1.341 ns | 0.0002 ns | 0.0002 ns |  1.00 |      - |         - |          NA |
 ToUtf16_WithCheck |      0 |  1.341 ns | 0.0000 ns | 0.0000 ns |  1.00 |      - |         - |          NA |
   ToUtf16_NoCheck |      0 |  1.341 ns | 0.0000 ns | 0.0000 ns |  1.00 |      - |         - |          NA |
                   |        |           |           |           |       |        |           |             |
 TryGetAsciiString |      5 | 21.586 ns | 0.0171 ns | 0.0160 ns |  1.00 | 0.0019 |      32 B |        1.00 |
 ToUtf16_WithCheck |      5 | 29.631 ns | 0.0166 ns | 0.0147 ns |  1.37 | 0.0019 |      32 B |        1.00 |
   ToUtf16_NoCheck |      5 | 23.289 ns | 0.0123 ns | 0.0109 ns |  1.08 | 0.0019 |      32 B |        1.00 |
                   |        |           |           |           |       |        |           |             |
 TryGetAsciiString |     10 | 24.067 ns | 0.0077 ns | 0.0065 ns |  1.00 | 0.0029 |      48 B |        1.00 |
 ToUtf16_WithCheck |     10 | 34.350 ns | 0.0355 ns | 0.0332 ns |  1.43 | 0.0029 |      48 B |        1.00 |
   ToUtf16_NoCheck |     10 | 25.809 ns | 0.0162 ns | 0.0151 ns |  1.07 | 0.0029 |      48 B |        1.00 |
                   |        |           |           |           |       |        |           |             |
 TryGetAsciiString |     20 | 24.910 ns | 0.0148 ns | 0.0123 ns |  1.00 | 0.0038 |      64 B |        1.00 |
 ToUtf16_WithCheck |     20 | 32.247 ns | 0.0104 ns | 0.0092 ns |  1.29 | 0.0038 |      64 B |        1.00 |
   ToUtf16_NoCheck |     20 | 25.126 ns | 0.0162 ns | 0.0143 ns |  1.01 | 0.0038 |      64 B |        1.00 |
                   |        |           |           |           |       |        |           |             |
 TryGetAsciiString |     40 | 30.101 ns | 0.0340 ns | 0.0284 ns |  1.00 | 0.0062 |     104 B |        1.00 |
 ToUtf16_WithCheck |     40 | 38.205 ns | 0.0393 ns | 0.0367 ns |  1.27 | 0.0062 |     104 B |        1.00 |
   ToUtf16_NoCheck |     40 | 30.887 ns | 0.0147 ns | 0.0131 ns |  1.03 | 0.0062 |     104 B |        1.00 |
                   |        |           |           |           |       |        |           |             |
 TryGetAsciiString |     80 | 36.480 ns | 0.0293 ns | 0.0274 ns |  1.00 | 0.0110 |     184 B |        1.00 |
 ToUtf16_WithCheck |     80 | 45.461 ns | 0.1825 ns | 0.1707 ns |  1.25 | 0.0110 |     184 B |        1.00 |
   ToUtf16_NoCheck |     80 | 36.487 ns | 0.0188 ns | 0.0176 ns |  1.00 | 0.0110 |     184 B |        1.00 |
                   |        |           |           |           |       |        |           |             |
 TryGetAsciiString |    128 | 46.440 ns | 0.0214 ns | 0.0189 ns |  1.00 | 0.0167 |     280 B |        1.00 |
 ToUtf16_WithCheck |    128 | 55.810 ns | 0.1230 ns | 0.1090 ns |  1.20 | 0.0167 |     280 B |        1.00 |
   ToUtf16_NoCheck |    128 | 44.214 ns | 0.0427 ns | 0.0378 ns |  0.95 | 0.0167 |     280 B |        1.00 |
                   |        |           |           |           |       |        |           |             |
 TryGetAsciiString |    256 | 70.572 ns | 0.0963 ns | 0.0901 ns |  1.00 | 0.0319 |     536 B |        1.00 |
 ToUtf16_WithCheck |    256 | 85.058 ns | 0.0704 ns | 0.0659 ns |  1.21 | 0.0319 |     536 B |        1.00 |
   ToUtf16_NoCheck |    256 | 68.230 ns | 0.0853 ns | 0.0798 ns |  0.97 | 0.0319 |     536 B |        1.00 |

adamsitnik commented 1 year ago

bool treatNullAsInvalid = false

I am the person who has picked up @GrabYourPitchforks work and finished it. We have discussed the special handling of null character and got to the conclusion that before we make any progress on the proposal we need to get a better understanding of why null has special treatment and why not other control characters?

mgravell commented 1 year ago

@adamsitnik I wasn't around for the original design here, but I guess ultimately other characters don't have the risks of things like null byte injection attacks (OWASP: https://owasp.org/www-community/attacks/Embedding_Null_Code). nul is always going to be a bit different because of the C-style string semantics, where-as other control characters are ... certainly unusual, but not actively risky. But from a pure decode standpoint: yes, nul is (at least contextually) valid.

ghost commented 1 year ago

Thanks for contacting us.

We're moving this issue to the .NET 8 Planning milestone for future evaluation / consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue, during our next planning meeting(s). If we later determine, that the issue has no community involvement, or it's very rare and low-impact issue, we will close it - so that the team can focus on more important and high impact issues. To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

mgravell commented 1 year ago

At the current time, I don't think this gives us good options that don't have significant compromises; after consideration, we're going to leave this "as-is" for now, at the expense of retaining a big chunk of similar code. It could be worth revisiting again in the future.

gfoidl commented 1 year ago

@mgravell should https://github.com/dotnet/aspnetcore/pull/44040 be re-opened (and re-freshed) then? This would add vector goodness for ARM too.

mgravell commented 1 year ago

oof, that one looks fun! just to check: the topic there is additional perf work in the retained "big chunk of similar code" that I mentioned before? if so, I guess so - the numbers look promising - assuming of course that @GrabYourPitchforks isn't promising us a behaviour-compatible inbuilt method :)

MihaZupan commented 1 year ago

My impression was that there are still some unanswered questions from ASP.NET's perspective on the runtime issue on whether this special case should exist at all, and if yes, could it be handled differently such that there would be no need for new APIs.

dotnet / aspnetcore