Closed JeremyKuhne closed 5 years ago
Added the core of the BufferReader struct changes. Working on the actual meat of the parser changes now.
I've changed the request line parser. As part of this change I'm changing tests to no longer expect carriage returns and line feeds in error messages. I presume this is ok?
Okay, the header parsing is there too. Again, this change makes the detail reported on errors end before any CR/LF. It seems like that is the right answer, it would be difficult/costly to pass the context along to include up to the first LF as before.
The next step is to call straight into the methods that take the reader. That has a significant impact when parsing the entire request.
Fyi, I'm using this as a sanity check for getting SequenceReader
into CoreFX.
Ok, I've updated the benchmarks to use the overloads that take the reader and done the initial microbenchmarks.
Before
Method | Mean | Error | StdDev | Op/s | Scaled | ScaledSD | Allocated |
--------------------- |---------:|---------:|---------:|------------:|-------:|---------:|----------:|
PlaintextTechEmpower | 266.6 ns | 1.689 ns | 1.580 ns | 3,751,042.0 | 1.00 | 0.00 | 0 B |
LiveAspNet | 464.1 ns | 1.390 ns | 1.300 ns | 2,154,713.7 | 1.74 | 0.01 | 0 B |
Unicode | 674.9 ns | 2.223 ns | 2.079 ns | 1,481,655.4 | 2.53 | 0.02 | 0 B |
After
Method | Mean | Error | StdDev | Op/s | Scaled | ScaledSD | Allocated |
--------------------- |---------:|----------:|----------:|------------:|-------:|---------:|----------:|
PlaintextTechEmpower | 178.0 ns | 1.3724 ns | 1.2837 ns | 5,617,275.3 | 1.00 | 0.00 | 0 B |
LiveAspNet | 352.3 ns | 0.9841 ns | 0.8218 ns | 2,838,783.7 | 1.98 | 0.01 | 0 B |
Unicode | 434.5 ns | 2.2884 ns | 2.0286 ns | 2,301,551.1 | 2.44 | 0.02 | 0 B |
Looking promising. 😄
Ok, I've updated the benchmarks to use the overloads that take the reader and done the initial microbenchmarks.
Nice results @JeremyKuhne !
/cc @stephentoub @danmosemsft
@davidfowl, @pakrym I think this is now in a finished state. I've run all of the Kestrel.Performance
benchmarks. The key inner measure is the HttpParserBenchmark
which I shared initial performance numbers with above (in the 50% faster range):
Before
Method | Mean | Error | StdDev | Op/s | Scaled | Allocated |
--------------------- |---------:|---------:|---------:|------------:|-------:|----------:|
PlaintextTechEmpower | 266.3 ns | 1.138 ns | 1.064 ns | 3,755,168.1 | 1.00 | 0 B |
LiveAspNet | 462.6 ns | 3.124 ns | 2.922 ns | 2,161,579.6 | 1.74 | 0 B |
Unicode | 655.3 ns | 2.428 ns | 2.152 ns | 1,525,975.5 | 2.46 | 0 B |
After
Method | Mean | Error | StdDev | Op/s | Scaled | ScaledSD | Allocated |
--------------------- |---------:|---------:|---------:|------------:|-------:|---------:|----------:|
PlaintextTechEmpower | 175.9 ns | 1.138 ns | 1.064 ns | 5,684,818.2 | 1.00 | 0.00 | 0 B |
LiveAspNet | 344.3 ns | 1.672 ns | 1.482 ns | 2,904,259.7 | 1.96 | 0.01 | 0 B |
Unicode | 428.0 ns | 2.015 ns | 1.885 ns | 2,336,299.4 | 2.43 | 0.02 | 0 B |
Http1ConnectionBenchmark
is faster too, in the ~10-15% range.
Before
Method | Mean | Error | StdDev | Op/s | Scaled | ScaledSD | Gen 0 | Allocated |
--------------------- |-----------:|----------:|---------:|------------:|-------:|---------:|-------:|----------:|
PlaintextTechEmpower | 525.8 ns | 3.155 ns | 2.951 ns | 1,901,935.3 | 1.00 | 0.00 | 0.4883 | 172 KB |
LiveAspNet | 1,362.6 ns | 10.312 ns | 9.141 ns | 733,874.6 | 2.59 | 0.02 | 1.9531 | 528 KB |
After
Method | Mean | Error | StdDev | Op/s | Scaled | ScaledSD | Gen 0 | Allocated |
--------------------- |-----------:|---------:|---------:|------------:|-------:|---------:|-------:|----------:|
PlaintextTechEmpower | 458.2 ns | 2.341 ns | 2.190 ns | 2,182,667.0 | 1.00 | 0.00 | 0.7324 | 172 KB |
LiveAspNet | 1,248.9 ns | 5.623 ns | 4.984 ns | 800,682.8 | 2.73 | 0.02 | 1.9531 | 528 KB |
RequestParsingBenchmark
is slightly faster, but isn't written efficiently. It resets the connection between the request line and the rest of the header, requiring the reader to be reinitialized. There is one exception and that is PipelinedPlaintextTechEmpowerDrainBuffer
, which uses the reader through the full request and measures the sort of improvement I'd expect (almost 20%).
Before
Method | Mean | Error | StdDev | Op/s | Scaled | ScaledSD | Gen 0 | Allocated |
----------------------------------------- |-----------:|----------:|----------:|------------:|-------:|---------:|--------:|------------:|
PlaintextTechEmpower | 950.9 ns | 7.243 ns | 6.775 ns | 1,051,650.0 | 1.00 | 0.00 | 0.9766 | 172.13 KB |
PlaintextAbsoluteUri | 1,363.7 ns | 5.267 ns | 4.669 ns | 733,309.4 | 1.43 | 0.01 | 0.9766 | 284.13 KB |
PipelinedPlaintextTechEmpower | 747.6 ns | 4.768 ns | 3.982 ns | 1,337,616.0 | 0.79 | 0.01 | 15.6250 | 2753.01 KB |
PipelinedPlaintextTechEmpowerDrainBuffer | 540.9 ns | 1.284 ns | 1.072 ns | 1,848,742.9 | 0.57 | 0.00 | 7.8125 | 2753.01 KB |
LiveAspNet | 1,754.7 ns | 6.878 ns | 6.097 ns | 569,907.4 | 1.85 | 0.01 | 2.9297 | 528.13 KB |
PipelinedLiveAspNet | 1,672.5 ns | 6.553 ns | 5.809 ns | 597,891.3 | 1.76 | 0.01 | 31.2500 | 8466.04 KB |
Unicode | 2,545.9 ns | 14.950 ns | 13.985 ns | 392,789.6 | 2.68 | 0.02 | 3.9063 | 916.25 KB |
UnicodePipelined | 2,550.1 ns | 12.750 ns | 11.926 ns | 392,146.8 | 2.68 | 0.02 | 62.5000 | 14824.08 KB |
After
Method | Mean | Error | StdDev | Op/s | Scaled | ScaledSD | Gen 0 | Allocated |
----------------------------------------- |-----------:|----------:|----------:|------------:|-------:|---------:|--------:|------------:|
PlaintextTechEmpower | 922.9 ns | 6.138 ns | 5.742 ns | 1,083,502.7 | 1.00 | 0.00 | - | 172.13 KB |
PlaintextAbsoluteUri | 1,262.7 ns | 6.163 ns | 5.765 ns | 791,967.3 | 1.37 | 0.01 | 0.9766 | 284.13 KB |
PipelinedPlaintextTechEmpower | 713.1 ns | 5.334 ns | 4.728 ns | 1,402,313.1 | 0.77 | 0.01 | 15.6250 | 2753.01 KB |
PipelinedPlaintextTechEmpowerDrainBuffer | 453.5 ns | 3.220 ns | 2.689 ns | 2,204,956.1 | 0.49 | 0.00 | 15.6250 | 2752.51 KB |
LiveAspNet | 1,707.9 ns | 13.422 ns | 11.898 ns | 585,520.3 | 1.85 | 0.02 | 1.9531 | 528.13 KB |
PipelinedLiveAspNet | 1,607.5 ns | 14.062 ns | 12.465 ns | 622,102.2 | 1.74 | 0.02 | 46.8750 | 8450.04 KB |
Unicode | 2,404.5 ns | 11.966 ns | 10.608 ns | 415,885.8 | 2.61 | 0.02 | 5.8594 | 916.25 KB |
UnicodePipelined | 2,416.9 ns | 15.996 ns | 14.180 ns | 413,744.8 | 2.62 | 0.02 | 62.5000 | 14820.08 KB |
KnownStringsBenchmark
is significantly faster (20% and up). Http1ConnectionParsingOverheadBenchmark
is slower, but that is expected as we're creating a reader and not actually using it (this test uses a null parser). Http1WritingBenchmark
is broken in master.
The rest of the Kestrel.Performance
benchmarks showed no impact.
This change removes essentially all of the unsafe code in HttpParser and makes it significantly faster. It relies on the BufferReader
holding context to maximize the performance gains. There is a good chance you might find more headroom in other areas by leveraging a similar approach.
It also gets performance gains by changing the way we parse. It breaks up valid lines by CRLF and only passes the "good" line for further parsing, which simplifies the inner parsing logic. When iterating through the design in CoreFXlab I found that to be the source of about half of the performance gains.
The BufferReader
code is a copy of what will go into CoreFX as SequenceReader
. I'll be working on getting that through and then the BufferReader
can be dropped from Kestrel.
cc @sebastienros
Note that @GrabYourPitchforks Span/Memory perf changes should hopefully improve things even further.
Nice 😄
@JeremyKuhne is this good to go? @sebastienros We want to run an end to end perf test here, where is the bottt 😄
I've run plaintext on this:
Before:
RequestsPerSecond: 1,913,912
Max CPU (%): 93
WorkingSet (MB): 384
Avg. Latency (ms): 3.8
Startup (ms): 388
First Request (ms): 85.9
Latency (ms): 0.6
Total Requests: 19,309,615
Duration: (ms) 10,090
Socket Errors: 0
Bad Responses: 0
SDK: 2.2.100-rtm-009578
Runtime: 3.0.0-preview1-27101-02
ASP.NET Core: 3.0.0-alpha1-10655
After:
RequestsPerSecond: 1,913,873
Max CPU (%): 92
WorkingSet (MB): 382
Avg. Latency (ms): 5.4
Startup (ms): 393
First Request (ms): 82.2
Latency (ms): 0.4
Total Requests: 19,305,979
Duration: (ms) 10,087
Socket Errors: 0
Bad Responses: 0
SDK: 2.2.100-rtm-009578
Runtime: 3.0.0-preview1-27101-02
ASP.NET Core: 3.0.0-alpha1-10655
Platform benchmark is unfortunately broken in master.
@davidfowl Yes, I think I'm done here.
@dotnet-bot test Ubuntu 16.04 Debug Build
@davidfowl, @pakrym Is there anything else you want done here or can this be merged? Also, how should we track flipping to SequenceReader
from CoreFX?
We can't merge this until we have a 3.0 baseline (which I'm hoping will be soon).
cc @sebastienros
@davidfowl did you see @pakrym ran these already ?
@sebastienros that's not good enough. I'd like us to do regular 3.0 runs against master so that we're not surprised anymore. Once we close down 2.2 and switch to 3.0 then we can take this change and see if it affects that baseline. I'm going to be super anal about changes we can't detect now 😄
Kestrel work has moved to aspnetcore repo, and this repo will be archived. If you're still interested in pursuing this PR, please move it over to aspnetcore. Thanks.
@pakrym, @davidfowl This is the simple "convert to span" change. I'm porting the changes I made in CoreFxLab on top of this.