aspnet / KestrelHttpServer

[Archived] A cross platform web server for ASP.NET Core. Project moved to https://github.com/aspnet/AspNetCore
Apache License 2.0
2.63k stars 528 forks source link

Make parser "safe" #3068

Closed JeremyKuhne closed 5 years ago

JeremyKuhne commented 5 years ago

@pakrym, @davidfowl This is the simple "convert to span" change. I'm porting the changes I made in CoreFxLab on top of this.

JeremyKuhne commented 5 years ago

Added the core of the BufferReader struct changes. Working on the actual meat of the parser changes now.

JeremyKuhne commented 5 years ago

I've changed the request line parser. As part of this change I'm changing tests to no longer expect carriage returns and line feeds in error messages. I presume this is ok?

JeremyKuhne commented 5 years ago

Okay, the header parsing is there too. Again, this change makes the detail reported on errors end before any CR/LF. It seems like that is the right answer, it would be difficult/costly to pass the context along to include up to the first LF as before.

The next step is to call straight into the methods that take the reader. That has a significant impact when parsing the entire request.

JeremyKuhne commented 5 years ago

Fyi, I'm using this as a sanity check for getting SequenceReader into CoreFX.

JeremyKuhne commented 5 years ago

Ok, I've updated the benchmarks to use the overloads that take the reader and done the initial microbenchmarks.

Before

               Method |     Mean |    Error |   StdDev |        Op/s | Scaled | ScaledSD | Allocated |
--------------------- |---------:|---------:|---------:|------------:|-------:|---------:|----------:|
 PlaintextTechEmpower | 266.6 ns | 1.689 ns | 1.580 ns | 3,751,042.0 |   1.00 |     0.00 |       0 B |
           LiveAspNet | 464.1 ns | 1.390 ns | 1.300 ns | 2,154,713.7 |   1.74 |     0.01 |       0 B |
              Unicode | 674.9 ns | 2.223 ns | 2.079 ns | 1,481,655.4 |   2.53 |     0.02 |       0 B |

After

               Method |     Mean |     Error |    StdDev |        Op/s | Scaled | ScaledSD | Allocated |
--------------------- |---------:|----------:|----------:|------------:|-------:|---------:|----------:|
 PlaintextTechEmpower | 178.0 ns | 1.3724 ns | 1.2837 ns | 5,617,275.3 |   1.00 |     0.00 |       0 B |
           LiveAspNet | 352.3 ns | 0.9841 ns | 0.8218 ns | 2,838,783.7 |   1.98 |     0.01 |       0 B |
              Unicode | 434.5 ns | 2.2884 ns | 2.0286 ns | 2,301,551.1 |   2.44 |     0.02 |       0 B |

Looking promising. 😄

joshfree commented 5 years ago

Ok, I've updated the benchmarks to use the overloads that take the reader and done the initial microbenchmarks.

Nice results @JeremyKuhne !

joshfree commented 5 years ago

/cc @stephentoub @danmosemsft

JeremyKuhne commented 5 years ago

@davidfowl, @pakrym I think this is now in a finished state. I've run all of the Kestrel.Performance benchmarks. The key inner measure is the HttpParserBenchmark which I shared initial performance numbers with above (in the 50% faster range):

Before

               Method |     Mean |    Error |   StdDev |        Op/s | Scaled | Allocated |
--------------------- |---------:|---------:|---------:|------------:|-------:|----------:|
 PlaintextTechEmpower | 266.3 ns | 1.138 ns | 1.064 ns | 3,755,168.1 |   1.00 |       0 B |
           LiveAspNet | 462.6 ns | 3.124 ns | 2.922 ns | 2,161,579.6 |   1.74 |       0 B |
              Unicode | 655.3 ns | 2.428 ns | 2.152 ns | 1,525,975.5 |   2.46 |       0 B |

After

               Method |     Mean |    Error |   StdDev |        Op/s | Scaled | ScaledSD | Allocated |
--------------------- |---------:|---------:|---------:|------------:|-------:|---------:|----------:|
 PlaintextTechEmpower | 175.9 ns | 1.138 ns | 1.064 ns | 5,684,818.2 |   1.00 |     0.00 |       0 B |
           LiveAspNet | 344.3 ns | 1.672 ns | 1.482 ns | 2,904,259.7 |   1.96 |     0.01 |       0 B |
              Unicode | 428.0 ns | 2.015 ns | 1.885 ns | 2,336,299.4 |   2.43 |     0.02 |       0 B |

Http1ConnectionBenchmark is faster too, in the ~10-15% range.

Before

               Method |       Mean |     Error |   StdDev |        Op/s | Scaled | ScaledSD |  Gen 0 | Allocated |
--------------------- |-----------:|----------:|---------:|------------:|-------:|---------:|-------:|----------:|
 PlaintextTechEmpower |   525.8 ns |  3.155 ns | 2.951 ns | 1,901,935.3 |   1.00 |     0.00 | 0.4883 |    172 KB |
           LiveAspNet | 1,362.6 ns | 10.312 ns | 9.141 ns |   733,874.6 |   2.59 |     0.02 | 1.9531 |    528 KB |

After

               Method |       Mean |    Error |   StdDev |        Op/s | Scaled | ScaledSD |  Gen 0 | Allocated |
--------------------- |-----------:|---------:|---------:|------------:|-------:|---------:|-------:|----------:|
 PlaintextTechEmpower |   458.2 ns | 2.341 ns | 2.190 ns | 2,182,667.0 |   1.00 |     0.00 | 0.7324 |    172 KB |
           LiveAspNet | 1,248.9 ns | 5.623 ns | 4.984 ns |   800,682.8 |   2.73 |     0.02 | 1.9531 |    528 KB |

RequestParsingBenchmark is slightly faster, but isn't written efficiently. It resets the connection between the request line and the rest of the header, requiring the reader to be reinitialized. There is one exception and that is PipelinedPlaintextTechEmpowerDrainBuffer, which uses the reader through the full request and measures the sort of improvement I'd expect (almost 20%).

Before

                                   Method |       Mean |     Error |    StdDev |        Op/s | Scaled | ScaledSD |   Gen 0 |   Allocated |
----------------------------------------- |-----------:|----------:|----------:|------------:|-------:|---------:|--------:|------------:|
                     PlaintextTechEmpower |   950.9 ns |  7.243 ns |  6.775 ns | 1,051,650.0 |   1.00 |     0.00 |  0.9766 |   172.13 KB |
                     PlaintextAbsoluteUri | 1,363.7 ns |  5.267 ns |  4.669 ns |   733,309.4 |   1.43 |     0.01 |  0.9766 |   284.13 KB |
            PipelinedPlaintextTechEmpower |   747.6 ns |  4.768 ns |  3.982 ns | 1,337,616.0 |   0.79 |     0.01 | 15.6250 |  2753.01 KB |
 PipelinedPlaintextTechEmpowerDrainBuffer |   540.9 ns |  1.284 ns |  1.072 ns | 1,848,742.9 |   0.57 |     0.00 |  7.8125 |  2753.01 KB |
                               LiveAspNet | 1,754.7 ns |  6.878 ns |  6.097 ns |   569,907.4 |   1.85 |     0.01 |  2.9297 |   528.13 KB |
                      PipelinedLiveAspNet | 1,672.5 ns |  6.553 ns |  5.809 ns |   597,891.3 |   1.76 |     0.01 | 31.2500 |  8466.04 KB |
                                  Unicode | 2,545.9 ns | 14.950 ns | 13.985 ns |   392,789.6 |   2.68 |     0.02 |  3.9063 |   916.25 KB |
                         UnicodePipelined | 2,550.1 ns | 12.750 ns | 11.926 ns |   392,146.8 |   2.68 |     0.02 | 62.5000 | 14824.08 KB |

After

                                   Method |       Mean |     Error |    StdDev |        Op/s | Scaled | ScaledSD |   Gen 0 |   Allocated |
----------------------------------------- |-----------:|----------:|----------:|------------:|-------:|---------:|--------:|------------:|
                     PlaintextTechEmpower |   922.9 ns |  6.138 ns |  5.742 ns | 1,083,502.7 |   1.00 |     0.00 |       - |   172.13 KB |
                     PlaintextAbsoluteUri | 1,262.7 ns |  6.163 ns |  5.765 ns |   791,967.3 |   1.37 |     0.01 |  0.9766 |   284.13 KB |
            PipelinedPlaintextTechEmpower |   713.1 ns |  5.334 ns |  4.728 ns | 1,402,313.1 |   0.77 |     0.01 | 15.6250 |  2753.01 KB |
 PipelinedPlaintextTechEmpowerDrainBuffer |   453.5 ns |  3.220 ns |  2.689 ns | 2,204,956.1 |   0.49 |     0.00 | 15.6250 |  2752.51 KB |
                               LiveAspNet | 1,707.9 ns | 13.422 ns | 11.898 ns |   585,520.3 |   1.85 |     0.02 |  1.9531 |   528.13 KB |
                      PipelinedLiveAspNet | 1,607.5 ns | 14.062 ns | 12.465 ns |   622,102.2 |   1.74 |     0.02 | 46.8750 |  8450.04 KB |
                                  Unicode | 2,404.5 ns | 11.966 ns | 10.608 ns |   415,885.8 |   2.61 |     0.02 |  5.8594 |   916.25 KB |
                         UnicodePipelined | 2,416.9 ns | 15.996 ns | 14.180 ns |   413,744.8 |   2.62 |     0.02 | 62.5000 | 14820.08 KB |

KnownStringsBenchmark is significantly faster (20% and up). Http1ConnectionParsingOverheadBenchmark is slower, but that is expected as we're creating a reader and not actually using it (this test uses a null parser). Http1WritingBenchmark is broken in master.

The rest of the Kestrel.Performance benchmarks showed no impact.

Current state and next steps

This change removes essentially all of the unsafe code in HttpParser and makes it significantly faster. It relies on the BufferReader holding context to maximize the performance gains. There is a good chance you might find more headroom in other areas by leveraging a similar approach.

It also gets performance gains by changing the way we parse. It breaks up valid lines by CRLF and only passes the "good" line for further parsing, which simplifies the inner parsing logic. When iterating through the design in CoreFXlab I found that to be the source of about half of the performance gains.

The BufferReader code is a copy of what will go into CoreFX as SequenceReader. I'll be working on getting that through and then the BufferReader can be dropped from Kestrel.

pakrym commented 5 years ago

cc @sebastienros

JeremyKuhne commented 5 years ago

Note that @GrabYourPitchforks Span/Memory perf changes should hopefully improve things even further.

benaadams commented 5 years ago

Nice 😄

davidfowl commented 5 years ago

@JeremyKuhne is this good to go? @sebastienros We want to run an end to end perf test here, where is the bottt 😄

pakrym commented 5 years ago

I've run plaintext on this:

Before:

RequestsPerSecond:           1,913,912
Max CPU (%):                 93
WorkingSet (MB):             384
Avg. Latency (ms):           3.8
Startup (ms):                388
First Request (ms):          85.9
Latency (ms):                0.6
Total Requests:              19,309,615
Duration: (ms)               10,090
Socket Errors:               0
Bad Responses:               0
SDK:                         2.2.100-rtm-009578
Runtime:                     3.0.0-preview1-27101-02
ASP.NET Core:                3.0.0-alpha1-10655

After:

RequestsPerSecond:           1,913,873
Max CPU (%):                 92
WorkingSet (MB):             382
Avg. Latency (ms):           5.4
Startup (ms):                393
First Request (ms):          82.2
Latency (ms):                0.4
Total Requests:              19,305,979
Duration: (ms)               10,087
Socket Errors:               0
Bad Responses:               0
SDK:                         2.2.100-rtm-009578
Runtime:                     3.0.0-preview1-27101-02
ASP.NET Core:                3.0.0-alpha1-10655

Platform benchmark is unfortunately broken in master.

JeremyKuhne commented 5 years ago

@davidfowl Yes, I think I'm done here.

davidfowl commented 5 years ago

@dotnet-bot test Ubuntu 16.04 Debug Build

JeremyKuhne commented 5 years ago

@davidfowl, @pakrym Is there anything else you want done here or can this be merged? Also, how should we track flipping to SequenceReader from CoreFX?

davidfowl commented 5 years ago

We can't merge this until we have a 3.0 baseline (which I'm hoping will be soon).

cc @sebastienros

sebastienros commented 5 years ago

@davidfowl did you see @pakrym ran these already ?

davidfowl commented 5 years ago

@sebastienros that's not good enough. I'd like us to do regular 3.0 runs against master so that we're not surprised anymore. Once we close down 2.2 and switch to 3.0 then we can take this change and see if it affects that baseline. I'm going to be super anal about changes we can't detect now 😄

muratg commented 5 years ago

Kestrel work has moved to aspnetcore repo, and this repo will be archived. If you're still interested in pursuing this PR, please move it over to aspnetcore. Thanks.