dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.38k stars 10k forks source link

ResponseCachingKeyProvider Performance Optimisations #10290

Open stevejgordon opened 5 years ago

stevejgordon commented 5 years ago

tl;dr; This is a query about whether the team would accept performance optimisations for the ResponseCachingKeyProvider & related components?

I've recently been playing around with String.Create for an upcoming blog post and I remembered a discussion with @rynowak about ObjectPool being used to pool StringBuilder instances for some of the ASP.NET Core middleware.

One example I found is in ResponseCachingKeyProvider

Just for my own curiosity, I had a play with a new version of CreateBaseKey which doesn't require a StringBuilder and uses String.Create instead. My theory is that with the other methods updated also (if possible) then maybe the need to depend on an ObjectPoolProvider and StringBuilderPool could be removed?

I did a very quick and basic version using String.Create and benchmarked it with the following results...

|         Method |     Mean |     Error |    StdDev |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------- |---------:|----------:|----------:|-------:|------:|------:|----------:|
| GetKeyOriginal | 435.5 ns | 8.2836 ns | 8.1356 ns | 0.0191 |     - |     - |     120 B |
|      GetKeyNew | 341.1 ns | 0.6806 ns | 0.5313 ns | 0.0191 |     - |     - |     120 B |

Is there any value in pursuing this for this provider and perhaps others as part of the on-going performance work? If so, I'd be keen to try and spend a little time looking deeper on this particular feature. Note that the code is untested and just a proof of concept at this point.

Perhaps @davidfowl and @benaadams also have some thoughts on the value of this?

For completeness, my rough code used for the benchmark:

public string CreateBaseKeyStringCreate(ResponseCachingContext context)
{
    if (context == null)
    {
        throw new ArgumentNullException(nameof(context));
    }

    var request = context.HttpContext.Request;

    var length = request.Method.Length +
                 1 +
                 request.Scheme.Length +
                 1 +
                 request.Host.Value.Length +
                 request.PathBase.Value.Length +
                 request.Path.Value.Length;

    var key = string.Create(length, (request, _options), (chars, state) => 
    {
        var (request, options) = state;
        var position = 0;

        request.Method.AsSpan().ToUpperInvariant(chars);
        position += request.Method.Length;

        chars[position++] = KeyDelimiter;

        request.Scheme.AsSpan().ToUpperInvariant(chars.Slice(position));
        position += request.Scheme.Length;

        chars[position++] = KeyDelimiter;

        request.Host.Value.AsSpan().ToUpperInvariant(chars.Slice(position));
        position += request.Host.Value.Length;

        var pathBaseSpan = request.PathBase.Value.AsSpan();
        var pathSpan = request.Path.Value.AsSpan();

        if (options.UseCaseSensitivePaths)
        {
            pathBaseSpan.ToUpperInvariant(chars.Slice(position));
            position += pathBaseSpan.Length;

            pathSpan.ToUpperInvariant(chars.Slice(position));
        }
        else
        {
            pathBaseSpan.CopyTo(chars.Slice(position));
            position += pathBaseSpan.Length;

            pathSpan.CopyTo(chars.Slice(position));
        }
    });

    return key;
}
rynowak commented 5 years ago

Generally perf improvments are something we want, especially if it's something that runs during request processing 👍

@Tratcher @anurse - who can help steve figure out if this is a good approach?

Tratcher commented 5 years ago

@JunTaoLuo

analogrelay commented 5 years ago

@rynowak is right, perf gains are generally awesome as long as they don't introduce large amounts of complexity (and even then it's a matter of risk/cost vs. reward). Switching from pooled StringBuilders to String.Create seems reasonable to me. I think the next step would be a PR to show the code change. If you don't want to iterate fully and produce a "production-ready" PR but could hack something together that shows the approach you could throw a Draft PR up and put me, @Tratcher, @davidfowl and @JunTaoLuo as reviewers.

Including a benchmark in the PR and the results of said benchmark would also be good in order to show the hard data that illustrates the improvement.

davidfowl commented 5 years ago

This looks pretty nice! We should avoid object pooling at all costs if we can find a simple alternative like this.

stevejgordon commented 5 years ago

Thanks, @anurse and co! I'm happy to take a look at this over the next week or so and get a PR pushed with some changes. I grabbed the code to a local solution just for this hack. I'll try to get the repo building properly if possible.

analogrelay commented 5 years ago

If you get stuck, feel free to open a draft PR and mention that you're stuck on some issue. As long as you don't need to add new projects it should be simple, adding new projects is non-trivial but not too hard.

JunTaoLuo commented 5 years ago

Looks good, I can review the PR when it's ready.

stevejgordon commented 5 years ago

@anurse, with regards to including benchmarks. Is there a standard preferred approach? Or would just including a console app for now be fine? Presumably can remove before merging.

analogrelay commented 5 years ago

We use BenchmarkDotNet. A microbenchmark would probably be sufficient here. If there isn't already a project near the response caching project that's a bit unfortunate. Here's an example of one: https://github.com/aspnet/AspNetCore/tree/master/src/Http/perf/Microbenchmarks

stevejgordon commented 5 years ago

I'll add a perf folder and a BenchmarkDotNet project for this.

stevejgordon commented 5 years ago

A small update on this. I've progressed with adding a benchmark project and an optimised version of the code. The saving for CreateBaseKeyStringCreate is evident but the gain is a little less than my original test.

I'm working through a version of CreateStorageVaryByKey too which so far allocates a little less, but is slower as there's more upfront work to calculate the length. I'm going to play with that some more and see if there are better options.

I'll be on vacation for a week or so from this weekend so I'll likely spend more time on this when I'm back.

analogrelay commented 5 years ago

Putting this in the backlog since we don't plan to do optimization ourselves here in 3.0. Definitely still interested in a PR any time you're ready!