dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
34.92k stars 9.87k forks source link

Use FrozenDictionary where appropriate #47569

Closed JamesNK closed 2 months ago

JamesNK commented 1 year ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe the problem.

.NET 8 adds FrozenDictionary. It's a read-only dictionary with a higher creation cost but better read performance. Good for lookup dictionaries that live a long time.

See https://github.com/dotnet/runtime/issues/67209

Describe the solution you'd like

Look to see where dictionaries are used today. Consider replacing them with FrozenDictionary.

Additional context

No response

richiebijoy commented 1 year ago

If you are looking for help on this.. I would be glad to take it up

It would be my first open-source contribution

Thanks In Advance :)

danmoseley commented 1 year ago

@richiebijoy you'd be welcome to.

@BrennanConroy @JamesNK do you have a suggestion for a dictionary that would be a good candidate? (And presumably one that could have perf measured, if needed)

BrennanConroy commented 1 year ago

Not too sure. We should do a pass of all Dictionary usage in the repo and look for cases where we construct it once and don't readd to it.

One example I found was https://github.com/dotnet/aspnetcore/blob/ff2241b006bbe28d2b8f642028fca3ba18e8ec1a/src/Http/Routing/src/Matching/DictionaryJumpTable.cs#L13 where we construct the dictionary once, and I think it's only at startup which is ideal for the high cost of creating a frozen dictionary. But it should be double checked.

richiebijoy commented 1 year ago

Thank You so much for this opportunity @danmoseley

I'll look at this example to start with . Thanks for your help @BrennanConroy

:)

adityamandaleeka commented 1 year ago

Welcome to the repo @richiebijoy! Since you mentioned it'll be your first OSS contribution, if you need any support getting your first pull request going, let us know and one of us can help you.

richiebijoy commented 1 year ago

Thank you so much for the warm welcome, @adityamandaleeka ! I really appreciate your offer to help me with my first pull request. It's great to see such a supportive and welcoming community here. I'll definitely reach out if I need any assistance.

I'm still working on it. Took me a while to set up and I ran into some issues with my C Drive during that time, But everything is fine now

Thanks 😄

richiebijoy commented 1 year ago

Hi everyone. I have a quick question... Are the existing tests enough or do I need to create new ones?

richiebijoy commented 1 year ago

I made the change in that one file and ran the existing tests. All passed.

image

I am guessing I should make each of these changes in separate PRs so that it is easier for the reviewer. @adityamandaleeka Could you please help me with this. Thanks In Advance :)

ilmax commented 1 year ago

Given how expensive is to create a frozen dictionary, the change may end up regressing startup time. @richiebijoy did you run some benchmarks on current state vs frozen dictionary?

danmoseley commented 1 year ago

I should make each of these changes in separate PRs

I'd suggest having only one change in the first PR.

Just to warn you I expect that most of the work here is going to be performance validation. 🙂 You can do quick and dirty measurements first with Stopwatch to get an idea. Someone more familiar will have to suggest how to test startup properly and what scenario will realistically test it.

Are we assuming that after construction the lookups will be faster, or should we validate somehow? Obviously that's the assumption but FrozenDictionary is quite new and I don't think we have a sense of any performance gotchas yet. That would be good to understand before subsequent PR's.

richiebijoy commented 1 year ago

@danmoseley Okay I'll do that and in the mean time I'll also look up on how to test start up properly. If you know anyone that can guide me in the right direction that would be a huge help. Thanks a lot 😃

richiebijoy commented 1 year ago

@ilmax I will get back to you on this.

richiebijoy commented 1 year ago

Hey @danmoseley & @ilmax

I noticed something, I ran our existing tests on the code without using the FrozenDictionary and I got the following results. If you notice when a read operation occurs along with the initialization of the dictionary it take 7ms.

Before

Similarly when we implement FrozenDictionary the same test executes in less than 1ms

After

So overall this shows an improvement of at least 86% in time and 600% (or 6 times the initial performance) in terms of performance. Please let me know if this is enough or shall I look at some other metrics to evaluate.

Thanks In Advance :)

JamesNK commented 1 year ago

Routing jump tables are a good place to use this. They live a long time (usually the app's lifetime), and they're on the hot path of routing. There is DictionaryJumpTable and also the HTTP method jump table:

https://github.com/dotnet/aspnetcore/blob/ff2241b006bbe28d2b8f642028fca3ba18e8ec1a/src/Http/Routing/src/Matching/HttpMethodDictionaryPolicyJumpTable.cs#L11-L13

The best place to measure improvement is in microbenchmarks. Fortunately, we have a lot of benchmarks around routing already:

Microbenchmark read me

richiebijoy commented 1 year ago

Thanks @JamesNK

I will do this and get back to you. Currently at office, I'll look into this once I get back.

😃

richiebijoy commented 1 year ago

Hi @JamesNK and @danmoseley

I ran the benchmark test JumpTableMultipleEntryBenchmark.cs

I am attaching the before and after files.

The before file is the test I ran before implementing FrozenDictionary in DictionaryJumpTable BenchMark_Before.txt and the after file is the test I ran after implementing FrozenDictionary in DictionaryJumpTable BenchMark_After.txt

Please let me know what you think of these results.

Thanks In Advance :)

danmoseley commented 1 year ago

@richiebijoy thanks, the tables are in markdown format so we usually paste them right in here rather than attaching. It's also great if you can add your analysis of them.

Aside

What would be even better is to use the Benchmark.NET feature to compare a baseline to a proposed change, so they appear in the same table, eg.,

Method Runtime N Mean Error StdDev Ratio
Sha256 .NET 4.7.2 1000 7.735 us 0.1913 us 0.4034 us 1.00
Sha256 .NET Core 3.0 1000 3.989 us 0.0796 us 0.0745 us 0.50

This one shows multiple runtimes, in your case you have multiple implementations. To do this of course you'll need to build the code both ways, into separate places, so it can test with them both in the same run.

Then you set up the comparison with Job attributes, defining Jobs imperatively, or simply by passing multiple paths on the command line (the first will be treated as baseline)

https://benchmarkdotnet.org/articles/configs/jobs.html https://benchmarkdotnet.org/articles/guides/console-args.html

Having said that -- that's what we did in dotnet/runtime. I don't know how it's typically done in this repo. I see that readme above doesn't say how. @BrennanConroy @JamesNK @Tratcher @sebastienros has anyone got a good workflow in aspnet for getting these ratio columns?

I am guessing most perf measurements in this repo are system-level, through Crank, not microbenchmarks, through Benchmark.NET, so this hasn't been documented.

danmoseley commented 1 year ago

Actually, I realize the best way forward here. Rather than using Benchmark.NET jobs to compare base/diff, which requires having both sets of binaries, just run the benchmarks in more or less the way you did -- with and without your change -- then use results comparer to look across all the benchmarks and flag any statistically significant changes.

https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md#preventing-regressions

https://github.com/dotnet/performance/blob/72514fa1b2e3cf321b05e6af885aee87580437f0/src/tools/ResultsComparer/README.md#L1

https://github.com/dotnet/performance/tree/72514fa1b2e3cf321b05e6af885aee87580437f0/src/tools/ResultsComparer

should be all you need. We should document this specifically for ASP.NET.

danmoseley commented 1 year ago

cc @adamsitnik but I think this is the right info.

richiebijoy commented 1 year ago

@danmoseley Wow this is a great tool. I will run the Becnhmark tests and compare using this tool and let you'll know what the results look like.

Thanks a lot :)

richiebijoy commented 1 year ago

Hey @danmoseley & @adamsitnik

I ran the Bench Mark tests and then compared the results using the ResultsComparer. It took me a while to figure out how to run the ResultsComparer. Ran into some unknown errors that were solved after installing .NET 8 from the daily builds. I set the threshold at 2%. Please let me know if I should change that.

These are the results . summary: better: 2, geomean: 1.204 worse: 7, geomean: 1.102 total diff: 9

Slower diff/base Base Median (ns) Diff Median (ns) Modality
Microsoft.AspNetCore.Routing.Matching.JumpTableMultipleEntryBenchmark.Dictionary 1.22 35.68 43.47 several?
Microsoft.AspNetCore.Routing.Matching.JumpTableMultipleEntryBenchmark.VectorTrie 1.17 77.22 90.33 several?
Microsoft.AspNetCore.Routing.Matching.JumpTableMultipleEntryBenchmark.Trie(Count 1.17 46.29 54.04 several?
Microsoft.AspNetCore.Routing.Matching.JumpTableMultipleEntryBenchmark.Dictionary 1.10 36.40 40.09 several?
Microsoft.AspNetCore.Routing.Matching.JumpTableMultipleEntryBenchmark.Trie(Count 1.08 114.04 122.91
Microsoft.AspNetCore.Routing.Matching.JumpTableMultipleEntryBenchmark.VectorTrie 1.04 32.29 33.57
Microsoft.AspNetCore.Routing.Matching.JumpTableMultipleEntryBenchmark.Dictionary 0.96 41.13 39.41 bimodal
Faster base/diff Base Median (ns) Diff Median (ns) Modality
Microsoft.AspNetCore.Routing.Matching.JumpTableMultipleEntryBenchmark.VectorTrie 1.35 13.55 10.07 bimodal
Microsoft.AspNetCore.Routing.Matching.JumpTableMultipleEntryBenchmark.Trie(Count 1.08 12.60 11.70

VectorTrie and Trie both pass the dictionary as parameters during their initialization. I think the FrozenDictionary is good for large collections. Please let me know what you think.

danmoseley commented 1 year ago

@richiebijoy can you push a branch to your fork showing the change you made?

I'm trying to interpret your results. I think I should expect to see 5 rows, one for each benchmark in JumpTableMultipleEntryBenchmark. And if you only replaced Dictionary with FrozenDictionary in DictionaryJumpTable, then I should expect base/diff = 1.00 roughtly for all the rows other than the Dictionary, row, right?

richiebijoy commented 1 year ago

@danmoseley

I have pushed the changes to the following branch https://github.com/richiebijoy/aspnetcore/tree/FrozenDictionary

I think out of the 5 BenchMarks, there are absolutely no differences to 2 of them. 3 of them are influenced by the FrozenDictionary. So we are seeing results related to those 3 here.

Maybe I am missing something.

danmoseley commented 1 year ago

sorry, missed your comment.

direct link to your simple change -- https://github.com/dotnet/aspnetcore/compare/main...richiebijoy:aspnetcore:FrozenDictionary

I'm still a bit confused about the benchmark -- there should be 5 results for Dictionary -- 5 different sizes.

@BrennanConroy is this benchmark specifically https://github.com/danmoseley/aspnetcore/blob/d6d85eda7ed5b7a9c72ef9f36166f8ec9f850fd2/src/Http/Routing/perf/Microbenchmarks/Matching/JumpTableMultipleEntryBenchmark.cs#L104 representative enough that if it improves it's sufficient to merge this PR? Also, what is your workflow to run these vs baseline in the most convenient way? We can probably improve our docs.

BrennanConroy commented 1 year ago

is this benchmark specifically https://github.com/danmoseley/aspnetcore/blob/d6d85eda7ed5b7a9c72ef9f36166f8ec9f850fd2/src/Http/Routing/perf/Microbenchmarks/Matching/JumpTableMultipleEntryBenchmark.cs#L104 representative enough that if it improves it's sufficient to merge this PR?

It should be. I noticed that the PR doesn't set optimizeForReading which could help perf.

I'm still a bit confused about the benchmark -- there should be 5 results for Dictionary -- 5 different sizes.

Same, that is odd. As well as the other benchmarks not being really close to 1.00 base/diff

Also, what is your workflow to run these vs baseline in the most convenient way?

Depends on what is being tested. If it's a single class without any major dependencies (dependencies on non public code in the repo) then it's simplest to just copy the class into the benchmark code and have both versions locally. e.g. DictionaryJumpTableCurrent and DictionaryJumpTableNew then add a new benchmark case that calls the new one and of course initialize it in the global setup func.

If it's a more complex case, like Kestrel Http2Connection tests, then I have to switch between branches to get both benchmarks and manually compare.

BrennanConroy commented 1 year ago
Gave this a try: Current PR method with entries.ToFrozenDictionary: Note, these numbers are slightly worse than with plain Dictionary Method Count Mean Error StdDev Op/s Gen 0 Gen 1 Gen 2 Allocated
Dictionary 2 21.90 ns 0.100 ns 0.084 ns 45,655,772.2 - - - -
Dictionary 5 22.58 ns 0.293 ns 0.259 ns 44,293,206.1 - - - -
Dictionary 10 22.83 ns 0.163 ns 0.136 ns 43,797,501.4 - - - -
Dictionary 25 21.63 ns 0.125 ns 0.111 ns 46,225,786.2 - - - -
Dictionary 50 22.76 ns 0.237 ns 0.210 ns 43,942,348.6 - - - -
Dictionary 100 22.93 ns 0.159 ns 0.149 ns 43,615,623.6 - - - -

Frozen dictionary with read optimized entries.ToDictionary(x => x.text, x => x.destination).ToFrozenDictionary(StringComparer.OrdinalIgnoreCase, true); Method Count Mean Error StdDev Op/s Gen 0 Gen 1 Gen 2 Allocated
Dictionary 2 12.92 ns 0.256 ns 0.227 ns 77,425,450.0 - - - -
Dictionary 5 13.92 ns 0.053 ns 0.044 ns 71,861,659.2 - - - -
Dictionary 10 12.27 ns 0.065 ns 0.058 ns 81,521,206.6 - - - -
Dictionary 25 12.52 ns 0.064 ns 0.060 ns 79,887,219.8 - - - -
Dictionary 50 12.76 ns 0.043 ns 0.040 ns 78,343,779.9 - - - -
Dictionary 100 13.22 ns 0.081 ns 0.071 ns 75,665,602.1 - - - -

Looks like the ToFrozenDictionary method that takes the key and value mapper functions doesn't have a way to optimize for reading. So had to convert to dictionary first and use the method that does allow passing in optimizeForReading.

It would also be useful to measure the startup perf difference when using FrozenDictionary.

danmoseley commented 1 year ago

Thanks @BrennanConroy . @richiebijoy do you want to look at startup perf as suggested? Might be easiest to just paste code into the benchmark code so you can measure both in one run. And add simple benchmarks in your file that just cover the initialization.

But pretty nice wins!

Also @richiebijoy if you like you could open an API proposal in dotnet/runtime (click new issue and choose the new API template) to propose the missing overload with optimize for read.

BrennanConroy commented 1 year ago

FrozenDictionary startup performance has increased a lot in the past couple months. I know we didn't measure before. Below is the current measurements.

TLDR; Slower by a couple hundred microseconds (1/1000 millisecond) when someone has 1000 routes. ~5-6 times more allocations (31kb -> 180kb for 1000 routes). ~60% faster. I think that's worth changing in DictionaryJumpTable for 8.0. cc @davidfowl

Initialization cost: Method Count Mean Error StdDev Op/s Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
Dictionary 25 674.2 ns 2.61 ns 2.18 ns 1,483,171.9 1.00 0.00 0.0153 - - 944 B
Frozen 25 5,198.7 ns 9.46 ns 7.90 ns 192,356.1 7.71 0.02 0.0687 - - 4,288 B
Baseline 50 1,271.3 ns 3.06 ns 2.56 ns 786,620.6 1.00 0.00 0.0286 - - 1,784 B
Frozen 50 15,965.4 ns 53.45 ns 47.38 ns 62,635.5 12.56 0.04 0.1221 - - 8,672 B
Baseline 100 2,620.7 ns 6.34 ns 5.62 ns 381,574.3 1.00 0.00 0.0496 - - 3,128 B
Frozen 100 29,286.5 ns 72.82 ns 60.81 ns 34,145.4 11.17 0.04 0.2747 - - 17,672 B
Baseline 1000 28,037.1 ns 181.38 ns 169.66 ns 35,667.0 1.00 0.00 0.4883 0.0305 - 31,016 B
Frozen 1000 325,022.9 ns 1,633.19 ns 1,527.69 ns 3,076.7 11.59 0.11 2.9297 - - 180,088 B

Newer per GetDestination perf:

Method Count Mean Error StdDev Op/s Ratio Gen 0 Gen 1 Gen 2 Allocated
Dictionary 25 260.49 ns 3.013 ns 2.818 ns 38,388,891 1.00 - - - -
FrozenDictionary 25 89.00 ns 0.796 ns 0.744 ns 112,353,403 0.34 - - - -
Dictionary 50 255.38 ns 2.636 ns 2.466 ns 39,157,401 1.00 - - - -
FrozenDictionary 50 103.66 ns 0.983 ns 0.871 ns 96,472,762 0.41 - - - -
Dictionary 100 258.09 ns 0.667 ns 0.592 ns 38,745,970 1.00 - - - -
FrozenDictionary 100 86.72 ns 0.406 ns 0.339 ns 115,308,799 0.34 - - - -
Dictionary 1000 255.99 ns 0.476 ns 0.422 ns 39,064,169 1.00 - - - -
FrozenDictionary 1000 93.04 ns 0.604 ns 0.565 ns 107,477,029 0.36 - - - -
ladeak commented 9 months ago

Is it possible / suggested to pick up further Dictionaries to be converted to Frozen? I would be interested in doing so.

BrennanConroy commented 9 months ago

We haven't done an exhaustive search of our code base to find good candidates. But there is a possible one in https://github.com/dotnet/aspnetcore/blob/main/src/Http/Routing/src/Matching/HttpMethodDictionaryPolicyJumpTable.cs that could be looked into.

ladeak commented 8 months ago

I see HeaderDictionaryTypeExtensions and RouteValuesAddressScheme seems to have a few candidates.

Would you agree to updating these too?

ladeak commented 8 months ago

@JamesNK I was expecting a few more places to cover, ie. the HeaderDictionaryTypeExtensions and RouteValuesAddressScheme seems to be a good candidate actually for FrozenDictionary, no? Would it make sense to keep this issue open, or there is no more appetite to freeze dictionaries?

ladeak commented 8 months ago

A quick performance test for KnowValues in HeaderDictionaryTypeExtensions, with one single difference, I replaced the long? implementation to return 0;, so it shows clearly the overhead of the dictionary compared to the TryParse operations in the other Func<>s.

I table below indicates invoking Get with

The three implementations:

| Method                         | Mean      | Error    | StdDev   | Gen0   | Allocated |
|------------------------------- |----------:|---------:|---------:|-------:|----------:|
| DictionaryDateTime             | 176.85 ns | 2.361 ns | 1.972 ns |      - |         - |
| DictionaryNoOp                 |  15.91 ns | 0.344 ns | 0.410 ns |      - |         - |
| DictionaryMediaTypeHeaderValue |  76.23 ns | 1.552 ns | 2.175 ns | 0.0356 |     224 B |
| SwitchDateTime                 | 163.80 ns | 2.169 ns | 2.029 ns |      - |         - |
| SwitchNoOp                     |  12.45 ns | 0.266 ns | 0.296 ns |      - |         - |
| SwitchMediaTypeHeaderValue     |  70.96 ns | 0.944 ns | 0.788 ns | 0.0356 |     224 B |
| FrozenDateTime                 | 171.26 ns | 3.341 ns | 3.282 ns |      - |         - |
| FrozenNoOp                     |  14.67 ns | 0.255 ns | 0.239 ns |      - |         - |
| FrozenMediaTypeHeaderValue     |  72.22 ns | 1.413 ns | 1.387 ns | 0.0356 |     224 B |

This is the overhead for creating the dictionaries:

| Method                 | Mean     | Error   | StdDev  | Gen0   | Gen1   | Allocated |
|----------------------- |---------:|--------:|--------:|-------:|-------:|----------:|
| CreateDictionary       | 179.6 ns | 3.59 ns | 3.99 ns | 0.1581 | 0.0002 |     992 B |
| CreateFrozenDictionary | 581.9 ns | 5.18 ns | 4.85 ns | 0.3462 | 0.0010 |    2176 B |

Note, that the switch-case solution is free, as the Func<> becomes compiled method, and the cost of the switch-case also happens at compile time.

Does it worth converting? I think it does in general, but the actual gains are amortized by the underlying TryParse operation.

@JamesNK and @BrennanConroy does it worth creating a PR?

danmoseley commented 8 months ago

@ladeak is it possible to push your branch to see what the switch one looks like?

JamesNK commented 8 months ago

Curious to see the switch approach too.

If the gains are very small, it probably isn't worth complicating this area. HeaderDictionaryTypeExtensions isn't a performance-focused API. That doesn't mean we shouldn't make it faster, but the dial is tipped towards maintainability/readability instead of performance.

ladeak commented 8 months ago

@danmoseley and @JamesNK : https://github.com/ladeak/aspnetcore/commit/11c9b3a76fdc85df379d08b0d75e9f3566d80d14

ladeak commented 8 months ago

@JamesNK any suggestion, shall we use the proposed or simply a FrozenDictionary?

jkotas commented 8 months ago

Calling reflection and switching based on the type name looks expensive. Have you tried the simplest pattern that just compares the types? Note that the comparison of typeofs is a JIT intrinsic and so it is very fast.

if (typeof(T) == typeof(CacheControlHeaderValue) {
    ...
}
else if (typeof(T) == typeof(ContentDispositionHeaderValue)) {
   ...
}
...
ladeak commented 8 months ago

These are the results with the suggestion of @jkotas (named with ByType prefix):

| Method                     | Mean       | Error     | StdDev    | Gen0   | Allocated |
|--------------------------- |-----------:|----------:|----------:|-------:|----------:|
| ByTypeDateTime             | 160.682 ns | 1.3177 ns | 1.1681 ns |      - |         - |
| ByTypeNoOp                 |   9.175 ns | 0.0818 ns | 0.0765 ns |      - |         - |
| ByTypeMediaTypeHeaderValue |  66.874 ns | 0.4164 ns | 0.3691 ns | 0.0356 |     224 B |
| FrozenDateTime             | 167.591 ns | 2.0111 ns | 1.8812 ns |      - |         - |
| FrozenNoOp                 |  13.138 ns | 0.1188 ns | 0.1112 ns |      - |         - |
| FrozenMediaTypeHeaderValue |  69.858 ns | 0.6917 ns | 0.6132 ns | 0.0356 |     224 B |

This solution is the fastest overall.

BrennanConroy commented 2 months ago

I think we can close this issue now. Obviously this doesn't mean we can't add more frozen collection usage if we find places that could benefit, but the initial effort of adding usage is done.

Thanks everyone for the help, especially @richiebijoy and @ladeak!