Closed JamesNK closed 2 months ago
If you are looking for help on this.. I would be glad to take it up
It would be my first open-source contribution
Thanks In Advance :)
@richiebijoy you'd be welcome to.
@BrennanConroy @JamesNK do you have a suggestion for a dictionary that would be a good candidate? (And presumably one that could have perf measured, if needed)
Not too sure. We should do a pass of all Dictionary usage in the repo and look for cases where we construct it once and don't readd to it.
One example I found was https://github.com/dotnet/aspnetcore/blob/ff2241b006bbe28d2b8f642028fca3ba18e8ec1a/src/Http/Routing/src/Matching/DictionaryJumpTable.cs#L13 where we construct the dictionary once, and I think it's only at startup which is ideal for the high cost of creating a frozen dictionary. But it should be double checked.
Thank You so much for this opportunity @danmoseley
I'll look at this example to start with . Thanks for your help @BrennanConroy
:)
Welcome to the repo @richiebijoy! Since you mentioned it'll be your first OSS contribution, if you need any support getting your first pull request going, let us know and one of us can help you.
Thank you so much for the warm welcome, @adityamandaleeka ! I really appreciate your offer to help me with my first pull request. It's great to see such a supportive and welcoming community here. I'll definitely reach out if I need any assistance.
I'm still working on it. Took me a while to set up and I ran into some issues with my C Drive during that time, But everything is fine now
Thanks 😄
Hi everyone. I have a quick question... Are the existing tests enough or do I need to create new ones?
I made the change in that one file and ran the existing tests. All passed.
I am guessing I should make each of these changes in separate PRs so that it is easier for the reviewer. @adityamandaleeka Could you please help me with this. Thanks In Advance :)
Given how expensive is to create a frozen dictionary, the change may end up regressing startup time. @richiebijoy did you run some benchmarks on current state vs frozen dictionary?
I should make each of these changes in separate PRs
I'd suggest having only one change in the first PR.
Just to warn you I expect that most of the work here is going to be performance validation. 🙂 You can do quick and dirty measurements first with Stopwatch to get an idea. Someone more familiar will have to suggest how to test startup properly and what scenario will realistically test it.
Are we assuming that after construction the lookups will be faster, or should we validate somehow? Obviously that's the assumption but FrozenDictionary is quite new and I don't think we have a sense of any performance gotchas yet. That would be good to understand before subsequent PR's.
@danmoseley Okay I'll do that and in the mean time I'll also look up on how to test start up properly. If you know anyone that can guide me in the right direction that would be a huge help. Thanks a lot 😃
@ilmax I will get back to you on this.
Hey @danmoseley & @ilmax
I noticed something, I ran our existing tests on the code without using the FrozenDictionary and I got the following results. If you notice when a read operation occurs along with the initialization of the dictionary it take 7ms.
Similarly when we implement FrozenDictionary the same test executes in less than 1ms
So overall this shows an improvement of at least 86% in time and 600% (or 6 times the initial performance) in terms of performance. Please let me know if this is enough or shall I look at some other metrics to evaluate.
Thanks In Advance :)
Routing jump tables are a good place to use this. They live a long time (usually the app's lifetime), and they're on the hot path of routing. There is DictionaryJumpTable
and also the HTTP method jump table:
The best place to measure improvement is in microbenchmarks. Fortunately, we have a lot of benchmarks around routing already:
Thanks @JamesNK
I will do this and get back to you. Currently at office, I'll look into this once I get back.
😃
Hi @JamesNK and @danmoseley
I ran the benchmark test JumpTableMultipleEntryBenchmark.cs
I am attaching the before and after files.
The before file is the test I ran before implementing FrozenDictionary in DictionaryJumpTable BenchMark_Before.txt and the after file is the test I ran after implementing FrozenDictionary in DictionaryJumpTable BenchMark_After.txt
Please let me know what you think of these results.
Thanks In Advance :)
@richiebijoy thanks, the tables are in markdown format so we usually paste them right in here rather than attaching. It's also great if you can add your analysis of them.
What would be even better is to use the Benchmark.NET feature to compare a baseline to a proposed change, so they appear in the same table, eg.,
Method | Runtime | N | Mean | Error | StdDev | Ratio |
---|---|---|---|---|---|---|
Sha256 | .NET 4.7.2 | 1000 | 7.735 us | 0.1913 us | 0.4034 us | 1.00 |
Sha256 | .NET Core 3.0 | 1000 | 3.989 us | 0.0796 us | 0.0745 us | 0.50 |
This one shows multiple runtimes, in your case you have multiple implementations. To do this of course you'll need to build the code both ways, into separate places, so it can test with them both in the same run.
Then you set up the comparison with Job attributes, defining Jobs imperatively, or simply by passing multiple paths on the command line (the first will be treated as baseline)
https://benchmarkdotnet.org/articles/configs/jobs.html https://benchmarkdotnet.org/articles/guides/console-args.html
Having said that -- that's what we did in dotnet/runtime. I don't know how it's typically done in this repo. I see that readme above doesn't say how. @BrennanConroy @JamesNK @Tratcher @sebastienros has anyone got a good workflow in aspnet for getting these ratio columns?
I am guessing most perf measurements in this repo are system-level, through Crank, not microbenchmarks, through Benchmark.NET, so this hasn't been documented.
Actually, I realize the best way forward here. Rather than using Benchmark.NET jobs to compare base/diff, which requires having both sets of binaries, just run the benchmarks in more or less the way you did -- with and without your change -- then use results comparer to look across all the benchmarks and flag any statistically significant changes.
should be all you need. We should document this specifically for ASP.NET.
cc @adamsitnik but I think this is the right info.
@danmoseley Wow this is a great tool. I will run the Becnhmark tests and compare using this tool and let you'll know what the results look like.
Thanks a lot :)
Hey @danmoseley & @adamsitnik
I ran the Bench Mark tests and then compared the results using the ResultsComparer. It took me a while to figure out how to run the ResultsComparer. Ran into some unknown errors that were solved after installing .NET 8 from the daily builds. I set the threshold at 2%. Please let me know if I should change that.
These are the results . summary: better: 2, geomean: 1.204 worse: 7, geomean: 1.102 total diff: 9
Slower | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
---|---|---|---|---|
Microsoft.AspNetCore.Routing.Matching.JumpTableMultipleEntryBenchmark.Dictionary | 1.22 | 35.68 | 43.47 | several? |
Microsoft.AspNetCore.Routing.Matching.JumpTableMultipleEntryBenchmark.VectorTrie | 1.17 | 77.22 | 90.33 | several? |
Microsoft.AspNetCore.Routing.Matching.JumpTableMultipleEntryBenchmark.Trie(Count | 1.17 | 46.29 | 54.04 | several? |
Microsoft.AspNetCore.Routing.Matching.JumpTableMultipleEntryBenchmark.Dictionary | 1.10 | 36.40 | 40.09 | several? |
Microsoft.AspNetCore.Routing.Matching.JumpTableMultipleEntryBenchmark.Trie(Count | 1.08 | 114.04 | 122.91 | |
Microsoft.AspNetCore.Routing.Matching.JumpTableMultipleEntryBenchmark.VectorTrie | 1.04 | 32.29 | 33.57 | |
Microsoft.AspNetCore.Routing.Matching.JumpTableMultipleEntryBenchmark.Dictionary | 0.96 | 41.13 | 39.41 | bimodal |
Faster | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
---|---|---|---|---|
Microsoft.AspNetCore.Routing.Matching.JumpTableMultipleEntryBenchmark.VectorTrie | 1.35 | 13.55 | 10.07 | bimodal |
Microsoft.AspNetCore.Routing.Matching.JumpTableMultipleEntryBenchmark.Trie(Count | 1.08 | 12.60 | 11.70 |
VectorTrie and Trie both pass the dictionary as parameters during their initialization. I think the FrozenDictionary is good for large collections. Please let me know what you think.
@richiebijoy can you push a branch to your fork showing the change you made?
I'm trying to interpret your results. I think I should expect to see 5 rows, one for each benchmark in JumpTableMultipleEntryBenchmark. And if you only replaced Dictionary with FrozenDictionary in DictionaryJumpTable, then I should expect base/diff = 1.00 roughtly for all the rows other than the Dictionary, row, right?
@danmoseley
I have pushed the changes to the following branch https://github.com/richiebijoy/aspnetcore/tree/FrozenDictionary
I think out of the 5 BenchMarks, there are absolutely no differences to 2 of them. 3 of them are influenced by the FrozenDictionary. So we are seeing results related to those 3 here.
Maybe I am missing something.
sorry, missed your comment.
direct link to your simple change -- https://github.com/dotnet/aspnetcore/compare/main...richiebijoy:aspnetcore:FrozenDictionary
I'm still a bit confused about the benchmark -- there should be 5 results for Dictionary -- 5 different sizes.
@BrennanConroy is this benchmark specifically https://github.com/danmoseley/aspnetcore/blob/d6d85eda7ed5b7a9c72ef9f36166f8ec9f850fd2/src/Http/Routing/perf/Microbenchmarks/Matching/JumpTableMultipleEntryBenchmark.cs#L104 representative enough that if it improves it's sufficient to merge this PR? Also, what is your workflow to run these vs baseline in the most convenient way? We can probably improve our docs.
is this benchmark specifically https://github.com/danmoseley/aspnetcore/blob/d6d85eda7ed5b7a9c72ef9f36166f8ec9f850fd2/src/Http/Routing/perf/Microbenchmarks/Matching/JumpTableMultipleEntryBenchmark.cs#L104 representative enough that if it improves it's sufficient to merge this PR?
It should be. I noticed that the PR doesn't set optimizeForReading
which could help perf.
I'm still a bit confused about the benchmark -- there should be 5 results for Dictionary -- 5 different sizes.
Same, that is odd. As well as the other benchmarks not being really close to 1.00 base/diff
Also, what is your workflow to run these vs baseline in the most convenient way?
Depends on what is being tested. If it's a single class without any major dependencies (dependencies on non public code in the repo) then it's simplest to just copy the class into the benchmark code and have both versions locally. e.g. DictionaryJumpTableCurrent
and DictionaryJumpTableNew
then add a new benchmark case that calls the new one and of course initialize it in the global setup func.
If it's a more complex case, like Kestrel Http2Connection tests, then I have to switch between branches to get both benchmarks and manually compare.
Gave this a try:
Current PR method with entries.ToFrozenDictionary :
Note, these numbers are slightly worse than with plain Dictionary |
Method | Count | Mean | Error | StdDev | Op/s | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|
Dictionary | 2 | 21.90 ns | 0.100 ns | 0.084 ns | 45,655,772.2 | - | - | - | - | |
Dictionary | 5 | 22.58 ns | 0.293 ns | 0.259 ns | 44,293,206.1 | - | - | - | - | |
Dictionary | 10 | 22.83 ns | 0.163 ns | 0.136 ns | 43,797,501.4 | - | - | - | - | |
Dictionary | 25 | 21.63 ns | 0.125 ns | 0.111 ns | 46,225,786.2 | - | - | - | - | |
Dictionary | 50 | 22.76 ns | 0.237 ns | 0.210 ns | 43,942,348.6 | - | - | - | - | |
Dictionary | 100 | 22.93 ns | 0.159 ns | 0.149 ns | 43,615,623.6 | - | - | - | - |
Frozen dictionary with read optimized entries.ToDictionary(x => x.text, x => x.destination).ToFrozenDictionary(StringComparer.OrdinalIgnoreCase, true); |
Method | Count | Mean | Error | StdDev | Op/s | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|
Dictionary | 2 | 12.92 ns | 0.256 ns | 0.227 ns | 77,425,450.0 | - | - | - | - | |
Dictionary | 5 | 13.92 ns | 0.053 ns | 0.044 ns | 71,861,659.2 | - | - | - | - | |
Dictionary | 10 | 12.27 ns | 0.065 ns | 0.058 ns | 81,521,206.6 | - | - | - | - | |
Dictionary | 25 | 12.52 ns | 0.064 ns | 0.060 ns | 79,887,219.8 | - | - | - | - | |
Dictionary | 50 | 12.76 ns | 0.043 ns | 0.040 ns | 78,343,779.9 | - | - | - | - | |
Dictionary | 100 | 13.22 ns | 0.081 ns | 0.071 ns | 75,665,602.1 | - | - | - | - |
Looks like the ToFrozenDictionary
method that takes the key and value mapper functions doesn't have a way to optimize for reading. So had to convert to dictionary first and use the method that does allow passing in optimizeForReading
.
It would also be useful to measure the startup perf difference when using FrozenDictionary.
Thanks @BrennanConroy . @richiebijoy do you want to look at startup perf as suggested? Might be easiest to just paste code into the benchmark code so you can measure both in one run. And add simple benchmarks in your file that just cover the initialization.
But pretty nice wins!
Also @richiebijoy if you like you could open an API proposal in dotnet/runtime (click new issue and choose the new API template) to propose the missing overload with optimize for read.
FrozenDictionary startup performance has increased a lot in the past couple months. I know we didn't measure before. Below is the current measurements.
TLDR; Slower by a couple hundred microseconds (1/1000 millisecond) when someone has 1000 routes. ~5-6 times more allocations (31kb -> 180kb for 1000 routes). ~60% faster. I think that's worth changing in DictionaryJumpTable
for 8.0. cc @davidfowl
Initialization cost: | Method | Count | Mean | Error | StdDev | Op/s | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Dictionary | 25 | 674.2 ns | 2.61 ns | 2.18 ns | 1,483,171.9 | 1.00 | 0.00 | 0.0153 | - | - | 944 B | |
Frozen | 25 | 5,198.7 ns | 9.46 ns | 7.90 ns | 192,356.1 | 7.71 | 0.02 | 0.0687 | - | - | 4,288 B | |
Baseline | 50 | 1,271.3 ns | 3.06 ns | 2.56 ns | 786,620.6 | 1.00 | 0.00 | 0.0286 | - | - | 1,784 B | |
Frozen | 50 | 15,965.4 ns | 53.45 ns | 47.38 ns | 62,635.5 | 12.56 | 0.04 | 0.1221 | - | - | 8,672 B | |
Baseline | 100 | 2,620.7 ns | 6.34 ns | 5.62 ns | 381,574.3 | 1.00 | 0.00 | 0.0496 | - | - | 3,128 B | |
Frozen | 100 | 29,286.5 ns | 72.82 ns | 60.81 ns | 34,145.4 | 11.17 | 0.04 | 0.2747 | - | - | 17,672 B | |
Baseline | 1000 | 28,037.1 ns | 181.38 ns | 169.66 ns | 35,667.0 | 1.00 | 0.00 | 0.4883 | 0.0305 | - | 31,016 B | |
Frozen | 1000 | 325,022.9 ns | 1,633.19 ns | 1,527.69 ns | 3,076.7 | 11.59 | 0.11 | 2.9297 | - | - | 180,088 B |
Newer per GetDestination
perf:
Method | Count | Mean | Error | StdDev | Op/s | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|
Dictionary | 25 | 260.49 ns | 3.013 ns | 2.818 ns | 38,388,891 | 1.00 | - | - | - | - |
FrozenDictionary | 25 | 89.00 ns | 0.796 ns | 0.744 ns | 112,353,403 | 0.34 | - | - | - | - |
Dictionary | 50 | 255.38 ns | 2.636 ns | 2.466 ns | 39,157,401 | 1.00 | - | - | - | - |
FrozenDictionary | 50 | 103.66 ns | 0.983 ns | 0.871 ns | 96,472,762 | 0.41 | - | - | - | - |
Dictionary | 100 | 258.09 ns | 0.667 ns | 0.592 ns | 38,745,970 | 1.00 | - | - | - | - |
FrozenDictionary | 100 | 86.72 ns | 0.406 ns | 0.339 ns | 115,308,799 | 0.34 | - | - | - | - |
Dictionary | 1000 | 255.99 ns | 0.476 ns | 0.422 ns | 39,064,169 | 1.00 | - | - | - | - |
FrozenDictionary | 1000 | 93.04 ns | 0.604 ns | 0.565 ns | 107,477,029 | 0.36 | - | - | - | - |
Is it possible / suggested to pick up further Dictionaries to be converted to Frozen? I would be interested in doing so.
We haven't done an exhaustive search of our code base to find good candidates. But there is a possible one in https://github.com/dotnet/aspnetcore/blob/main/src/Http/Routing/src/Matching/HttpMethodDictionaryPolicyJumpTable.cs that could be looked into.
I see HeaderDictionaryTypeExtensions and RouteValuesAddressScheme seems to have a few candidates.
Would you agree to updating these too?
@JamesNK I was expecting a few more places to cover, ie. the HeaderDictionaryTypeExtensions and RouteValuesAddressScheme seems to be a good candidate actually for FrozenDictionary, no? Would it make sense to keep this issue open, or there is no more appetite to freeze dictionaries?
A quick performance test for KnowValues in HeaderDictionaryTypeExtensions, with one single difference, I replaced the long?
implementation to return 0;
, so it shows clearly the overhead of the dictionary compared to the TryParse
operations in the other Func<>
s.
I table below indicates invoking Get with
long?
)The three implementations:
| Method | Mean | Error | StdDev | Gen0 | Allocated |
|------------------------------- |----------:|---------:|---------:|-------:|----------:|
| DictionaryDateTime | 176.85 ns | 2.361 ns | 1.972 ns | - | - |
| DictionaryNoOp | 15.91 ns | 0.344 ns | 0.410 ns | - | - |
| DictionaryMediaTypeHeaderValue | 76.23 ns | 1.552 ns | 2.175 ns | 0.0356 | 224 B |
| SwitchDateTime | 163.80 ns | 2.169 ns | 2.029 ns | - | - |
| SwitchNoOp | 12.45 ns | 0.266 ns | 0.296 ns | - | - |
| SwitchMediaTypeHeaderValue | 70.96 ns | 0.944 ns | 0.788 ns | 0.0356 | 224 B |
| FrozenDateTime | 171.26 ns | 3.341 ns | 3.282 ns | - | - |
| FrozenNoOp | 14.67 ns | 0.255 ns | 0.239 ns | - | - |
| FrozenMediaTypeHeaderValue | 72.22 ns | 1.413 ns | 1.387 ns | 0.0356 | 224 B |
This is the overhead for creating the dictionaries:
| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|----------------------- |---------:|--------:|--------:|-------:|-------:|----------:|
| CreateDictionary | 179.6 ns | 3.59 ns | 3.99 ns | 0.1581 | 0.0002 | 992 B |
| CreateFrozenDictionary | 581.9 ns | 5.18 ns | 4.85 ns | 0.3462 | 0.0010 | 2176 B |
Note, that the switch-case solution is free, as the Func<>
becomes compiled method, and the cost of the switch-case also happens at compile time.
Does it worth converting? I think it does in general, but the actual gains are amortized by the underlying TryParse operation.
@JamesNK and @BrennanConroy does it worth creating a PR?
@ladeak is it possible to push your branch to see what the switch one looks like?
Curious to see the switch approach too.
If the gains are very small, it probably isn't worth complicating this area. HeaderDictionaryTypeExtensions
isn't a performance-focused API. That doesn't mean we shouldn't make it faster, but the dial is tipped towards maintainability/readability instead of performance.
@danmoseley and @JamesNK : https://github.com/ladeak/aspnetcore/commit/11c9b3a76fdc85df379d08b0d75e9f3566d80d14
@JamesNK any suggestion, shall we use the proposed or simply a FrozenDictionary?
Calling reflection and switching based on the type name looks expensive. Have you tried the simplest pattern that just compares the types? Note that the comparison of typeofs is a JIT intrinsic and so it is very fast.
if (typeof(T) == typeof(CacheControlHeaderValue) {
...
}
else if (typeof(T) == typeof(ContentDispositionHeaderValue)) {
...
}
...
These are the results with the suggestion of @jkotas (named with ByType
prefix):
| Method | Mean | Error | StdDev | Gen0 | Allocated |
|--------------------------- |-----------:|----------:|----------:|-------:|----------:|
| ByTypeDateTime | 160.682 ns | 1.3177 ns | 1.1681 ns | - | - |
| ByTypeNoOp | 9.175 ns | 0.0818 ns | 0.0765 ns | - | - |
| ByTypeMediaTypeHeaderValue | 66.874 ns | 0.4164 ns | 0.3691 ns | 0.0356 | 224 B |
| FrozenDateTime | 167.591 ns | 2.0111 ns | 1.8812 ns | - | - |
| FrozenNoOp | 13.138 ns | 0.1188 ns | 0.1112 ns | - | - |
| FrozenMediaTypeHeaderValue | 69.858 ns | 0.6917 ns | 0.6132 ns | 0.0356 | 224 B |
This solution is the fastest overall.
I think we can close this issue now. Obviously this doesn't mean we can't add more frozen collection usage if we find places that could benefit, but the initial effort of adding usage is done.
Thanks everyone for the help, especially @richiebijoy and @ladeak!
Is there an existing issue for this?
Is your feature request related to a problem? Please describe the problem.
.NET 8 adds
FrozenDictionary
. It's a read-only dictionary with a higher creation cost but better read performance. Good for lookup dictionaries that live a long time.See https://github.com/dotnet/runtime/issues/67209
Describe the solution you'd like
Look to see where dictionaries are used today. Consider replacing them with
FrozenDictionary
.Additional context
No response