Closed MarkCiliaVincenti closed 3 months ago
Hi @MarkCiliaVincenti , good catch!
I was not sure about the perf though, meaning how much it would be faster on get VS how slower the initial setup would be, so I did some benchmarks which I'd like to share.
For Get/TryGet operations (which are more frequent than the initial setup) we can see the same allocations (ofcourse), but around -20% cpu time: nothing monumental, sure, but still quite nice.
Method | CachesCount | Rounds | Mean | Error | StdDev | P95 | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|---|---|
Dict | 1 | 1 | 25.53 ns | 0.193 ns | 0.171 ns | 25.74 ns | 1.00 | 0.00 | 0.0032 | 40 B | 1.00 |
FrozenDict | 1 | 1 | 21.61 ns | 0.438 ns | 0.431 ns | 22.36 ns | 0.85 | 0.02 | 0.0032 | 40 B | 1.00 |
Dict | 1 | 50 | 1,187.55 ns | 16.320 ns | 15.266 ns | 1,207.00 ns | 1.00 | 0.00 | 0.1583 | 2000 B | 1.00 |
FrozenDict | 1 | 50 | 1,025.46 ns | 18.075 ns | 16.023 ns | 1,053.19 ns | 0.86 | 0.02 | 0.1583 | 2000 B | 1.00 |
Dict | 10 | 1 | 265.67 ns | 4.642 ns | 4.115 ns | 272.17 ns | 1.00 | 0.00 | 0.0315 | 400 B | 1.00 |
FrozenDict | 10 | 1 | 237.32 ns | 3.382 ns | 3.163 ns | 241.99 ns | 0.89 | 0.02 | 0.0317 | 400 B | 1.00 |
Dict | 10 | 50 | 13,233.85 ns | 153.521 ns | 136.092 ns | 13,426.19 ns | 1.00 | 0.00 | 1.5869 | 20000 B | 1.00 |
FrozenDict | 10 | 50 | 10,680.41 ns | 150.454 ns | 140.734 ns | 10,842.57 ns | 0.81 | 0.02 | 1.5869 | 20000 B | 1.00 |
Dict | 50 | 1 | 1,345.35 ns | 10.495 ns | 9.304 ns | 1,361.36 ns | 1.00 | 0.00 | 0.1583 | 2000 B | 1.00 |
FrozenDict | 50 | 1 | 1,477.01 ns | 28.189 ns | 30.162 ns | 1,535.61 ns | 1.10 | 0.02 | 0.1583 | 2000 B | 1.00 |
Dict | 50 | 50 | 69,932.06 ns | 1,326.372 ns | 1,527.452 ns | 72,373.69 ns | 1.00 | 0.00 | 7.9346 | 100000 B | 1.00 |
FrozenDict | 50 | 50 | 57,328.07 ns | 592.618 ns | 554.335 ns | 58,219.32 ns | 0.82 | 0.02 | 7.9346 | 100000 B | 1.00 |
The setup is slower for the frozen one, basically 2 times slower:
Method | CachesCount | Mean | Error | StdDev | P95 | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|---|---|
Dict | 1 | 65.46 ns | 1.323 ns | 1.358 ns | 67.23 ns | 1.00 | 0.00 | 0.0293 | - | 368 B | 1.00 |
FrozenDict | 1 | 197.94 ns | 2.880 ns | 2.553 ns | 201.63 ns | 3.02 | 0.07 | 0.0598 | - | 752 B | 2.04 |
Dict | 10 | 613.80 ns | 8.810 ns | 8.241 ns | 624.51 ns | 1.00 | 0.00 | 0.1993 | 0.0010 | 2512 B | 1.00 |
FrozenDict | 10 | 1,437.42 ns | 13.200 ns | 12.347 ns | 1,453.61 ns | 2.34 | 0.04 | 0.3643 | 0.0019 | 4592 B | 1.83 |
Dict | 50 | 3,098.58 ns | 31.471 ns | 29.438 ns | 3,145.35 ns | 1.00 | 0.00 | 0.9727 | 0.0381 | 12224 B | 1.00 |
FrozenDict | 50 | 7,009.71 ns | 133.205 ns | 158.571 ns | 7,277.33 ns | 2.26 | 0.05 | 1.6251 | 0.0534 | 20448 B | 1.67 |
But we are talking about 4μs (note: micro, non milli) for 50 caches, so I think it doesn't really matter that much. Allocations are higher, but we are talking about very small numbers, and since get operations (way more frequent) would become faster, it seems like a nice win.
To get this perf boost a new dependency on System.Collections.Immutable is needed, and this is the part I don't particularly like: an extra dependency for such a small improvement. On one hand we are talking about a core one, not a random 3rd party one, but on the other hand if we consider scenarios like AOT + tree shaking, I'm not that sure it is worth the added dependency.
As of now I'm for merging the PR and just take the extra dependency, it doesn't seem like a very big deal.
Just give me some time to better research the extra dependency and make up my mind about it.
Meanwhile, thanks for the PR, it is really appreciated!
One thing you might want to do is make the dependency available only for pre .NET 8.0. If you're using net8 or later you don't need this dependency.
One thing you might want to do is make the dependency available only for pre .NET 8.0. If you're using net8 or later you don't need this dependency.
I thought about it, in fact I was just checking since which version it was not needed anymore.
I recently added multi-targeting to reduce dependencies in some TFMs, but adding another one is... eh, I don't know. What are your thoughts about the balance between one less dep for a specific TFM, and more explicit targets specified and, in the end, a bigger resulting dll?
I'm not sure but I don't think it actually results in a bigger dll, only a bigger NuGet package. Anyway I pushed another commit doing just that, see what you think about it. I would have it this way to be honest, it's neater.
I'm not sure but I don't think it actually results in a bigger dll, only a bigger NuGet package.
Yeah sorry I meant nuget package, not dll. This has been a rough week 😅
Anyway I pushed another commit doing just that, see what you think about it. I would have it this way to be honest, it's neater.
Yep, it looks nicer in fact. Give me some time to think about it and compare it with other projects, jus to make sure, but yeah it looks def nice.
Just to be sure I checked the DLL sizes and they remained 308KB between the main branch and this branch.
I wouldn't worry about NuGet package size to be honest. It's much better to drop hard dependencies. Just imagine someone might be using some new feature of .NET 9 (still in preview) from System.Collections.Immutable but after they bring in the dependency to 8.0.0 it would stop working. Or if they made an optimization for FrozenDictionary in .NET 9 you wouldn't be getting its advantage.
By the way, since we're talking about targeting, you'll likely want to put in a target for .NET 9.0 sooner or later. The reason is that .NET 9.0 provides System.Threading.Lock which could and should replace locking on object mutexes. It can make your code a bit messy, using #IF statements, but it's worth considering. It's a pity that Microsoft don't seem eager to backport it to earlier versions of .NET through NuGet.
Addendum to the above: I've seen this implementation but I'm not convinced it doesn't harm performance for < .NET 9.0 https://github.com/dotnet/winforms/pull/11841/files
Addendum to the above: I've seen this implementation but I'm not convinced it doesn't harm performance for < .NET 9.0 https://github.com/dotnet/winforms/pull/11841/files
This is an interesting trick.
Just imagine someone might be using some new feature of .NET 9 (still in preview) from System.Collections.Immutable but after they bring in the dependency to 8.0.0 it would stop working.
What do you mean? If in a project there are 2 refs to System.Collections.Immutable, one for 8.0.0 and one for 9.0.0, the higher one will be used so nothing would stop working.
Just imagine someone might be using some new feature of .NET 9 (still in preview) from System.Collections.Immutable but after they bring in the dependency to 8.0.0 it would stop working.
What do you mean? If in a project there are 2 refs to System.Collections.Immutable, one for 8.0.0 and one for 9.0.0, the higher one will be used so nothing would stop working.
I'm not sure I never tried it. I know that with NuGet packages they get consolidated to the higher version but not sure what happens when one is a package dependency and the other is coming from the framework itself.
I'm not sure I never tried it. I know that with NuGet packages they get consolidated to the higher version but not sure what happens when one is a package dependency and the other is coming from the framework itself.
Same thing, it's what happened before I added multi-targeting.
Hi @MarkCiliaVincenti I've made a little internal change (I just realized the name was not 100% correct, and will fix an additional thing later on) and merged the PR.
Thanks!
Brilliant @jodydonetti!
By the way, have you given any thoughts to #134 ?
Also re net9, I've created a micro library. https://github.com/MarkCiliaVincenti/Backport.System.Threading.Lock
This improves performance because FrozenDictionary is optimized for reading and is more performant than a normal Dictionary.