dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15k stars 4.67k forks source link

Iterating with ForEach over Immutable collections is very slow #29092

Open adamsitnik opened 5 years ago

adamsitnik commented 5 years ago

How to run the benchmarks:

git clone https://github.com/dotnet/performance.git
# if you have .NET Core 3.0 installed
dotnet run -c Release -f netcoreapp3.0 -p .\performance\src\benchmarks\micro\MicroBenchmarks.csproj --filter *IterateForEach<* --join
# if you don't have .NET Core 3.0 installed
py .\performance\scripts\benchmarks_ci.py -f netcoreapp3.0 --filter *IterateForEach<* --bdn-arguments="--join true"
Type Method Size Mean
IterateForEach<Int32> Array 512 222.1 ns
IterateForEach<String> Array 512 216.8 ns
IterateForEach<Int32> Span 512 213.7 ns
IterateForEach<String> Span 512 230.2 ns
IterateForEach<Int32> ReadOnlySpan 512 210.6 ns
IterateForEach<String> ReadOnlySpan 512 227.8 ns
IterateForEach<Int32> IEnumerable 512 2,247.3 ns
IterateForEach<String> IEnumerable 512 2,465.8 ns
IterateForEach<Int32> List 512 1,125.1 ns
IterateForEach<String> List 512 2,401.7 ns
IterateForEach<Int32> LinkedList 512 1,972.2 ns
IterateForEach<String> LinkedList 512 3,284.3 ns
IterateForEach<Int32> HashSet 512 1,565.0 ns
IterateForEach<String> HashSet 512 2,344.5 ns
IterateForEach<Int32> Dictionary 512 3,432.4 ns
IterateForEach<String> Dictionary 512 3,159.9 ns
IterateForEach<Int32> Queue 512 1,539.8 ns
IterateForEach<String> Queue 512 3,834.0 ns
IterateForEach<Int32> Stack 512 1,806.3 ns
IterateForEach<String> Stack 512 3,876.8 ns
IterateForEach<Int32> SortedList 512 5,771.9 ns
IterateForEach<String> SortedList 512 7,221.3 ns
IterateForEach<Int32> SortedSet 512 8,825.5 ns
IterateForEach<String> SortedSet 512 9,210.3 ns
IterateForEach<Int32> SortedDictionary 512 9,784.2 ns
IterateForEach<String> SortedDictionary 512 13,343.0 ns
IterateForEach<Int32> ConcurrentDictionary 512 15,821.2 ns
IterateForEach<String> ConcurrentDictionary 512 21,915.9 ns
IterateForEach<Int32> ConcurrentQueue 512 4,228.2 ns
IterateForEach<String> ConcurrentQueue 512 5,546.4 ns
IterateForEach<Int32> ConcurrentStack 512 3,286.0 ns
IterateForEach<String> ConcurrentStack 512 4,345.6 ns
IterateForEach<Int32> ConcurrentBag 512 2,682.9 ns
IterateForEach<String> ConcurrentBag 512 4,924.7 ns
IterateForEach<Int32> ImmutableArray 512 1,137.9 ns
IterateForEach<String> ImmutableArray 512 1,137.7 ns
IterateForEach<Int32> ImmutableDictionary 512 57,456.2 ns
IterateForEach<String> ImmutableDictionary 512 88,192.5 ns
IterateForEach<Int32> ImmutableHashSet 512 66,958.3 ns
IterateForEach<String> ImmutableHashSet 512 107,369.1 ns
IterateForEach<Int32> ImmutableList 512 25,530.7 ns
IterateForEach<String> ImmutableList 512 45,287.9 ns
IterateForEach<Int32> ImmutableQueue 512 4,073.7 ns
IterateForEach<String> ImmutableQueue 512 4,510.7 ns
IterateForEach<Int32> ImmutableStack 512 3,891.0 ns
IterateForEach<String> ImmutableStack 512 4,181.2 ns
IterateForEach<Int32> ImmutableSortedDictionary 512 27,616.2 ns
IterateForEach<String> ImmutableSortedDictionary 512 44,664.0 ns
IterateForEach<Int32> ImmutableSortedSet 512 27,055.9 ns
IterateForEach<String> ImmutableSortedSet 512 45,142.9 ns

Full docs for the new benchmarking workflow: https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-corefx.md

danmoseley commented 5 years ago

Do we allow benchmarks in the perf repo for corefxlab types? (I'm curious where DictionarySlim fits in this)

It's interesting that enumerating HashSet is signficantly faster than Dictionary. And the size is sufficiently small, that I would think (?) the better locality (smaller Entries) would not be relevant.

adamsitnik commented 5 years ago

Do we allow benchmarks in the perf repo for corefxlab types?

It would prolong the time it takes to run all the benchmarks. Also, the main focus in the perf repo is on the things we ship.

But we have a BDN based benchmarking solution in corefxlab which can be used to compare the perf https://github.com/dotnet/corefxlab/tree/master/tests/Benchmarks

yahorsi commented 5 years ago

Wait, iterating ofer Array is SO faster than over List? >5x times faster for Int's and omg 10 times for strings? Hell List is the base collection used basically 90% of the time, e.g. when we query database and neeed results. If List is Array based why it is so slower. Don't you think this is actually bigger issue than Immutable ? I beleive List is used and iterated in the 100% of .NET programs unlike Immutable collections

huoyaoyuan commented 5 years ago

@yahorsi List iterator must check that you haven't modified the list during the loop. Only rust-style immutable/mutable ownership checking can solve this gracefully. C++ chooses for speed but lacks safety.

PiotrKowalski93 commented 5 years ago

It seems like interesting topic for me. Can work on it? I mean, I want to be a contributor but i have never worked on OpenSource. I can work in cooperation too :)

Wraith2 commented 5 years ago

@PiotrKowalski93 no permission needed just go for it. The code standard here are high so be prepared to be able to justify and measure everything but improvements are always welcome.

PiotrKowalski93 commented 5 years ago

Great, I wanted to debug the code but I hit the wall. Build went great. I wanted to see how the code is working in System.Collections.Immutable so I created .sln with simple console app in which I create ImmutableList and iterate through it.

Separately when I open solution System.Collections.Immutable.sln i can build code etc. but when I add System.Collections.Immutable.csproj to my "testing" .sln i see:

The TargetFramework value '' was not recognized. It may be misspelled. If not, then the TargetFrameworkIdentifier and/or TargetFrameworkVersion properties must be specified explicitly.

And indeed it is not specified and in properties of csproj i see empty list in TargetFramework property.

How can Debug this specific dll?

Wraith2 commented 5 years ago

the build system relies on a lot of complex logic so you won't be able to easily get a project that needs it to compile without it. I'd suggest you take the code logic from the collections you want to debug and add that code directly to your project then debug it that way.

PiotrKowalski93 commented 5 years ago

@Wraith2 Thank you, fair point, I will do that

simonthum commented 3 years ago

Hi, I just saw this issue after investigating observed slowness of immutable collection iteration. I used benchmarkdotnet, I share the numbers below, but the first part (no parallel prefix) is similar to the opening table. I only use Int32 collections.

What really worries me is how much slower the parallel code (explicit Task.Run with #CPUs) seems to be.

I would have expected this to show the benefit of concurrent/immutable collections as there is no locking, but instead I seem to be looking at some obscure bottleneck. I am confident that real world usage will be less impacted, but apparently there is a lot of room for improvement.


BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1198 (1909/November2018Update/19H2)
Intel Core i5-6500 CPU 3.20GHz (Skylake), 1 CPU, 4 logical and 4 physical cores
.NET Core SDK=5.0.201
  [Host]     : .NET Core 5.0.4 (CoreCLR 5.0.421.11614, CoreFX 5.0.421.11614), X64 RyuJIT
  Job-CMPGYI : .NET Core 5.0.4 (CoreCLR 5.0.421.11614, CoreFX 5.0.421.11614), X64 RyuJIT

InvocationCount=1  UnrollFactor=1  
Method Mean Error StdDev Median Ratio RatioSD
'enumeration of plain list' 0.2561 ns 0.0057 ns 0.0162 ns 0.2500 ns 4.06 1.06
'enumeration of immutable list' 8.7836 ns 0.1629 ns 0.3060 ns 8.7750 ns 137.01 30.59
'enumeration of immutable array' 0.0679 ns 0.0062 ns 0.0177 ns 0.0700 ns 1.00 0.00
'enumeration of concurrent bag' 0.9049 ns 0.0285 ns 0.0790 ns 0.8700 ns 14.36 4.01
'enumeration of async locked list' 0.3947 ns 0.0192 ns 0.0536 ns 0.3800 ns 6.23 1.67
'enumeration of sync locked list' 0.2616 ns 0.0064 ns 0.0180 ns 0.2600 ns 4.13 1.09
'inline enumeration of async locked list' 4.2055 ns 0.2361 ns 0.6660 ns 4.0950 ns 66.13 17.81
'parallel enumeration of async locked list' 3.9957 ns 0.2384 ns 0.6916 ns 3.8600 ns 62.32 18.28
'parallel enumeration of sync locked list' 2.9224 ns 0.1702 ns 0.4855 ns 2.8150 ns 46.06 14.57
'parallel enumeration of immutable list (non-volatile)' 13.7143 ns 1.2159 ns 3.5660 ns 11.7200 ns 211.85 77.57
'parallel enumeration of immutable array (non-volatile)' 2.7011 ns 0.1569 ns 0.4425 ns 2.6450 ns 42.74 12.49
'parallel enumeration of concurrent bag' 4.2211 ns 0.2558 ns 0.7338 ns 3.9900 ns 65.77 18.50
eiriktsarpalis commented 2 years ago

Most immutable collections use class enumerators, so for example the difference in performance between for example T[] and ImmutableArray<T> is to be expected.

The real outliers here seem to be immutable dictionaries and sets: without having profiled the benchmarks, the enumerator implementations seem to be allocating nested enumerator objects for each bucket within the data structure. I'm sure we could optimize this somewhat.

simonthum commented 2 years ago

@eiriktsarpalis Now that you highlighted it, sharing iteration logic between immutable and mutable code also seems likely to impact performance: https://github.com/dotnet/runtime/blob/e5f3fa0ed0f52b5073dbfcc7fa800246b9e17adf/src/libraries/System.Collections.Immutable/src/System/Collections/Immutable/ImmutableDictionary_2.Enumerator.cs#L80