Iterating with ForEach over Immutable collections is very slow

adamsitnik commented 5 years ago

How to run the benchmarks:

git clone https://github.com/dotnet/performance.git
# if you have .NET Core 3.0 installed
dotnet run -c Release -f netcoreapp3.0 -p .\performance\src\benchmarks\micro\MicroBenchmarks.csproj --filter *IterateForEach<* --join
# if you don't have .NET Core 3.0 installed
py .\performance\scripts\benchmarks_ci.py -f netcoreapp3.0 --filter *IterateForEach<* --bdn-arguments="--join true"

Type	Method	Size	Mean
IterateForEach<Int32>	Array	512	222.1 ns
IterateForEach<String>	Array	512	216.8 ns
IterateForEach<Int32>	Span	512	213.7 ns
IterateForEach<String>	Span	512	230.2 ns
IterateForEach<Int32>	ReadOnlySpan	512	210.6 ns
IterateForEach<String>	ReadOnlySpan	512	227.8 ns
IterateForEach<Int32>	IEnumerable	512	2,247.3 ns
IterateForEach<String>	IEnumerable	512	2,465.8 ns
IterateForEach<Int32>	List	512	1,125.1 ns
IterateForEach<String>	List	512	2,401.7 ns
IterateForEach<Int32>	LinkedList	512	1,972.2 ns
IterateForEach<String>	LinkedList	512	3,284.3 ns
IterateForEach<Int32>	HashSet	512	1,565.0 ns
IterateForEach<String>	HashSet	512	2,344.5 ns
IterateForEach<Int32>	Dictionary	512	3,432.4 ns
IterateForEach<String>	Dictionary	512	3,159.9 ns
IterateForEach<Int32>	Queue	512	1,539.8 ns
IterateForEach<String>	Queue	512	3,834.0 ns
IterateForEach<Int32>	Stack	512	1,806.3 ns
IterateForEach<String>	Stack	512	3,876.8 ns
IterateForEach<Int32>	SortedList	512	5,771.9 ns
IterateForEach<String>	SortedList	512	7,221.3 ns
IterateForEach<Int32>	SortedSet	512	8,825.5 ns
IterateForEach<String>	SortedSet	512	9,210.3 ns
IterateForEach<Int32>	SortedDictionary	512	9,784.2 ns
IterateForEach<String>	SortedDictionary	512	13,343.0 ns
IterateForEach<Int32>	ConcurrentDictionary	512	15,821.2 ns
IterateForEach<String>	ConcurrentDictionary	512	21,915.9 ns
IterateForEach<Int32>	ConcurrentQueue	512	4,228.2 ns
IterateForEach<String>	ConcurrentQueue	512	5,546.4 ns
IterateForEach<Int32>	ConcurrentStack	512	3,286.0 ns
IterateForEach<String>	ConcurrentStack	512	4,345.6 ns
IterateForEach<Int32>	ConcurrentBag	512	2,682.9 ns
IterateForEach<String>	ConcurrentBag	512	4,924.7 ns
IterateForEach<Int32>	ImmutableArray	512	1,137.9 ns
IterateForEach<String>	ImmutableArray	512	1,137.7 ns
IterateForEach<Int32>	ImmutableDictionary	512	57,456.2 ns
IterateForEach<String>	ImmutableDictionary	512	88,192.5 ns
IterateForEach<Int32>	ImmutableHashSet	512	66,958.3 ns
IterateForEach<String>	ImmutableHashSet	512	107,369.1 ns
IterateForEach<Int32>	ImmutableList	512	25,530.7 ns
IterateForEach<String>	ImmutableList	512	45,287.9 ns
IterateForEach<Int32>	ImmutableQueue	512	4,073.7 ns
IterateForEach<String>	ImmutableQueue	512	4,510.7 ns
IterateForEach<Int32>	ImmutableStack	512	3,891.0 ns
IterateForEach<String>	ImmutableStack	512	4,181.2 ns
IterateForEach<Int32>	ImmutableSortedDictionary	512	27,616.2 ns
IterateForEach<String>	ImmutableSortedDictionary	512	44,664.0 ns
IterateForEach<Int32>	ImmutableSortedSet	512	27,055.9 ns
IterateForEach<String>	ImmutableSortedSet	512	45,142.9 ns

Full docs for the new benchmarking workflow: https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-corefx.md

danmoseley commented 5 years ago

Do we allow benchmarks in the perf repo for corefxlab types? (I'm curious where DictionarySlim fits in this)

It's interesting that enumerating HashSet is signficantly faster than Dictionary. And the size is sufficiently small, that I would think (?) the better locality (smaller Entries) would not be relevant.

adamsitnik commented 5 years ago

Do we allow benchmarks in the perf repo for corefxlab types?

It would prolong the time it takes to run all the benchmarks. Also, the main focus in the perf repo is on the things we ship.

But we have a BDN based benchmarking solution in corefxlab which can be used to compare the perf https://github.com/dotnet/corefxlab/tree/master/tests/Benchmarks

yahorsi commented 5 years ago

Wait, iterating ofer Array is SO faster than over List? >5x times faster for Int's and omg 10 times for strings? Hell List is the base collection used basically 90% of the time, e.g. when we query database and neeed results. If List is Array based why it is so slower. Don't you think this is actually bigger issue than Immutable ? I beleive List is used and iterated in the 100% of .NET programs unlike Immutable collections

huoyaoyuan commented 5 years ago

@yahorsi List iterator must check that you haven't modified the list during the loop. Only rust-style immutable/mutable ownership checking can solve this gracefully. C++ chooses for speed but lacks safety.

PiotrKowalski93 commented 5 years ago

It seems like interesting topic for me. Can work on it? I mean, I want to be a contributor but i have never worked on OpenSource. I can work in cooperation too :)

Wraith2 commented 5 years ago

@PiotrKowalski93 no permission needed just go for it. The code standard here are high so be prepared to be able to justify and measure everything but improvements are always welcome.

PiotrKowalski93 commented 5 years ago

Great, I wanted to debug the code but I hit the wall. Build went great. I wanted to see how the code is working in System.Collections.Immutable so I created .sln with simple console app in which I create ImmutableList and iterate through it.

Separately when I open solution System.Collections.Immutable.sln i can build code etc. but when I add System.Collections.Immutable.csproj to my "testing" .sln i see:

The TargetFramework value '' was not recognized. It may be misspelled. If not, then the TargetFrameworkIdentifier and/or TargetFrameworkVersion properties must be specified explicitly.

And indeed it is not specified and in properties of csproj i see empty list in TargetFramework property.

How can Debug this specific dll?

Wraith2 commented 5 years ago

the build system relies on a lot of complex logic so you won't be able to easily get a project that needs it to compile without it. I'd suggest you take the code logic from the collections you want to debug and add that code directly to your project then debug it that way.

PiotrKowalski93 commented 5 years ago

@Wraith2 Thank you, fair point, I will do that

simonthum commented 3 years ago

Hi, I just saw this issue after investigating observed slowness of immutable collection iteration. I used benchmarkdotnet, I share the numbers below, but the first part (no parallel prefix) is similar to the opening table. I only use Int32 collections.

What really worries me is how much slower the parallel code (explicit Task.Run with #CPUs) seems to be.

I would have expected this to show the benefit of concurrent/immutable collections as there is no locking, but instead I seem to be looking at some obscure bottleneck. I am confident that real world usage will be less impacted, but apparently there is a lot of room for improvement.


BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1198 (1909/November2018Update/19H2)
Intel Core i5-6500 CPU 3.20GHz (Skylake), 1 CPU, 4 logical and 4 physical cores
.NET Core SDK=5.0.201
  [Host]     : .NET Core 5.0.4 (CoreCLR 5.0.421.11614, CoreFX 5.0.421.11614), X64 RyuJIT
  Job-CMPGYI : .NET Core 5.0.4 (CoreCLR 5.0.421.11614, CoreFX 5.0.421.11614), X64 RyuJIT

InvocationCount=1  UnrollFactor=1

Method	Mean	Error	StdDev	Median	Ratio	RatioSD
'enumeration of plain list'	0.2561 ns	0.0057 ns	0.0162 ns	0.2500 ns	4.06	1.06
'enumeration of immutable list'	8.7836 ns	0.1629 ns	0.3060 ns	8.7750 ns	137.01	30.59
'enumeration of immutable array'	0.0679 ns	0.0062 ns	0.0177 ns	0.0700 ns	1.00	0.00
'enumeration of concurrent bag'	0.9049 ns	0.0285 ns	0.0790 ns	0.8700 ns	14.36	4.01
'enumeration of async locked list'	0.3947 ns	0.0192 ns	0.0536 ns	0.3800 ns	6.23	1.67
'enumeration of sync locked list'	0.2616 ns	0.0064 ns	0.0180 ns	0.2600 ns	4.13	1.09
'inline enumeration of async locked list'	4.2055 ns	0.2361 ns	0.6660 ns	4.0950 ns	66.13	17.81
'parallel enumeration of async locked list'	3.9957 ns	0.2384 ns	0.6916 ns	3.8600 ns	62.32	18.28
'parallel enumeration of sync locked list'	2.9224 ns	0.1702 ns	0.4855 ns	2.8150 ns	46.06	14.57
'parallel enumeration of immutable list (non-volatile)'	13.7143 ns	1.2159 ns	3.5660 ns	11.7200 ns	211.85	77.57
'parallel enumeration of immutable array (non-volatile)'	2.7011 ns	0.1569 ns	0.4425 ns	2.6450 ns	42.74	12.49
'parallel enumeration of concurrent bag'	4.2211 ns	0.2558 ns	0.7338 ns	3.9900 ns	65.77	18.50

eiriktsarpalis commented 2 years ago

Most immutable collections use class enumerators, so for example the difference in performance between for example T[] and ImmutableArray<T> is to be expected.

The real outliers here seem to be immutable dictionaries and sets: without having profiled the benchmarks, the enumerator implementations seem to be allocating nested enumerator objects for each bucket within the data structure. I'm sure we could optimize this somewhat.

simonthum commented 2 years ago

@eiriktsarpalis Now that you highlighted it, sharing iteration logic between immutable and mutable code also seems likely to impact performance: https://github.com/dotnet/runtime/blob/e5f3fa0ed0f52b5073dbfcc7fa800246b9e17adf/src/libraries/System.Collections.Immutable/src/System/Collections/Immutable/ImmutableDictionary_2.Enumerator.cs#L80

dotnet / runtime

Iterating with ForEach over Immutable collections is very slow #29092