dotnet / reactive

The Reactive Extensions for .NET
http://reactivex.io
MIT License
6.68k stars 749 forks source link

Feature/2005 large composite disposable perf #2092

Closed idg10 closed 5 months ago

idg10 commented 6 months ago

Addresses #2005

This change makes the CompositeDisposable far more efficient when it is dealing with large numbers (thousands) of IDisposable objects. This matters in "fan out/in" scenarios where items are split into large numbers of groups with GroupBy, and then later recombined (usually with some per-group processing having been applied) with either SelectMany or Merge, because these operators use CompositeDisposable to keep track of their subscriptions to upstream observables (the ones that emerge by GroupBy in the scenario being described here).

Fan out/in is an important technique in Rx.NET, as described at https://introtorx.com/chapters/transformation-of-sequences#the-significance-of-selectmany and https://introtorx.com/chapters/partitioning so we want this to work well even with large numbers of groups.

Before, CompositeDisposable used a List<IDisposable> internally, which meant that its Remove method had to perform a linear scan of the list to find the item to remove. This resulted in $O(N^2)$ performance (where $N$ is the number of groups being merged) when the source finally calls OnCompleted. If you had over 100,000 groups, this could end up taking seconds to complete!

With this change, the CompositeDisposable continues to use the same strategy as before for small and moderate sized lists because there's no observable performance problem with that, and it is likely to be more efficient for small collections than a strategy optimized for larger collections. And awful lot of CompositeDisposable instances end up with only a handful of items, so it's important that it continues to perform well in that scenario.

But with large numbers of disposables, it switches to a different strategy in which it uses a dictionary to track disposables.

We use a dictionary instead of the more obvious HashSet<IDisposable> because CompositeDisposable has always allowed you add the same disposable instance multiple times. Although it would not be a good idea to depend on this, we've never prevented it, and since this is a public type, some Rx users might depend on it. So the dictionary has an int value keeping track of how many times each particular disposable has been added. We expect this to be 1 in most cases in practice.

As the discussion at https://github.com/dotnet/reactive/issues/2005#issuecomment-1991961257 shows, we've added benchmarks that verify that this does address the performance problem described. We've also checked that this doesn't cause any significant change in performance for any other benchmarks. (That's important because CompositeDisposable is widely used within Rx.NET, so changes of this kind have the potential to have a broad negative impact. We wanted to make sure we hadn't made normal operation much slower just to improve this one particular and relatively unusual scenario.)