Thanks for all the details!
I think there are two possible strategies we could use to fix this:
1. Have the operator that is completing all of its downstream subscribers process that shutdown in bulk, rather than removing each subscription individually.
2. Make `CompositeDisposable.Remove` more efficient when it is tracking large numbers of disposables.

I think 1 might be tricky in this case, because it's the `GroupBy` that knows it is shutting down all downstream observers, but it's the `SelectMany` that would need to process the shutdown in bulk. (The `CompositeDisposable` is owned by the `SelectMany` in this case.) It's not completely inconceivable that there could be some sort of protocol by which an upstream source could notify something downstream that it's going to be calling `OnCompleted` on all its downstream subscribers. We do already have some cases where we detect that multiple `System.Reactive` operators are chained together, and they can behave differently than they would when dealing with some random unknown `IObserver<T>`. We use that to avoid redundant safe wrappers, for example. But this would be a significant change, because at the moment there's nothing at all that enables a set of downstream observers in some sort of "fan out" configuration to be handled as a single logical group.
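Purely as an illustration of the shape such a protocol could take (nothing like this exists in System.Reactive today; the interface and member names below are invented for this sketch):

```csharp
// Hypothetical sketch only. The idea is that an upstream operator such as
// GroupBy could warn a downstream "fan in" operator such as SelectMany or
// Merge that it is about to call OnCompleted on every inner subscriber,
// letting the downstream side drop its whole collection of subscriptions
// in one step instead of removing them one at a time.
internal interface IBulkCompletionAware
{
    // Called immediately before the upstream operator completes all of its
    // downstream observers.
    void NotifyCompletingAll();
}
```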
Attempting 1 also seems less general: it would be too easy to end up solving this only for certain specific combinations of known operators, and it would definitely break if someone threw their own operator into the mix. There are also cases where you could have thousands of groups that do not necessarily all complete at once; ideally we'd want to make those more efficient too, instead of fixing only the very particular case in which everything shuts down at once.
So approach 2 seems more promising.
The way I would approach this would, I think, be to make `CompositeDisposable` operate in one of two modes. For collections below some threshold we would continue to use a `List<IDisposable>` internally. I believe a lot of `CompositeDisposable`s contain very small lists in practice, and a linear search is usually more efficient than a more scalable algorithm for small collections. Also, this type is used very widely, and changing its internal makeup could have a significant effect on memory usage by Rx.NET.
So I'm thinking we'd change the field that is currently of type `List<IDisposable?>` to be of type `object`, and for that to contain a reference either to a `List<IDisposable>` as now, or to some hash-based type once the number of items exceeds some threshold. (Round about 1000 seems like a reasonable threshold, based on the numbers in the benchmark.)
The alternate representation couldn't be a `HashSet<IDisposable>` because, at the moment, nothing stops you from adding entries to `CompositeDisposable` twice, and it will `Dispose` them twice. Although it would be a bad idea for anything to depend on that, it's possible that something is in fact depending on it. So I'm thinking we could use a `Dictionary<IDisposable, int>` where the `int` is a count of how many times the relevant disposable has been added.
In cases where the `IDisposable` has a reasonable hash code, this means individual removals would no longer have $O(N)$ cost. As the docs for `Dictionary` say, locating items typically has $O(1)$ cost, so the total shutdown cost would become $O(N)$ instead of $O(N^2)$.
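To make the idea concrete, here is a minimal, non-thread-safe sketch of that kind of two-mode storage. The class name, threshold, and details are illustrative only; the real `CompositeDisposable` additionally handles locking, disposal state, and disposing removed items.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch: store the contents in a List<IDisposable> while the
// collection is small, and switch to a Dictionary<IDisposable, int>
// (value = how many times the item was added, preserving the existing
// duplicate-tolerant behaviour) once it grows past a threshold.
public sealed class TwoModeDisposableBag
{
    private const int HashThreshold = 1024;            // illustrative value
    private object _storage = new List<IDisposable>(); // List or Dictionary

    public void Add(IDisposable item)
    {
        if (_storage is List<IDisposable> list)
        {
            list.Add(item);
            if (list.Count > HashThreshold)
            {
                // Switch to the hash-based representation, counting duplicates.
                var dict = new Dictionary<IDisposable, int>(list.Count);
                foreach (var d in list)
                {
                    dict[d] = dict.TryGetValue(d, out var n) ? n + 1 : 1;
                }
                _storage = dict;
            }
        }
        else
        {
            var dict = (Dictionary<IDisposable, int>)_storage;
            dict[item] = dict.TryGetValue(item, out var n) ? n + 1 : 1;
        }
    }

    public bool Remove(IDisposable item)
    {
        if (_storage is List<IDisposable> list)
        {
            return list.Remove(item);                   // linear search: O(N)
        }

        var dict = (Dictionary<IDisposable, int>)_storage;
        if (!dict.TryGetValue(item, out var count))     // typically O(1)
        {
            return false;
        }
        if (count > 1) dict[item] = count - 1; else dict.Remove(item);
        return true;
    }
}
```

With a layout along these lines, the common small case keeps its compact list representation, while bulk removals from very large composites avoid the repeated linear scans.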
I've prototyped approach 2 described above, making `CompositeDisposable` operate in two different modes according to how many disposables it contains. Below a threshold (currently 1024) it uses a list, exactly as it did before. But once this threshold is exceeded, it switches into a mode where it uses a dictionary instead.
See https://github.com/dotnet/reactive/pull/2092
Here's how your benchmarks look on my machine (which is quite old now, and significantly slower than yours) running against the current `CompositeDisposable`:
(The dark blue line lines up almost precisely with the green one, which is why it appears to be missing. Likewise, the orange line is hidden behind the pale blue one.)
You can see the lines curving upwards in a way that looks consistent with the $O(N^2)$ behaviour predicted by the analysis.
Here's how it looks with the prototype:
Note that the vertical scale is different—this goes up only to 600 whereas the first went up to 3500. (Not quite sure why it went for that instead of 3000, but there we are.) Because we're effectively zoomed in a bit more, it's now possible to see that there really are 4 lines, thanks to minor variations in the runs. For comparison, if I show it at the same vertical scale as the first graph, you can see that it's a significant overall improvement for larger group sizes:
Perhaps more importantly, we no longer see the upward curve. It looks consistent with the predicted $O(N)$ behaviour.
My one concern with this was that, because `CompositeDisposable` is used very widely, and because it's common for it to contain small numbers of items, adding this extra mode to enable better performance for large collections might have a net negative effect: it's going to slow things down at least slightly in all the scenarios that don't need the large mode. So I ran the full benchmark suite before and after the change, to visualize the overall effect:
Each point represents one particular (`[Benchmark]` method, parameter set) combination. Its X position indicates how long the benchmark took to run before I made these changes to `CompositeDisposable`, and its Y position shows how long the same benchmark took after the change.
The red line marks X=Y. Any points falling on this line ran at the same speed before and after the change. Any falling above it got slower, and any falling beneath it got faster.
You can see the positive effect of this change very easily—the points that are way off the line and below it all represent the new benchmarks added to capture this problem, and their position on this plot shows that they all got significantly faster.
We can also see that nothing got dramatically slower. There's a bit of variation, but I think that's just variability between runs. (I was running this suite on my desktop machine and since it takes several hours, I was doing other things at the same time.) We are aiming to introduce some more consistent benchmark analysis, with execution procedures that should produce less variable results, and we also want to run the benchmarks over some older versions of Rx to see if any performance regressions were introduced. So at this stage, with respect to this particular problem, our main concern is to determine that we didn't cause any major new problems. It's looking OK from that perspective.
System.Reactive 6.0.0
dotnet 7.0.203
We are processing an observable that contains sensor samples taken from many different sensors. We want to gather statistics related to each individual sensor, and so perform a `GroupBy()` on the observable to create per-sensor observables. There is a long delay between the observable completing and the final subscriber completing, during which there is 100% CPU usage.

A simple reproducer has been created based on an observable over an array of m elements that is split by `GroupBy()` into an `IObservable<IGroupedObservable<...>>` of n groups and then flattened back together with `Merge()` or `SelectMany()`. For example:
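(The original snippet isn't reproduced above, so the following is a rough sketch of that kind of pipeline; the `numberOfGroups` and `numberOfSamples` values are illustrative.)

```csharp
using System.Reactive.Linq;

// Rough sketch of the kind of reproducer described: m samples are split
// into n groups with GroupBy() and then flattened again with SelectMany().
const int numberOfGroups = 100_000;    // n
const int numberOfSamples = 1_000_000; // m

Observable.Range(0, numberOfSamples)
    .GroupBy(sample => sample % numberOfGroups)
    .SelectMany(group => group)
    .LastOrDefaultAsync()
    .Wait();
```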
or
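(Again a sketch rather than the original code, reusing the illustrative values from above.)

```csharp
// The same pipeline, but recombining the groups with Merge() instead of
// SelectMany().
Observable.Range(0, numberOfSamples)
    .GroupBy(sample => sample % numberOfGroups)
    .Merge()
    .LastOrDefaultAsync()
    .Wait();
```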
The time taken to complete is $O(m)$ in the number of elements m in the array, but it is $O(n^2)$ in the number of groups n that were allocated (`numberOfGroups` in the above example).

Running under a profiler, the problem appears to be that when each of the `IGroupedObservable`s completes, the subscriber created by the `SelectMany()` or `Merge()` is individually removed from a `CompositeDisposable`. This removal results in a linear search of an `IList` in the `CompositeDisposable` that contains one entry per group. As all the subscribers are removed, one after the other, this removal process is $O(n^2)$ in the number of groups.

Profiler output:
The following class runs under BenchmarkDotNet to exhibit the issue:
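(The original class isn't reproduced above; the following sketch shows a benchmark of that general shape, with invented names and illustrative parameter values.)

```csharp
using System.Reactive.Linq;
using BenchmarkDotNet.Attributes;

// Illustrative sketch of a BenchmarkDotNet class exercising the
// GroupBy + SelectMany/Merge shutdown path described above.
public class GroupByCompletionBenchmark
{
    [Params(1_000, 10_000, 100_000)]
    public int NumberOfGroups;

    [Params(1_000_000)]
    public int NumberOfSamples;

    [Benchmark]
    public int GroupByThenSelectMany() =>
        Observable.Range(0, NumberOfSamples)
            .GroupBy(sample => sample % NumberOfGroups)
            .SelectMany(group => group)
            .LastOrDefaultAsync()
            .Wait();

    [Benchmark]
    public int GroupByThenMerge() =>
        Observable.Range(0, NumberOfSamples)
            .GroupBy(sample => sample % NumberOfGroups)
            .Merge()
            .LastOrDefaultAsync()
            .Wait();
}
```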
Sample output is:
Note that increasing the number of samples increases the total time by a relatively consistent amount of about 100 ms, whereas increasing the number of groups has a more quadratic relationship with the overall time, becoming the dominating factor at around 100,000 groups.