This test is failing on `main`; specifically, the `count` variable is getting set beyond 1:
```csharp
public class MultiThreadedAccessTests
{
    [Test]
    public async Task SingletonOnlyGetsMadeOnce()
    {
        var builder = new DependencyBuilder();
        int count = 0;
        builder.AddSingleton<A>(_ =>
        {
            count++;
            return new A();
        });
        var provider = builder.Build();

        var tasks = Enumerable.Range(0, 10_000).Select(i => Task.Run(async () =>
        {
            await Task.Delay(1);
            provider.GetService<A>();
        }));
        await Task.WhenAll(tasks);

        Assert.That(count, Is.EqualTo(1), "the factory function should only be invoked once.");
    }

    public class A
    {
        // no-op class just for testing
    }
}
```
Also, looking into the DI types, they use ConcurrentDictionary, likely in an attempt to be "thread safe". However, some very simple benchmarks show that those types are costly.
```csharp
[Benchmark]
public void Dict_Concurrent()
{
    var x = new ConcurrentDictionary<Type, ServiceDescriptor>();
}

[Benchmark]
public void Dict_Regular()
{
    var x = new Dictionary<Type, ServiceDescriptor>();
}
```
And the results (the tl;dr is that `ConcurrentDictionary` allocates almost 10x the memory that a regular `Dictionary` does!).
Indeed, when I ran a benchmark that only created a `DependencyBuilder` and then ran the `.Build()` method to produce an empty `DependencyProvider`, I'd allocated 4.43 KB worth of memory.
Clearly, there are some problems:
1. the code is spamming allocations in an attempt to be threadsafe, but
2. the code is not even threadsafe.
This caused me to go on an adventure!
I ran the above unit test, but as a benchmark instead:
| Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|-------------------------- |---------:|---------:|---------:|---------:|---------:|---------:|----------:|
| SingletonOnlyGetsMadeOnce | 12.10 ms | 0.519 ms | 0.028 ms | 687.5000 | 265.6250 | 171.8750 | 4.22 MB |
The goal now is to fix the two problems... I guess most importantly, the code should pass the unit test, performance be damned! But ideally, while I'm sneaking through the code, it would be nice if it allocated less memory and ran faster. I added a simple `lock` around the entire `GetService` method; it raised the mean time a bit but changed nothing else in the numbers... and it fixed the unit test.
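For illustration, the shape of that fix looks roughly like this (a sketch only; `_lock` and `Resolve` are hypothetical stand-ins for the real internals):

```csharp
public class DependencyProvider
{
    private readonly object _lock = new object();

    public T GetService<T>()
    {
        // Coarse-grained lock: only one thread resolves at a time, so a
        // singleton factory can no longer be invoked twice by racing threads.
        lock (_lock)
        {
            return (T)Resolve(typeof(T));
        }
    }

    // stand-in for the real resolution logic
    private object Resolve(Type type) => throw new NotImplementedException();
}
```

A coarse lock like this serializes every resolution, which is why the mean time ticks up; finer-grained locking could recover that, at the cost of complexity.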
With this in mind, it seems like the concurrent data structures aren't worth their memory allocation cost. I ripped them out (without running all the unit tests yet, so perhaps I broke something) and re-ran my base benchmark that creates an empty `DependencyProvider`.
The allocation goes from 4.4 KB to 0.9 KB. And it follows that as less memory is allocated, less time is spent in the GC, so the new method runtime is nearly a third of the old runtime. This feels like a massive improvement. So maybe it's worth digging in some more and investigating whether those `ConcurrentDictionary` types were really saving us from anything.
On a separate note, the internals of the `DependencyProvider` keep two caches: one for instantiated singleton objects, and one for instantiated scoped objects. This is wasteful, because you cannot add the same `Type` as both a singleton and a scoped service anyway, so those dictionary caches never overlap in key-space. Since they never overlap, we can use a single cache object. This saves a further 80 B of memory.
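The collapse looks roughly like this (field names here are hypothetical, though later in this post the combined cache is referred to as `InstanceCache`):

```csharp
public class DependencyProvider
{
    // Before: two caches whose key-spaces could never overlap,
    // because a Type is registered under exactly one lifetime.
    // private readonly Dictionary<Type, object> _singletonCache = new Dictionary<Type, object>();
    // private readonly Dictionary<Type, object> _scopedCache = new Dictionary<Type, object>();

    // After: one cache serves both lifetimes, saving one dictionary per scope.
    private readonly Dictionary<Type, object> InstanceCache
        = new Dictionary<Type, object>();
}
```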
Actually, the same principle can be applied to the `DependencyProvider`'s use of 3 separate dictionaries to hold the `ServiceDescriptor`s for transients, scoped, and singletons. It is invalid to have any type overlap anyway, so those 3 dictionaries can be collapsed into a single dictionary. This saves a few hundred bytes in the base case.
For reference, if all you do is instantiate the `DependencyBuilder` but never call `.Build()` on it, that costs 136 B. That means there are roughly 450 B involved in the basic spin-up of a `DependencyProvider`. A lot of that has to do with a feature I'd love to remove from our DI scheme, called hydration... Indeed, if I run a benchmark where all I do is instantiate a `DependencyProvider` directly, it costs the same rough 450 B. I don't want to think about the hydration stuff for now, but the same stunt of collapsing the various dictionaries can be applied to the builder as well. It was holding 3 separate dictionaries, one per lifetime type. I collapsed them into one dictionary, and the resulting memory went down to 56 B. Re-running the base case, where the empty provider is created through the `DependencyBuilder.Build()` method, the returns appear to be diminishing.
I figured it would be a good time to re-run all the unit tests across cli and microservice to see if I had broken anything (side note: we should combine our solutions at this point). And lo and behold, I had broken something!! I broke 81 tests in microservice, all with roughly the same call stack:
```
System.ArgumentException : An item with the same key has already been added. Key: Beamable.Server.SocketRequesterContext
   at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
   at System.Collections.Generic.Dictionary`2.Add(TKey key, TValue value)
   at Beamable.Common.Dependencies.DependencyProvider..ctor(DependencyBuilder builder, BuildOptions options) in /Users/chrishanna/Documents/Github/BeamableProduct/cli/beamable.common/Runtime/Dependencies/DependencyProvider.cs:line 215
   at Beamable.Common.Dependencies.DependencyProvider.Fork(Action`1 configure) in /Users/chrishanna/Documents/Github/BeamableProduct/cli/beamable.common/Runtime/Dependencies/DependencyProvider.cs:line 520
   at Beamable.Server.BeamableMicroService.<Start>b__49_0[TMicroService](MicroserviceArgs conf) in /Users/chrishanna/Documents/Github/BeamableProduct/microservice/microservice/dbmicroservice/BeamableMicroService.cs:line 163
```
That exception is coming from this code in the constructor of the `DependencyProvider`:
```csharp
foreach (var desc in builder.Descriptors)
{
    Descriptors.Add(desc.Interface, desc);
}
```
The exception is obvious: the `desc.Interface` value already exists as a key in the `Descriptors` dictionary. Where before there were 3 dictionaries for the separate lifetimes, now there is only 1. I guess my assumption that it was invalid to register the same interface under multiple lifetimes wasn't quite accurate. Working backwards, this is happening in the `Fork` operation from the Microservice base code, where the sub-scope is adding in a scoped version of the `_socketRequesterContext`.
```csharp
_args = args.Copy(conf =>
{
    conf.ServiceScope = conf.ServiceScope.Fork(builder =>
    {
        // do we need instance specific services? They'd go here.
        builder.AddScoped(_socketRequesterContext);
        builder.AddScoped(_socketRequesterContext.Daemon);
    });
});
```
But if I look backwards at the `conf.ServiceScope`, there is a singleton version already in existence.
The way that dependencies used to get resolved was in a specific ordering of the lifetime types: first transient, then scoped, then singleton. So the shorter-lived a service is, the higher its resolution priority. In this case, then, the addition of the scoped service would override the resolution away from the original singleton. To fix this, I allowed the constructor to override the service if the new registration has a "higher order" (shorter-lived) lifetime.
```csharp
foreach (var desc in builder.Descriptors)
{
    if (Descriptors.TryGetValue(desc.Interface, out var existingDesc))
    {
        if (existingDesc.Lifetime <= desc.Lifetime)
        {
            throw new Exception(
                $"Cannot add service=[{existingDesc.Interface.Name}] to scope as lifetime=[{desc.Lifetime}], because the service has already been added to the scope as existing-lifetime=[{existingDesc.Lifetime}]. ");
        }
        Descriptors[desc.Interface] = desc;
    }
    else
    {
        Descriptors.Add(desc.Interface, desc);
    }
}
```
And now all the tests are passing again. To recap so far, take this benchmark function:
```csharp
[Benchmark]
public void BaseCase_NoDispose_RegisterAndResolve()
{
    var builder = new DependencyBuilder();
    builder.AddSingleton<TestService>();
    var provider = builder.Build();
    var service = provider.GetService<TestService>();
}
```
Here are the results... about twice as fast on the runtime, and less than a quarter of the allocation. This test is a pretty vanilla base case; it really isn't doing much at all. It's tragic that we're allocating an entire kilobyte to instantiate a service, but I still believe this cost is worth it as the complexity of the dependency scope scales.
So there are more problems. After a dependency scope is created, eventually it needs to get shut down. This happens in our Microservice framework all the time, because there is a dependency scope per request, and the scope is disposed at the end of the request.
So here is another benchmark to see how the disposal method performs... hint, it's not good.
```csharp
[Benchmark]
public void BaseCase_Dispose()
{
    var builder = new DependencyBuilder();
    var provider = builder.Build();
    provider.Dispose();
}
```
That `Dispose` method is doing a lot. It must:
- for any service that implements `IBeamableDisposable`, call the `OnDispose` method and wait for it to finish via a `Promise`,
- for any service that implements `IBeamableDisposableOrder`, order the disposals,
- call `Dispose` on all child scopes, and
- be "threadsafe".
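For reference, the shapes of those two interfaces as I've been describing them (a sketch; whether `IBeamableDisposableOrder` actually extends `IBeamableDisposable` is an assumption on my part):

```csharp
public interface IBeamableDisposable
{
    // Async teardown; the scope waits on the returned Promise during Dispose.
    Promise OnDispose();
}

public interface IBeamableDisposableOrder : IBeamableDisposable
{
    // Services are grouped and disposed in order of this value;
    // services without the interface default to order 0.
    int DisposeOrder { get; }
}
```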
But it feels absurd that it should cost us a kilobyte to call the dispose method.
One of the first things that happens in the method is that all child scopes are disposed, so services die from the bottom of the tree up towards the root. The `Dispose` method is `async`, so we need to wait for all the sub-disposals to finish. When I comment this line out, it shaves 200 B off the allocation (and tests break).
```csharp
await Promise.Sequence(childRemovalPromises);
```
One thing to note is that the order doesn't actually matter for these sub-processes to finish, but the implementation of `Promise.Sequence` is doing more work than we need in order to maintain order. I wrote a `Promise.WhenAll` utility method that returns a `Promise` that completes when the input list of `Promise<T>` completes. It is a bit better than `Promise.Sequence`.
```csharp
[Benchmark]
public async Task Sequence()
{
    var pList = Enumerable.Range(0, 10).Select(_ => new Promise<int>()).ToList();
    var final = Promise.Sequence(pList);
    var _ = pList.Select(p => Task.Run(async () =>
    {
        await Task.Delay(1);
        p.CompleteSuccess(1);
    })).ToList();
    await final;
}

[Benchmark]
public async Task WhenAll()
{
    var pList = Enumerable.Range(0, 10).Select(_ => new Promise<int>()).ToList();
    var final = Promise.WhenAll(pList);
    var _ = pList.Select(p => Task.Run(async () =>
    {
        await Task.Delay(1);
        p.CompleteSuccess(1);
    })).ToList();
    await final;
}
```
| Method | Mean | Error | StdDev | Allocated |
|--------- |---------:|----------:|----------:|----------:|
| Sequence | 1.246 ms | 0.2394 ms | 0.0131 ms | 8.22 KB |
| WhenAll | 1.219 ms | 0.2327 ms | 0.0128 ms | 7.1 KB |
And when applied to the `Dispose` method, it brings the allocation down to 1296 B (down almost 200 B). But it's an unfair optimization, because in the test there are no child scopes, and the implementation of `WhenAll` has a zero-input base case:
```csharp
public static Promise WhenAll<T>(List<Promise<T>> promises)
{
    if (promises.Count == 0)
        return Success;

    var result = new Promise();
    Check();
    return result;

    void Check()
    {
        for (var i = 0; i < promises.Count; i++)
        {
            if (promises[i].IsCompleted)
                continue;
            promises[i].Error(ex => result.CompleteError(ex));
            promises[i].Then(_ => Check());
            return;
        }
        result.CompleteSuccess();
    }
}
```
We'll come back to this when we actually have child scopes to think about... Anyway, the next thing that happens in the `Dispose` method is that all cached service instances in the scope are disposed, with respect to the order defined by the `IBeamableDisposableOrder` interface. If I take this bit out, the memory allocation goes down to 728 B, which is only a few hundred bytes more than the non-disposal case. So there are lots of potential gains to be made by looking at this disposal.
Actually, time for a detour, because the act of calling a `Promise` function, or using an `await` on a `Promise`, causes allocation. A `Promise` instantiation takes 104 B in a benchmark, and every time we use `async Promise`, that implicitly generates a `Promise` behind the scenes. I removed a tiny bit of unused cruft, and now it's 96 B. But still, that feels like a lot.
We use Promise so much over the codebase that I feel like even the tiniest wins inside this library will pay dividends later.
In fact, take the following benchmark,
```csharp
[Benchmark]
public void PromiseAllocation()
{
    var p = new Promise();
}
```
When this runs, it allocates 96 B... why? Because inside the `Promise`, we allocate a `new object()` to be used as the reference value for the `lock` statements used throughout the `Then`/`Error` methods. According to Microsoft, you should never use the `this` keyword as the reference value for a `lock` statement, because someone else might use the same instance in another `lock` statement, leading to a deadlock. This is a tricky scenario, because if I change our implementation to use `this`, all of the instantiation allocation goes away (allocation will re-appear during usage). I am of the opinion at the moment that this is a worthy trade, given
1. how often these `Promise` types are instantiated, and
2. how unlikely it is for someone to `lock` a `Promise`. I think we can curtail this in documentation.
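The trade looks roughly like this (a sketch with hypothetical, simplified internals; the real `Then`/`Error` signatures differ):

```csharp
public class Promise
{
    // Before: a dedicated lock object, one extra heap allocation per Promise.
    // private readonly object _lock = new object();

    private bool _done;
    private Action _callbacks;

    public void Then(Action callback)
    {
        // After: lock on the Promise instance itself. Microsoft's guidance
        // warns against lock(this) because external code could lock the same
        // instance and deadlock us, but Promises are poor lock candidates and
        // we instantiate them constantly, so the per-instance saving wins.
        lock (this)
        {
            if (_done) callback();
            else _callbacks += callback;
        }
    }

    public void CompleteSuccess()
    {
        Action toRun;
        lock (this)
        {
            _done = true;
            toRun = _callbacks;
            _callbacks = null;
        }
        // Invoke callbacks outside the lock to shrink the deadlock window.
        toRun?.Invoke();
    }
}
```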
Moving on, our async/await code around Promise costs memory. This benchmark shows that simply adding async to the method signature will cost memory.
```csharp
[Benchmark]
public Promise ReturnPromise()
{
    var p = new Promise();
    return null;
}

[Benchmark]
public async Promise ReturnAsyncPromise()
{
    var p = new Promise();
}
```
The `async` version costs 96 B more memory. That number should look familiar, because it is the same amount of memory the `new object()` took for the previous `lock` statement. This is a snippet from the `async` support code for `Promise`:
```csharp
public sealed class PromiseAsyncMethodBuilder
{
    private IAsyncStateMachine _stateMachine;
    private Promise _promise = new Promise(); // TODO: allocation.

    public static PromiseAsyncMethodBuilder Create()
    {
        return new PromiseAsyncMethodBuilder();
    }
    // ...
}
```
I took a lot of inspiration from (aka, stole from) this article about UniTask. The `PromiseAsyncMethodBuilder` is getting instantiated to build the state machine for the `async` handling. However, it doesn't need to be a class; it can be converted into a struct, which means its allocation goes away.
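The change is roughly this (a sketch; the compiler-required members like `Start`, `SetStateMachine`, `SetResult`, `SetException`, and the `AwaitOnCompleted` pair are omitted, and the lazy `Task` property is an assumption borrowed from the UniTask approach):

```csharp
public struct PromiseAsyncMethodBuilder
{
    private Promise _promise;

    // 'default' yields a zero-initialized struct on the stack or inline in
    // the state machine; no heap allocation happens here anymore.
    public static PromiseAsyncMethodBuilder Create() => default;

    // The Promise itself is only allocated if the compiler-generated code
    // actually asks for the builder's task-like result.
    public Promise Task => _promise ??= new Promise();
}
```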
The remaining 64 B aren't actually the `async` machinery's fault, but come from something that happens during the async state machine: the `Promise` is completed.
Consider this benchmark, which compares a promise instantiation against an instantiation plus a `CompleteSuccess` call. The 64 B looks familiar, eh?
```csharp
[Benchmark]
public void PromiseComplete()
{
    var p = new Promise();
    p.CompleteSuccess();
}
```
Where is that 64 B coming from? There are 2 main causes:
1. the `CompleteSuccess` method takes a generic `<T>`, which leads to boxing, and
2. the base `Promise` type is secretly a `Promise<Unit>` under the hood, so passing `PromiseBase.Unit` to `CompleteSuccess` passes the struct by value. We need to pass it by reference to avoid the allocation.
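A sketch of the by-reference version (simplified and hypothetical; the real `Promise<T>` completion logic does much more than store the value):

```csharp
public struct Unit { }

public class Promise<T>
{
    private T _value;
    private bool _done;

    // Before: CompleteSuccess(T value) copies the struct argument by value.
    // After: take it by readonly reference ('in'), so the Unit struct is not
    // copied on every completion of the base Promise.
    public void CompleteSuccess(in T value)
    {
        _value = value;
        _done = true;
    }
}

// The non-generic Promise is a Promise<Unit>, so completing it becomes
// something like: promise.CompleteSuccess(in PromiseBase.Unit);
```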
It was at this point I realized I'd been doing my benchmarks wrong this whole time. When returning `void` from a benchmark, it may be that you've optimized the code so much that the compiler decides nothing actually needs to happen, because there are no side effects. So I went back and changed my `Promise` allocation benchmark to actually return the generated `Promise`, and I saw that the allocation itself was responsible for the 64 B, not the `CompleteSuccess`. This also means the `async Promise` method was "correct" in that it returned something, whereas the `void` method was "incorrect". After I fixed my benchmarks, I saw no difference between simply returning a new `Promise` versus doing it implicitly with an `async Promise`. I would still like to remove more allocation here, but I think the way to do it would be with `Promise` pooling. However, pooling at this level would introduce new constraints on the behaviour of the `Promise` library that I don't feel comfortable imposing blindly.
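The fix is just the standard BenchmarkDotNet pattern of returning the result so dead-code elimination can't discard the work:

```csharp
// Returning the allocated object gives the benchmark an observable result,
// so the JIT cannot optimize the allocation away as dead code.
[Benchmark]
public Promise PromiseAllocation()
{
    return new Promise();
}
```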
So it is time to return to the `Dispose` method's internal call to dispose all of its internal services... Perhaps an easier target to go after is some LINQ inside the method. I tore out a lot of code in the method (so it doesn't work) to isolate one specific line.
Perhaps not surprisingly, there is some allocation here (88 B), and ultimately, I don't understand why we need this call to `Distinct` at all. I remember writing this code, and I remember noticing that it was possible for the same instance to appear in the cache more than once. That was bad, because if we call the `Dispose` methods on an instance more than once, all the logic gets broken. However, I am sad with Past-Me for not drilling into how the duplicate instances appeared in the list in the first place, because that seems like the real problem. I am going to boldly ignore the root cause and remove the `Distinct` call, which brings the simple access of `InstanceCache.Values` down to 680 B from 744 B, saving 64 B.
Moving on, inside the helper method that disposes all the services, the first piece of business is to sort the services into groups based on their possible ordering.
```csharp
var clonedList = new List<object>(services);
var groups = clonedList.GroupBy(x =>
{
    if (x is IBeamableDisposableOrder disposableOrder)
        return disposableOrder.DisposeOrder;
    return 0;
});
groups = groups.OrderBy(x => x.Key);
```
Two things jump out at me:
1. the `clonedList` allocates, but it doesn't need to exist, since the LINQ `GroupBy` is going to allocate a new structure anyway. The `clonedList` existed from a time when I wanted to avoid possible collection-modification errors.
2. the LINQ statements are cute for doing the logic I need, but I'm curious if there is a less memory-intensive way to write this.
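One lower-allocation alternative (a sketch, not necessarily what I ended up shipping) is to bucket directly into a `SortedDictionary`, which enumerates its keys in ascending order and so replaces both the `GroupBy` and the `OrderBy`, with no `clonedList` at all:

```csharp
static SortedDictionary<int, List<object>> GroupByDisposeOrder(
    IEnumerable<object> services)
{
    // Keys enumerate in ascending order, matching the old OrderBy(x => x.Key).
    var buckets = new SortedDictionary<int, List<object>>();
    foreach (var service in services)
    {
        var order = service is IBeamableDisposableOrder d ? d.DisposeOrder : 0;
        if (!buckets.TryGetValue(order, out var list))
            buckets[order] = list = new List<object>();
        list.Add(service);
    }
    return buckets;
}
```

This still allocates the buckets and lists, but skips the defensive clone and the LINQ iterator machinery.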
Calling this method with everything else commented out, so that it only runs the code above, the allocation for the entire Dispose() method jumps to 1304 B.
I may just be shifting complexity around, but I decided to move the sorting of the services out of the `Dispose` method and track it as additional state as the services are resolved. This raises the floor on the base case that never calls `Dispose`, but removes the need to do any sorting within `Dispose`, thus removing LINQ and dropping the allocation. We should always be calling `Dispose`, so I'm valuing that case over the non-disposal case.
This benchmark is still using broken code, but it illustrates using a precached sorted data structure instead of the code above.
--more to do