Open stephentoub opened 6 years ago
@stephentoub is Future ok? Or do you think we should push it into 2.1?
Future is fine
Is this difficult to do? I'd love to measure our broadcast performance scenario with this. We currently filed this issue https://github.com/aspnet/SignalR/issues/1796 to try and work around allocating the enumerator per message (which we're currently doing).
We can't change the return type of GetEnumerator, but we can at least expose an enumerator struct to enable enumeration without allocating.
Isn't the compiler (and/or JIT) already smart enough to emit direct calls for struct enumerators? :thinking:
Unless you meant that changing from a class to a struct enumerator is a binary breaking change, in which case.... Would it be suitable to quirk instead?
I just wouldn't like have having to remember to use the more verbose syntax over the simplicity of foreach
for one specific collection type to keep GC pressure down.
Is this difficult to do?
No, the iterator just needs to be rewritten manually.
Unless you meant that changing from a class to a struct enumerator is a binary breaking change
Correct. Changing the GetEnumerator signature is a breaking change.
Would it be suitable to quirk instead?
No, not for API signature changes.
Is this difficult to do?
@davidfowl, e.g. something like this: https://github.com/stephentoub/corefx/commit/d11bdcd0763b8913e78a237e631a8abe3c0afa68
I'm not sure I like it because it seems heavy-handed and the consuming code cannot use foreach
.
@jaredpar is there any way we could change the compiler rules for IEnumerable
to prefer a struct-based enumerator when GetEnumerator()
is already taken? I suspect we'll have this problem in other areas too. @bartonjs suggested that we could add another property/method that returns a struct that provides a struct-based GetEnumerator()
method. Again, feels heavy-handed...
is there any way we could change the compiler rules for IEnumerable ...
No
This is one of the most used patterns in .NET. Changing the compiler rules for how we bind existing IEnumerable
is a back compat issue.
This has probably been discussed before, but is there anything reasonable we could do in the runtime to make it a non-breaking change to change a signature from:
public IEnumerator<T> GetEnumerator() { ... }
to
public Enumerator GetEnumerator() { ... }
public struct Enumerator : IEnumerator<T> { ... }
such that code compiled against the former would "just work" with an implementation containing the latter? Then code compiled against the new signature would get the allocation benefit but code compiled against the older signature would still work.
cc: @jkotas
You can have both Enumerator GetEnumerator()
and IEnumerator<T> GetEnumerator()
methods in the IL. It is just that you cannot do that in C# today. It is possible to hack it via IL rewriting, e.g. the implementation can contain both methods, but reference assembly just the new one. The problem with introducing such non-standard overload in the implementation is that it would break reflection or scripting scenarios.
The other possible approach is to wait for the JIT to get better and optimize out the allocation in the current less than ideal pattern for you.
@jkotas
It is possible to hack it via IL rewriting, e.g. the implementation can contain both methods, but reference assembly just the new one.
Wouldn't that mean that if you upgrade to the newer reference assembly, code that does any kind of type inference will start behaving differently? E.g.:
var e = cdict.GetEnumerator(); // is the type of e Enumerator or IEnumerator<T>?
void F<T>(T x) {}
F(cdict.GetEnumerator()); // is the inferred T Enumerator or IEnumerator<T>?
I think that's an unacceptable breaking change.
This is a general problem with introducing overloads. If the new overload has same behavior as the old one (ie it is unlikely to change the behavior of existing code silently), I believe it has been considered as an acceptable option. It has been treated on case by case.
The options listed here are all somewhat unfortunate (that is, they suck for various reasons :-)).
One compromise would be to add a new method called GetEnumerator2()
and change the compiler to support foreach-ing enumerators and code would like this:
var dict = new ConcurrentDictionary<string, int>();
// add stuff
foreach (var kv in dict.GetEnumerator2())
{
Use(enumerator.Current);
}
The established patterns for struct-based alternative is ValueXxx
, like ValueTuple
or ValueTask
but in this case value
might be a bad idea due to the generic parameter TValue
.
@jaredpar, are you still supportive of this change to the compiler? @stephentoub, are you OK with the resulting usage?
@terrajobst How is that better than a method like GetStructEnumerable()
that returns a struct
whose GetEnumerator()
returns the struct
enumerator (which you called "heavy-handed" before)?
With GetStructEnumerable()
:
GetEnumerator2
(you just change the name of the called method).are you still supportive of this change to the compiler?
I'm personally comfortable with the idea of allowing foreach
on IEnumerator<T>
. It's a language change though hence has to go through LDM. Happy to represent the issue though.
There have been a number of times I've wanted foreach on an IEnumerator<T>
-patterned struct. This language change would be welcome!
Suddenly this has become almost the only allocator on my hot path (c.3Gb per minute). What are other current alternatives for small number of values without lock? I'm using it instead of missing but long-awaited in CoreFx ConcurrentHashSet (only keys).
The enumerator is implemented with yield
keyword that does not support Reset
. However, the Reset
method is still not deprecated officially, but it is optional. What if we could reuse the enumerator and call Reset
on it instead of GetEnumerator()
? That will allow to keep current API unchanged.
The other possible approach is to wait for the JIT to get better and optimize out the allocation in the current less than ideal pattern for you.
@jkotas, are we any closer to this happening? It's been discussed for years. :)
What if we could reuse the enumerator and call Reset on it instead of GetEnumerator()?
Interesting idea. I'd be ok with that as a stopgap measure, but it wouldn't fully replace the benefits of GetEnumerator being allocation-free, in particular because you couldn't share that enumerator instance across multiple threads. Plus, it's very non-discoverable.
in particular because you couldn't share that enumerator instance across multiple threads.
If thread that resets could exclusively use it after reset that it's ok. The new struct enumerator could be a field of a class enumerator and just re-created on Reset.
Plus, it's very non-discoverable.
Those who care about enumerator allocations will find it. This issue was the first Google result yesterday :)
Another issue is that it will still have interface method call and no inlining, but this is still so much better than allocations on every small broadcast message like in SignalR and my case.
It would be great to have something usable that didn't allocate (not sure about the reset hack). Here's what we currently do in SignalR:
Suddenly this has become almost the only allocator on my hot path (c.3Gb per minute). What are other current alternatives for small number of values without lock? I'm using it instead of missing but long-awaited in CoreFx ConcurrentHashSet (only keys).
The enumerator is implemented with
yield
keyword that does not supportReset
. However, theReset
method is still not deprecated officially, but it is optional. What if we could reuse the enumerator and callReset
on it instead ofGetEnumerator()
? That will allow to keep current API unchanged.
Funny. This allocator has become an outstanding hotspot in our memory intensive project as well. Maybe time to write a custom concurrent dictionary in the meantime? Or did you find a way to get rid of the allocations?
Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process.
This process is part of the experimental issue cleanup initiative we are currently trialing in a limited number of areas. Please share any feedback you might have in the linked issue.
Will this be included in .NET 6
? .NET 7
?
That mswtfbot
wants to move this to the backlog? Meanwhile, I'm thinking about this issue every single night. In our mega large codebase, I tried using a lock + regular Dictionary, but soon realized it had even worse performance than ConcurrentDictionary
. Obviously, I soon need to write a new concurrent dictionary 😓 Nice...
This concept has implications for game dev scenarios in which zero allocation enumeration is needed, i.e. Unity.
Profiling a SignalR based game again, there's per frame cost allocating these that I would love to see go away in .NET 7.
@jaredpar, another category of examples binary obsoletion could help with, if we could come up with a decent way of having both signatures in the C# and not having to drop to IL to overload by return type.
@jaredpar, another category of examples binary obsoletion could help with, if we could come up with a decent way of having both signatures in the C# and not having to drop to IL to overload by return type.
We have another scenario for this in ASP.NET Core. My attempt was going to be introducing a source breaking change but not a binary breaking one by manipulating the ref assembly.
@davidfowl can you point me to the ASP.NET scenarios in this area? Trying to get the list together so we can find the best way to approach this problem in compiler / language.
Sent mail to avoid polluting this thread 😄
I'd appreciate seeing some canonical examples here too, perhaps we can find opportunities for escape analysis to allow stack allocation of one of these objects.
Do you want a GC benchmark example?
I found this issue when hunting GC waits
on a very hot path where GetEnumerator()
was significant.
I'd appreciate seeing some canonical examples here too, perhaps we can find opportunities for escape analysis to allow stack allocation of one of these objects.
Specific to ConcurrentDictionary<>
, they're generally of the form:
private ConcurrentDictionary<Key, Value> _data = new();
...
void Handle()
{
foreach (KeyValuePair<Key, Value> entry in _data)
{
...
}
}
with variations, like the dictionary being a static readonly field, or being a method parameter. e.g. https://github.com/dotnet/aspnetcore/blob/46562b1435bf111a7425b40f507b157b42a016a4/src/Components/Components/src/CascadingValueSource.cs#L95 https://github.com/dotnet/aspnetcore/blob/46562b1435bf111a7425b40f507b157b42a016a4/src/SignalR/server/Core/src/StreamTracker.cs#L79 https://github.com/dotnet/aspnetcore/blob/46562b1435bf111a7425b40f507b157b42a016a4/src/SignalR/server/StackExchangeRedis/src/Internal/AckHandler.cs#L53 https://github.com/dotnet/aspnetcore/blob/46562b1435bf111a7425b40f507b157b42a016a4/src/Servers/Kestrel/Core/src/Internal/Infrastructure/ConnectionManager.cs#L69 https://github.com/dotnet/aspnetcore/blob/46562b1435bf111a7425b40f507b157b42a016a4/src/Servers/Kestrel/Core/src/Internal/Infrastructure/TransportConnectionManager.cs#L59 https://github.com/dotnet/aspnetcore/blob/46562b1435bf111a7425b40f507b157b42a016a4/src/SignalR/server/Core/src/DefaultHubLifetimeManager.cs#L129C36-L129C47 https://github.com/dotnet/runtime/blob/b82b1d9218b8c68e39a1754c149396bad37a4a49/src/libraries/Microsoft.Extensions.Logging/src/LoggerFactory.cs#L126 https://github.com/dotnet/runtime/blob/b82b1d9218b8c68e39a1754c149396bad37a4a49/src/libraries/Microsoft.Extensions.Caching.Memory/src/MemoryCache.cs#L294 https://github.com/dotnet/runtime/blob/b82b1d9218b8c68e39a1754c149396bad37a4a49/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnectionPoolManager.cs#L462
More generally, we'll often see these enumerator allocations from methods that return an interface, e.g. IList<T>
or IEnumerable<T>
, and the consumer then just foreach's it or uses LINQ with it. In such cases, even if there's a type like List<T>
that provides a struct-based Enumerator, it still ends up allocating due to the boxing.
Thanks. With PGO the JIT can see through the interface calls, but it is not yet adept at converting a connected graph of checks into a check-free region that will let the JIT see that the enumerator does not escape (see eg https://github.com/dotnet/runtime/issues/62457). And escape analysis is disabled for ref classes and can't handle boxes yet.
The ultimate plan here would be to use GDV to devirtualize, fully inline the enumerator construction and enumerator calls, stack allocate the enumerator (either boxed struct or class), and then promote the enumerator fields. At that point the entire enumerator object likely evaporates, and we have something akin to a for loop over the concrete collection.
I don't know if we can get all this into .NET 9 but would like to continue to work on the necessary technology.
ConcurrentDictionary enables enumerating the dictionary while other threads are modifying it, making it very useful in a variety of scenarios. But its GetEnumerator allocates an enumerator object. We can't change the return type of GetEnumerator, but we can at least expose an enumerator struct to enable enumeration without allocating.
Proposal:
Example usage: