Support struct Enumerator for ConcurrentDictionary

stephentoub commented 6 years ago

ConcurrentDictionary enables enumerating the dictionary while other threads are modifying it, making it very useful in a variety of scenarios. But its GetEnumerator allocates an enumerator object. We can't change the return type of GetEnumerator, but we can at least expose an enumerator struct to enable enumeration without allocating.

Proposal:

namespace System.Collections.Concurrent
{
    public class ConcurrentDictionary<TKey, TValue>
    {
        public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator(); // existing

        public struct Enumerator // new
        {
            public Enumerator(ConcurrentDictionary<TKey, TValue> dictionary);
            public bool MoveNext();
            public KeyValuePair<TKey, TValue> Current { get; }
        }
    }
}

Example usage:

private static ConcurrentDictionary<string, int> s_dict = new ConcurrentDictionary<string, int>();
...
var enumerator = new ConcurrentDictionary<string, int>.Enumerator(s_dict);
while (enumerator.MoveNext())
{
    Use(enumerator.Current);
}

karelz commented 6 years ago

@stephentoub is Future ok? Or do you think we should push it into 2.1?

stephentoub commented 6 years ago

Future is fine

davidfowl commented 6 years ago

Is this difficult to do? I'd love to measure our broadcast performance scenario with this. We currently filed this issue https://github.com/aspnet/SignalR/issues/1796 to try and work around allocating the enumerator per message (which we're currently doing).

Joe4evr commented 6 years ago

We can't change the return type of GetEnumerator, but we can at least expose an enumerator struct to enable enumeration without allocating.

Isn't the compiler (and/or JIT) already smart enough to emit direct calls for struct enumerators? :thinking:

Unless you meant that changing from a class to a struct enumerator is a binary breaking change, in which case.... Would it be suitable to quirk instead?

I just wouldn't like have having to remember to use the more verbose syntax over the simplicity of foreach for one specific collection type to keep GC pressure down.

stephentoub commented 6 years ago

Is this difficult to do?

No, the iterator just needs to be rewritten manually.

Unless you meant that changing from a class to a struct enumerator is a binary breaking change

Correct. Changing the GetEnumerator signature is a breaking change.

Would it be suitable to quirk instead?

No, not for API signature changes.

stephentoub commented 6 years ago

Is this difficult to do?

@davidfowl, e.g. something like this: https://github.com/stephentoub/corefx/commit/d11bdcd0763b8913e78a237e631a8abe3c0afa68

terrajobst commented 6 years ago

I'm not sure I like it because it seems heavy-handed and the consuming code cannot use foreach.

@jaredpar is there any way we could change the compiler rules for IEnumerable to prefer a struct-based enumerator when GetEnumerator() is already taken? I suspect we'll have this problem in other areas too. @bartonjs suggested that we could add another property/method that returns a struct that provides a struct-based GetEnumerator() method. Again, feels heavy-handed...

jaredpar commented 6 years ago

is there any way we could change the compiler rules for IEnumerable ...

No

This is one of the most used patterns in .NET. Changing the compiler rules for how we bind existing IEnumerable is a back compat issue.

stephentoub commented 6 years ago

This has probably been discussed before, but is there anything reasonable we could do in the runtime to make it a non-breaking change to change a signature from:

public IEnumerator<T> GetEnumerator() { ... }

to

public Enumerator GetEnumerator() { ... }

public struct Enumerator : IEnumerator<T> { ... }

such that code compiled against the former would "just work" with an implementation containing the latter? Then code compiled against the new signature would get the allocation benefit but code compiled against the older signature would still work.

cc: @jkotas

jkotas commented 6 years ago

You can have both Enumerator GetEnumerator() and IEnumerator<T> GetEnumerator() methods in the IL. It is just that you cannot do that in C# today. It is possible to hack it via IL rewriting, e.g. the implementation can contain both methods, but reference assembly just the new one. The problem with introducing such non-standard overload in the implementation is that it would break reflection or scripting scenarios.

The other possible approach is to wait for the JIT to get better and optimize out the allocation in the current less than ideal pattern for you.

svick commented 6 years ago

@jkotas

It is possible to hack it via IL rewriting, e.g. the implementation can contain both methods, but reference assembly just the new one.

Wouldn't that mean that if you upgrade to the newer reference assembly, code that does any kind of type inference will start behaving differently? E.g.:

var e = cdict.GetEnumerator(); // is the type of e Enumerator or IEnumerator<T>?

void F<T>(T x) {}
F(cdict.GetEnumerator()); // is the inferred T Enumerator or IEnumerator<T>?

I think that's an unacceptable breaking change.

jkotas commented 6 years ago

This is a general problem with introducing overloads. If the new overload has same behavior as the old one (ie it is unlikely to change the behavior of existing code silently), I believe it has been considered as an acceptable option. It has been treated on case by case.

terrajobst commented 6 years ago

Video

The options listed here are all somewhat unfortunate (that is, they suck for various reasons :-)).

One compromise would be to add a new method called GetEnumerator2() and change the compiler to support foreach-ing enumerators and code would like this:

var dict = new ConcurrentDictionary<string, int>();
// add stuff

foreach (var kv in dict.GetEnumerator2())
{
    Use(enumerator.Current);
}

The established patterns for struct-based alternative is ValueXxx, like ValueTuple or ValueTask but in this case value might be a bad idea due to the generic parameter TValue.

@jaredpar, are you still supportive of this change to the compiler? @stephentoub, are you OK with the resulting usage?

svick commented 6 years ago

@terrajobst How is that better than a method like GetStructEnumerable() that returns a struct whose GetEnumerator() returns the struct enumerator (which you called "heavy-handed" before)?

With GetStructEnumerable():

You don't need to change the language (or the compiler).
The usage stays the same as with GetEnumerator2 (you just change the name of the called method).
The API surface does become larger, but if that's an issue, you could combine the enumerable and the enumerator into a single type.
Performance should not be worse, since the additional call should be inlineable.

jaredpar commented 6 years ago

are you still supportive of this change to the compiler?

I'm personally comfortable with the idea of allowing foreach on IEnumerator<T>. It's a language change though hence has to go through LDM. Happy to represent the issue though.

jnm2 commented 6 years ago

There have been a number of times I've wanted foreach on an IEnumerator<T>-patterned struct. This language change would be welcome!

buybackoff commented 6 years ago

Suddenly this has become almost the only allocator on my hot path (c.3Gb per minute). What are other current alternatives for small number of values without lock? I'm using it instead of missing but long-awaited in CoreFx ConcurrentHashSet (only keys).

The enumerator is implemented with yield keyword that does not support Reset. However, the Reset method is still not deprecated officially, but it is optional. What if we could reuse the enumerator and call Reset on it instead of GetEnumerator()? That will allow to keep current API unchanged.

stephentoub commented 6 years ago

The other possible approach is to wait for the JIT to get better and optimize out the allocation in the current less than ideal pattern for you.

@jkotas, are we any closer to this happening? It's been discussed for years. :)

What if we could reuse the enumerator and call Reset on it instead of GetEnumerator()?

Interesting idea. I'd be ok with that as a stopgap measure, but it wouldn't fully replace the benefits of GetEnumerator being allocation-free, in particular because you couldn't share that enumerator instance across multiple threads. Plus, it's very non-discoverable.

buybackoff commented 6 years ago

in particular because you couldn't share that enumerator instance across multiple threads.

If thread that resets could exclusively use it after reset that it's ok. The new struct enumerator could be a field of a class enumerator and just re-created on Reset.

Plus, it's very non-discoverable.

Those who care about enumerator allocations will find it. This issue was the first Google result yesterday :)

Another issue is that it will still have interface method call and no inlining, but this is still so much better than allocations on every small broadcast message like in SignalR and my case.

davidfowl commented 6 years ago

It would be great to have something usable that didn't allocate (not sure about the reset hack). Here's what we currently do in SignalR:

https://github.com/aspnet/SignalR/blob/2d4fb0af6fd2ef2e3a059a2a15a56484d8800d35/src/Microsoft.AspNetCore.SignalR.Core/HubConnectionStore.cs#L44

znakeeye commented 5 years ago

Suddenly this has become almost the only allocator on my hot path (c.3Gb per minute). What are other current alternatives for small number of values without lock? I'm using it instead of missing but long-awaited in CoreFx ConcurrentHashSet (only keys).

The enumerator is implemented with yield keyword that does not support Reset. However, the Reset method is still not deprecated officially, but it is optional. What if we could reuse the enumerator and call Reset on it instead of GetEnumerator()? That will allow to keep current API unchanged.

Funny. This allocator has become an outstanding hotspot in our memory intensive project as well. Maybe time to write a custom concurrent dictionary in the meantime? Or did you find a way to get rid of the allocations?

ghost commented 3 years ago

Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process.

This process is part of the experimental issue cleanup initiative we are currently trialing in a limited number of areas. Please share any feedback you might have in the linked issue.

znakeeye commented 3 years ago

Will this be included in .NET 6? .NET 7?

That mswtfbot wants to move this to the backlog? Meanwhile, I'm thinking about this issue every single night. In our mega large codebase, I tried using a lock + regular Dictionary, but soon realized it had even worse performance than ConcurrentDictionary. Obviously, I soon need to write a new concurrent dictionary 😓 Nice...

stonstad commented 3 years ago

This concept has implications for game dev scenarios in which zero allocation enumeration is needed, i.e. Unity.

davidfowl commented 2 years ago

Profiling a SignalR based game again, there's per frame cost allocating these that I would love to see go away in .NET 7.

stephentoub commented 2 years ago

@jaredpar, another category of examples binary obsoletion could help with, if we could come up with a decent way of having both signatures in the C# and not having to drop to IL to overload by return type.

davidfowl commented 2 years ago

@jaredpar, another category of examples binary obsoletion could help with, if we could come up with a decent way of having both signatures in the C# and not having to drop to IL to overload by return type.

We have another scenario for this in ASP.NET Core. My attempt was going to be introducing a source breaking change but not a binary breaking one by manipulating the ref assembly.

jaredpar commented 2 years ago

@davidfowl can you point me to the ASP.NET scenarios in this area? Trying to get the list together so we can find the best way to approach this problem in compiler / language.

davidfowl commented 2 years ago

Sent mail to avoid polluting this thread 😄

AndyAyersMS commented 1 year ago

I'd appreciate seeing some canonical examples here too, perhaps we can find opportunities for escape analysis to allow stack allocation of one of these objects.

znakeeye commented 1 year ago

Do you want a GC benchmark example?

I found this issue when hunting GC waits on a very hot path where GetEnumerator() was significant.

stephentoub commented 1 year ago

I'd appreciate seeing some canonical examples here too, perhaps we can find opportunities for escape analysis to allow stack allocation of one of these objects.

Specific to ConcurrentDictionary<>, they're generally of the form:

private ConcurrentDictionary<Key, Value> _data = new();
...
void Handle()
{
    foreach (KeyValuePair<Key, Value> entry in _data)
    {
       ...
    }
}

More generally, we'll often see these enumerator allocations from methods that return an interface, e.g. IList<T> or IEnumerable<T>, and the consumer then just foreach's it or uses LINQ with it. In such cases, even if there's a type like List<T> that provides a struct-based Enumerator, it still ends up allocating due to the boxing.

AndyAyersMS commented 1 year ago

Thanks. With PGO the JIT can see through the interface calls, but it is not yet adept at converting a connected graph of checks into a check-free region that will let the JIT see that the enumerator does not escape (see eg https://github.com/dotnet/runtime/issues/62457). And escape analysis is disabled for ref classes and can't handle boxes yet.

The ultimate plan here would be to use GDV to devirtualize, fully inline the enumerator construction and enumerator calls, stack allocate the enumerator (either boxed struct or class), and then promote the enumerator fields. At that point the entire enumerator object likely evaporates, and we have something akin to a for loop over the concrete collection.

I don't know if we can get all this into .NET 9 but would like to continue to work on the necessary technology.

dotnet / runtime

Support struct Enumerator for ConcurrentDictionary #25448