Get InvocationList of MulticastDelegate without any allocation

RamType0 commented 4 years ago

Background and Motivation

Currently,we have an API to invocation list of MulticastDelegate,it is MulticastDelegate.GetInvocationList().

It allocates array every time we call it. And maybe that's why we have internal HasSingleTarget property in it.

We need handsome,and performant API.

Proposed API

public class Delegate
{
    // Existing API
    // public virtual System.Delegate[] GetInvocationList();

    // Returns whether the delegate has a single target. It is a property to highlight that it is guaranteed 
    // to be a very cheap operation that is important for set of scenarios addressed by this proposal.
    // Alternative names: HasSingleTarget, IsTrueMulticast (inverted condition), HasSingleElementInvocationList
    public bool IsSingle { get; }

    // Returns length of the invocation list. It is a method to highlight that it may not be a very cheap operation.
    // This method is optional part of the proposal for completeness. It is fine to omit it from the approved shape.
    public int GetInvocationListLength();

    // Returns invocation list enumerator
    public static InvocationListEnumerator<TDelegate> EnumerateInvocationList<TDelegate>(TDelegate d) where TDelegate : Delegate;

    // The shape of this enumerator matches existing StringBuilder.ChunkEnumerator and Activity.Enumerator
    // Neither of these existing enumerators implement IEnumerable and IEnumerator. This can be changed. 
    // Q: How are we deciding whether enumerators like this should implement IEnumerable and IEnumerator?
    // Note that implementing these interfaces makes each instantiation of the type more expensive, so it is a tradeoff.
    public struct InvocationListEnumerator<TDelegate> where TDelegate : Delegate
    {
        public TDelegate Current { get; }
        public bool MoveNext();

        // EditorBrowsable to match existing StringBuilder.ChunkEnumerator and Activity.Enumerator
        [EditorBrowsable(EditorBrowsableState.Never)] // Only here to make foreach work
        public InvocationListEnumerator<TDelegate> GetEnumerator() => this;
    }
}

Original rejected proposal

public ReadOnlySpan<Delegate> InvocationList => delegates == null ? MemoryMarshal.CreateSpan(this,1) : delegates;

Dotnet-GitSync-Bot commented 4 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

GrabYourPitchforks commented 4 years ago

We need handsome,and performant API.

This doesn't really describe the motivation, just the desired solution. Where do you see this API being used? Do you anticipate those scenarios seeing a measurable perf benefit? That would help justify the work needed for this API.

Thanks!

RamType0 commented 4 years ago

I have used this API to handle exception from each invoked delegate.

GrabYourPitchforks commented 4 years ago

Would syntax like the following work for you instead? I don't think there's a reliable way to create a one-element ROS<Delegate> without allocating or introducing a new field to MulticastDelegate, neither of which would be desirable.

SomeDelegate myDelegate = GetDelegate();
foreach (Delegate innerDelegate in myDelegate)
{
    var castDelegate = (SomeDelegate)innerDelegate;
    // use 'castDelegate' here
}

You should also spend some time prototyping this to make sure that the performance is what you expect. In your own application, make sure that the cost of this allocation really is measurable. You can use FieldInfo.GetValue via reflection to access the private _invocationList instance field for the purpose of testing.

RamType0 commented 4 years ago

Ah,I just mistaken that we need ref Delegate for CreateSpan,but not Delegate. But there are still available solution. alter type of MuticastDelegate.delegates to object,and just assign it self to it if it is not multicasting. Then, implementation would become

public ReadOnlySpan<Delegate> InvocationList => delegates == this ? MemoryMarshal.CreateSpan<Delegate>(ref delegates,1) : Unsafe.As<Delegate[]>( delegates);

GrabYourPitchforks commented 4 years ago

just assign it self to it if it is not multicasting.

That's probably not a viable solution. There's a lot of code in the runtime - both managed and unmanaged - that has knowledge about the exact layout of delegate instances and what all the different instance fields represent. It would be very risky (and possibly also a negative perf hit) to update all of those call sites. That's not a good tradeoff for a niche API.

RamType0 commented 4 years ago

Aren't you confusing it with field of Delegate? I saw many comment which is similar to what you say in Delegate. But I have not seen such a comment in MulticastDelegate.

GrabYourPitchforks commented 4 years ago

But I have not seen such a comment in MulticastDelegate.

See here for some examples. You can follow the call graphs backward and see that there's a decent amount of code which relies on this field containing a restricted set of possible values.

RamType0 commented 4 years ago

Oh,maybe I was seeing mono's one.

bollhals commented 3 years ago

SomeDelegate myDelegate = GetDelegate();
foreach (Delegate innerDelegate in myDelegate)
{
    var castDelegate = (SomeDelegate)innerDelegate;
    // use 'castDelegate' here
}
Such syntax (as long as it is not allocating) would actually be very nice for when you'd have to invoke each delegate within a try catch block to ensure each gets called. So e.g.
SomeDelegate myDelegate = GetDelegate();
foreach (Delegate innerDelegate in myDelegate)
{
try
{
// invoke single delegate target
((SomeDelegate)innerDelegate).Invoke()
}
catch (Exception exception)
{
//Do something with that exception
}
}

As of right now we (Example) just call GetInvocationList() which always allocates. (It shows up already in measurements, although not yet high up in the list. Still working on reducing others first)

RamType0 commented 3 years ago

I have just created benchmark project. In this project,we have these variants of GetInvocationlist.

CoreCLRGetInvocationList Just a copy from current CoreCLR's implementation.
MyGetInvocationList My optimized implementation of GetInvocationList without altering or adding any API.
UnsafeGetInvocationList My optimized allocation less new API. This is unsafe because the result will become wrong if we access the result after altering the delegate. But if we use this correctly, it is very fast and safe.


BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.508 (2004/?/20H1)
Intel Core i9-9900K CPU 3.60GHz (Coffee Lake), 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.100
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.51904, CoreFX 5.0.20.51904), X64 RyuJIT
  DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.51904, CoreFX 5.0.20.51904), X64 RyuJIT

Method	N	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
CoreCLRGetInvocationList	1	10.774 ns	0.1214 ns	0.1077 ns	0.0038	-	-	32 B
MyGetInvocationList	1	9.302 ns	0.1656 ns	0.1549 ns	0.0038	-	-	32 B
UnsafeGetInvocationList	1	2.917 ns	0.0370 ns	0.0346 ns	-	-	-	-
CoreCLRGetInvocationList	10	82.291 ns	0.7804 ns	0.7299 ns	0.0124	-	-	104 B
MyGetInvocationList	10	23.457 ns	0.4810 ns	0.4499 ns	0.0124	-	-	104 B
UnsafeGetInvocationList	10	7.849 ns	0.0776 ns	0.0726 ns	-	-	-	-
CoreCLRGetInvocationList	100	732.493 ns	10.2674 ns	9.6041 ns	0.0982	-	-	824 B
MyGetInvocationList	100	154.543 ns	3.0153 ns	4.2270 ns	0.0985	-	-	824 B
UnsafeGetInvocationList	100	53.075 ns	0.5351 ns	0.4743 ns	-	-	-	-
CoreCLRGetInvocationList	1000	7,922.301 ns	100.5212 ns	89.1095 ns	0.9460	-	-	8024 B
MyGetInvocationList	1000	1,405.915 ns	9.4829 ns	8.8703 ns	0.9575	-	-	8024 B
UnsafeGetInvocationList	1000	676.750 ns	6.3342 ns	5.6151 ns	-	-	-	-

GrabYourPitchforks commented 3 years ago

@RamType0 The benchmark code makes incorrect assumptions about the layout of that field, and it makes incorrect assumptions about how the C# language treats reachability of objects passed as in parameters. This could lead to reliability problems at runtime.

I'll re-up my comment from https://github.com/dotnet/runtime/issues/41849#issuecomment-686898306. We could probably add an enumerator to allow this scenario if it's really that important, but creating a ROS<T> is likely a non-starter. For now, a workaround could be to call GetInvocationList once and cache the result.

RamType0 commented 3 years ago

@RamType0 The benchmark code makes incorrect assumptions about the layout of that field, and it makes incorrect assumptions about how the C# language treats reachability of objects passed as in parameters. This could lead to reliability problems at runtime.

This caused just by MemoryMarshal.CreateReadOnlySpan. Actually, C# compiler correctly handles this.

public static ref readonly T UnsafeGetInvocationListByRefAndCount<T>(in T d,out int count)
            where T : MyMulticastDelegate
        {
            var _invocationList = d._invocationList;
            if (_invocationList != null)
            {
                var invocationList = Unsafe.As<object[]>(_invocationList);
                count = (int)d._invocationCount;
                return ref Unsafe.As<object, T>(ref MemoryMarshal.GetArrayDataReference(invocationList));
            }
            else
            {
                count = 1;
                return ref d;
            }
        }

So, it is not a UnsafeGetInvocationList specific problem. And also, this problem has been forgiven for MemoryMarshal.CreateReadOnlySpan, which is already approved public API.

I'll re-up my comment from #41849 (comment). We could probably add an enumerator to allow this scenario if it's really that important, but creating a ROS<T> is likely a non-starter. For now, a workaround could be to call GetInvocationList once and cache the result.

It is better than current API. But we have thought that List.Enumerator is slow, and we have finally created CollectionsMarshal.AsSpan. So I proposed API for creating ReadOnlySpan.

bollhals commented 3 years ago

I'll re-up my comment from #41849 (comment). We could probably add an enumerator to allow this scenario if it's really that important, but creating a ROS<T> is likely a non-starter. For now, a workaround could be to call GetInvocationList once and cache the result.

In which use case (where you'd use this API) would a ROS actually be more beneficial than an enumerator?

@GrabYourPitchforks In your example you've set the type of the loop variable to Delegate, I assume it wouldn't be possible and/or safe to type that to the actual delegate type (e.g. foreach (Action<int> innerDelegate in myDelegate)?

GrabYourPitchforks commented 3 years ago

@bollhals If it's an instance method, the type would be Delegate, and the caller would need to cast. (These casts should be very cheap.) If it's an extension method, it can probably be typed appropriately.

Joe4evr commented 3 years ago

But we have thought that List.Enumerator is slow

Exactly what is slow about it?

RamType0 commented 3 years ago

But we have thought that List.Enumerator is slow

Exactly what is slow about it?

using foreach for List<T> is slower than using foreach for T[]. Then, we have finally created CollectionsMarshal.AsSpan. This unsafe API was created just for performance.

RamType0 commented 3 years ago

@GrabYourPitchforks BTW, MyGetInvocationList is seems to be better than CoreCLRGetInvocationList without any API changes. If I created PR for it, is it beneficial?

BreyerW commented 3 years ago

If im not mistaken real reason was performance of list with large structs. Eg imagine a list with million Matrix4x4 structs. Each such matrix holds 16 floats making it pretty bulky and each such struct had to be copied every time a value was retrieved via foreach in our case meaning potentially 1 milion copies of relatively large struct. This can definitely tank performance. Class based cases are largely unaffected unless u wanted to swap references itself during iteration

RamType0 commented 3 years ago

If im not mistaken real reason was performance of list with large structs. Eg imagine a list with million Matrix4x4 structs. Each such matrix holds 16 floats making it pretty bulky and each such struct had to be copied every time a value was retrieved via foreach in our case meaning potentially 1 milion copies of relatively large struct. This can definitely tank performance. Class based cases are largely unaffected unless u wanted to swap references itself during iteration

At least, the posted benchmark uses int for element, which is smaller than "reference".

Joe4evr commented 3 years ago

using foreach for List<T> is slower than using foreach for T[].

Being skeptical of a post made more than 10 years ago, I went and did a proper benchmark.

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.572 (2004/?/20H1)
Intel Core i7-10610U CPU 1.80GHz, 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100
  [Host]        : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  .NET Core 3.1 : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  .NET Core 5.0 : .NET Core 5.0.0 (CoreCLR 5.0.20.51904, CoreFX 5.0.20.51904), X64 RyuJIT

Method	Job	Runtime	Mean	Error	StdDev	Median
LargeStructList	.NET Core 3.1	.NET Core 3.1	78.698 ns	0.8484 ns	0.7084 ns	78.545 ns
LargeStructArray	.NET Core 3.1	.NET Core 3.1	28.893 ns	0.2721 ns	0.2546 ns	28.873 ns
ClassList	.NET Core 3.1	.NET Core 3.1	73.004 ns	1.4792 ns	3.0217 ns	74.361 ns
ClassArray	.NET Core 3.1	.NET Core 3.1	8.114 ns	0.0707 ns	0.0661 ns	8.104 ns
IntList	.NET Core 3.1	.NET Core 3.1	42.759 ns	0.2340 ns	0.1954 ns	42.711 ns
IntArray	.NET Core 3.1	.NET Core 3.1	7.497 ns	0.1566 ns	0.1465 ns	7.479 ns
LargeStructList	.NET Core 5.0	.NET Core 5.0	70.169 ns	0.9476 ns	0.8400 ns	69.955 ns
LargeStructArray	.NET Core 5.0	.NET Core 5.0	24.355 ns	0.5135 ns	0.9893 ns	24.142 ns
ClassList	.NET Core 5.0	.NET Core 5.0	67.844 ns	0.6781 ns	0.6343 ns	67.873 ns
ClassArray	.NET Core 5.0	.NET Core 5.0	6.542 ns	0.0661 ns	0.0586 ns	6.543 ns
IntList	.NET Core 5.0	.NET Core 5.0	44.817 ns	0.7760 ns	0.7258 ns	44.625 ns
IntArray	.NET Core 5.0	.NET Core 5.0	7.864 ns	0.2957 ns	0.8718 ns	7.717 ns

So I concede that lists are slower than a raw array, however I also don't think this difference is the end of the world, provided you're not literally on fire on a burning hot path. Now I'm very curious on where this difference comes from, since I was assuming it could get much closer to array performance than this.

stephentoub commented 3 years ago

Now I'm very curious on where this difference comes from

foreach on an array is desugared by the C# compiler into simple for loop. foreach on a list needs to go through its struct Enumerator, does version checks to make sure the list hasn't changed between iterations, etc.

e.g. https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIAYACY8lAbhpuIGYnSGBhBgG8aDMUx4BLAHYYGAQShRsATwAUMjAG0Aug0kYY+XAEpR4kdXHX9shrgCu+BgF4GddlZtiAZtBjYYAAWDBp2krb6hsYm9k4MANRukp7eYsQA7HH4qeIAvuZihRK2cgAykrgYahVVADyaAHxRRqbFlmml2a7uud5+sIEhYXIRMi0x3Un6fTaZ2bMF1HlAA===

RamType0 commented 3 years ago

@Joe4evr Thanks for posting the benchmark! I couldn't find benchmark in recent environment.

So I concede that lists are slower than a raw array, however I also don't think this difference is the end of the world, provided you're not literally on fire on a burning hot path. Now I'm very curious on where this difference comes from, since I was assuming it could get much closer to array performance than this.

"foreach" for span or array is converted to "for". And also, the generated pattern of "for" eliminates bounds check to throwing IndexOutOfRangeException. Both of these optimization is specific for array or span.

It is also seems that IntList is significant faster than ClassList. Maybe this is because of array's covariant check.

GrabYourPitchforks commented 3 years ago

The CollectionsMarshal class was added to allow efficient bulk operations over the list, such as bulk moving data or bulk modification of data. It wasn't added because of any perceived deficiency in List's enumerator.

We're getting very off-track here. If the request is specifically for projection of a MulticastDelegate as a ROS over its inner targets, then that is not feasible (as previously mentioned) and this issue should be closed. This issue provided several alternative designs, but if we don't have a consumer for those alternative APIs then this work will never rise above the cut line for any release.

bollhals commented 3 years ago

We're getting very off-track here. If the request is specifically for projection of a MulticastDelegate as a ROS over its inner targets, then that is not feasible (as previously mentioned) and this issue should be closed. This issue provided several alternative designs, but if we don't have a consumer for those alternative APIs then this work will never rise above the cut line for any release.

For me I think the main goal should be that there is a simple and easily accessable way (that has zero allocation) to iterate over the individual delegates. Ideally this improves performance but this should not be performance critical.

GrabYourPitchforks commented 3 years ago

Something that was asked earlier and never answered: why is it infeasible to call GetInvocationList (taking the allocation hit) once, then cache the result? That would solve the immediate problem, and it doesn't require any new API.

RamType0 commented 3 years ago

Something that was asked earlier and never answered: why is it infeasible to call GetInvocationList (taking the allocation hit) once, then cache the result? That would solve the immediate problem, and it doesn't require any new API.

In scenario with concurrency and multiple delegates, we need 'ConcurrentDictionary<Delegate,Delegate[]?> ' for caching it. ConcurrentDictionary allocates a lot. And also, it is inconvinient.

How about TryGetInvocationList? It only returns invocation list when it was multicasted.

GrabYourPitchforks commented 3 years ago

How about TryGetInvocationList?

I don't think a Try method is appropriate here. If there's new API being added, it should work in all cases.

In scenario with concurrency and multiple delegates, we need ConcurrentDictionary<Delegate,Delegate[]?> for caching it.

No you don't. The typical scenario for extracting the invocation list is when implementing an event handler. You control the implementation of the add and remove events, which means you can do something like (pseudocode):

private object _lockObj = new object();
private MyEventHandler _myEvent;
private MyEventHander[] _invocationList;

public event MyEventHandler MyEvent
{
    add { lock(_lockObj) { _myEvent += value; _invocationList = _myEvent.GetInvocationList(); } }
    remove { lock(_lockObj) { _myEvent -= value; _invocationList = _myEvent.GetInvocationList(); } }
}

private void FireEvents()
{
    MyEventHandler[] invocationList;
    lock (_lockObj) { invocationList = _invocationList; }

    foreach (var handler in invocationList)
    {
        // do something
    }
}

The only time you ever have to update the cached invocation list is when somebody adds or removes a handler from the event. In all other cases, especially where you take a method that takes an opaque delegate as a parameter, it's generally not appropriate to crack open the delegate and retrieve its invocation list. In those scenarios, it's generally more appropriate to treat the delegate as a single opaque Invoke method and to assume that the caller constructed it using whatever wrapping was appropriate for their own scenario.

alexrp commented 2 years ago

For async events (i.e. returning Task/ValueTask) to work, iterating the invocation list and invoking/awaiting them individually while collecting exceptions is necessary. There's a good example of this in Microsoft.VisualStudio.Threading:

https://github.com/microsoft/vs-threading/blob/73aa23b392d0d031e7eff121125803df25a14f87/src/Microsoft.VisualStudio.Threading/TplExtensions.cs#L276-L304

Having a way to enumerate the handlers without allocating is pretty important for cases like this. While something like https://github.com/dotnet/runtime/issues/41849#issuecomment-821870977 will work, it's not ergonomic to say the least. The idea in https://github.com/dotnet/runtime/issues/41849#issuecomment-686898306 seems completely sensible to me to fulfill this need.

RamType0 commented 2 years ago

How about TryGetInvocationList?

I don't think a Try method is appropriate here. If there's new API being added, it should work in all cases.

In scenario with concurrency and multiple delegates, we need ConcurrentDictionary<Delegate,Delegate[]?> for caching it.

No you don't. The typical scenario for extracting the invocation list is when implementing an event handler. You control the implementation of the add and remove events, which means you can do something like (pseudocode):
private object _lockObj = new object();
private MyEventHandler _myEvent;
private MyEventHander[] _invocationList;

public event MyEventHandler MyEvent
{
    add { lock(_lockObj) { _myEvent += value; _invocationList = _myEvent.GetInvocationList(); } }
    remove { lock(_lockObj) { _myEvent -= value; _invocationList = _myEvent.GetInvocationList(); } }
}

private void FireEvents()
{
    MyEventHandler[] invocationList;
    lock (_lockObj) { invocationList = _invocationList; }

    foreach (var handler in invocationList)
    {
        // do something
    }
}
The only time you ever have to update the cached invocation list is when somebody adds or removes a handler from the event. In all other cases, especially where you take a method that takes an opaque delegate as a parameter, it's generally not appropriate to crack open the delegate and retrieve its invocation list. In those scenarios, it's generally more appropriate to treat the delegate as a single opaque Invoke method and to assume that the caller constructed it using whatever wrapping was appropriate for their own scenario.

Does this work for nested multicast delegate?

Also, to achieve my motivation(handle exception from each invoked delegate), we must call get invocation list recursively. And we need to stop recursively doing this when we reached non-multicasting delegate. So I think Try is more suitable.

jkotas commented 11 months ago

We could probably add an enumerator to allow this scenario if it's really that important, but creating a ROS is likely a non-starter.

+1 . This needs to be an enumerator. Returning ROS would tie this API to delegate internal implementation details that is not non-desirable.

jkotas commented 11 months ago

@eerhardt @GrabYourPitchforks @stephentoub I have update the API proposal at the top to enumerator. Thoughts about the API shape?

RamType0 commented 11 months ago

I want variant for how it handles nested multicast delegate.(Automatically unrolls it or not)

alexrp commented 11 months ago

What do you mean by "nested multicast delegate"?

RamType0 commented 11 months ago

What do you mean by "nested multicast delegate"?

The multicast delegate which is made from multiple multicast delegates.

alexrp commented 11 months ago

Can you give a concrete example of what you mean? Multicast delegates should already be 'flattened' by virtue of how they work.

eerhardt commented 11 months ago

Looking at the StackExchange.Redis code that is trying to do this, I see 2 categories of usage:

Ensuring all handlers are invoked, even if one of them throws an exception (same scenario as mentioned above).
- https://github.com/StackExchange/StackExchange.Redis/blob/510b5e21b256b5d4226c79062b8ecfc4d2265817/src/StackExchange.Redis/ConnectionMultiplexer.Threading.cs#L23-L43
- https://github.com/StackExchange/StackExchange.Redis/blob/510b5e21b256b5d4226c79062b8ecfc4d2265817/src/StackExchange.Redis/MessageCompletable.cs#L30-L40
Getting the invocation count
- https://github.com/StackExchange/StackExchange.Redis/blob/510b5e21b256b5d4226c79062b8ecfc4d2265817/src/StackExchange.Redis/RedisSubscriber.cs#L246-L258

I have update the API proposal at the top to enumerator. Thoughts about the API shape?

My initial questions are:

Why is it a static method? Why not an instance method?
Can it be updated to have an Enumerable and an Enumerator? That way it can be used in a foreach? Possibly something very similar to how Pipelines.Sockets.Unofficial defined its extension method (only just an normal instance method and not an extension method)? https://github.com/mgravell/Pipelines.Sockets.Unofficial/blob/6703efc310131037ee55b3136ff0d2d9e0df1e3f/src/Pipelines.Sockets.Unofficial/Delegates.cs#L24-L28

stephentoub commented 11 months ago

Can it be updated to have an Enumerable and an Enumerator? That way it can be used in a foreach?

StringBuilder.GetChunks returns an "enumerator" that also meets the foreach pattern, as it exposes a GetEnumerator method that just returns itself. We could do the same here.

jkotas commented 11 months ago

Why is it a static method? Why not an instance method?

If it was an instance method, it would need to be Delegate and callers would need to cast. I think we want to the API to be strongly typed to avoid the cast. I have changed it to be static extension method. Does it look better?

Can it be updated to have an Enumerable and an Enumerator?

Done.

alexrp commented 11 months ago

Would suggest EnumerateInvocationList or similar instead of GetInvocationListAsEnumerable. See string.EnumerateRunes() as an example of a similar pattern:

jkotas commented 11 months ago

Would suggest EnumerateInvocationList or similar instead of GetInvocationListAsEnumerable. See string.EnumerateRunes() as an example of a similar pattern:

Updated. Thank you!

eerhardt commented 11 months ago

The API shape looks good to me. Thanks for updating it @jkotas.

Anyone object to marking this "ready for review" and in the 9.0 milestone?

MichalPetryka commented 11 months ago

Would changing IsSingleInvocation to an InvocationCount make sense here?

weltkante commented 11 months ago

Would changing IsSingleInvocation to an InvocationCount make sense here?

I don't think the count is stored currently? It probably would need to iterate the enumerator to calculate the count, the check for single/multicast delegates is the common case checked for and should be cheap

RamType0 commented 11 months ago

Invocation list is stored as array. So to get invocation count, just return length of the array or return 1 if it is single invocation. (If this is true: https://github.com/dotnet/runtime/issues/41849#issuecomment-1826404517)

jkotas commented 11 months ago

So to get invocation count, just return length of the array

The array can have extra space at the end.

Current delegate multicast implementations have the count stored in one of the other delegate fields, so it is still possible to implement without iterating over the whole list. InvocationCount is slightly less efficient than IsSingleInvocation for the common pattern:

class Delegate
{
    public bool IsSingleInvocation => m_helperObject is not Delegate[];
    public int InvocationCount => (m_helperObject is Delegate[]) ? (int)m_extraFunctionPointerOrData : 1;
}
...
void Example1(Action myDelegate)
{
    if (myDelegate.IsSingleInvocation)
    {
        // ... path optimized for single-cast delegates ...
    }
    else
    {
        // ... general path using enumerator ...
    }
}

void Example2(Action myDelegate)
{
    if (myDelegate.InvocationCount == 1)
    {
        // ... path optimized for single-cast delegates ...
    }
    else
    {
        // ... general path using enumerator ...
    }
}

Maybe we should have both methods? Similar to how we have Span.IsEmpty and Span.Length.

MichalPetryka commented 11 months ago

Maybe we should have both methods?

If both are exposed, would a method that'd copy the delegates into a provided span make sense too?

jkotas commented 11 months ago

If both are exposed, would a method that'd copy the delegates into a provided span make sense too?

We can always add it later if somebody comes up with a good use case for it. The proposal covers the motivating scenarios as is.

terrajobst commented 10 months ago

IsSingleInvocation and InvocationCount aren't related to the invocation list and thus shouldn't be expose
- However, if there is a value with exposing a property or method for GetInvocationList().Length that would be fine
We don't believe it's a mainline API that needs to be discoverable by dotting into a delegate, hence a static method on Delegate is more desireable to not add one more type to the System namespace. Also, we don't think we're likely adding more APIs to DelegateExtensions.
We decided to add IEnumerable<T> to make the enumerator more useful
Usage of GetInvocationList() is quite high(apisof.net, grep.app)
- Cursory look through grep.app seems to suggest that many users would be happy with EnumerateInvocationList()
- Might be worth having an analyzer that suggests the re-write

namespace System;

public class Delegate
{
    // Existing API
    // public virtual System.Delegate[] GetInvocationList();

    public static InvocationListEnumerator<TDelegate> EnumerateInvocationList<TDelegate>(TDelegate d) where TDelegate : Delegate;

    public struct InvocationListEnumerator<TDelegate> where TDelegate : Delegate, IEnumerable<TDelegate>, IEnumerator<TDelegate>
    {
        public TDelegate Current { get; }
        public bool MoveNext();
        public InvocationListEnumerator<TDelegate> GetEnumerator() => this;
    }
}

weltkante commented 10 months ago

Cursory look through grep.app seems to suggest that many users would be happy with EnumerateInvocationList()

Looking through existing users won't show you the potential of a fast path that just checks for single/multicast delegates distinction, since the currently existing API always requires you to allocate the array there is no point of building a fast path so nobody does.

All usages I ever have written could have used a fast path and not calling GetInvocationList() at all, just use the delegate you get passed directly. Calling GetInvocationList() is only required to be correct for edge cases where users pass multicast delegates which is a tiny amount of callers in my experience.

Given the enumerator is a struct and doesn't allocate I probably can write a check for a fast path on top of it but it feels like a lost opportunity to not give a public API for checking for the fast path.

dotnet / runtime

Get InvocationList of MulticastDelegate without any allocation #41849

Background and Motivation

Proposed API