Allow unsealed delegates to reduce allocations for lambdas

svick commented 7 years ago

Currently, the requirements for delegates are (from ECMA-335 §II.14.6 Delegates):

Delegates shall be declared sealed, and the only members a delegate shall have are [the] methods as specified here.

What if this requirement was weakened, so that delegates were allowed to be unsealed and also allowed to contain other members, especially fields? I think it would allow the C# compiler (and other compilers) to reduce the number of allocations required for lambdas that close over local variables, by fusing the delegate and the closure into a single object. (It also might allow F# to start using delegate types instead of FSharpFunc, but I'm not sure about that.)

For example, consider the following C# code:

delegate int FuncInt();

void M()
{
    int i = 42;
    FuncInt f = () => i;
    f();
}

Currently, the generated code looks roughly like this:

sealed class FuncInt : MulticastDelegate
{
    public extern FuncInt(object obj, IntPtr method);

    public virtual extern int Invoke();
}

sealed class Closure
{
    public int i;

    internal int Main_f() => this.i;
}

void Main()
{
    Closure closure = new Closure();
    closure.i = 42;
    FuncInt f = new FuncInt(closure,  __methodptr(Closure.Main_f));
    f.Invoke();
}

Notice that there are two allocations for a single lambda: the closure and the delegate.

If the requirements for delegates were weakened, the generated code could instead be:

class FuncInt : MulticastDelegate // no longer sealed
{
    public extern FuncInt(object obj, IntPtr method);

    public virtual extern int Invoke();
}

sealed class Closure : FuncInt // delegate class containing fields and custom methods
{
    public Closure() : base(this, __methodptr(Main_f)) {}

    public int i;

    private int Main_f() => this.i;
}

void Main()
{
    Closure closure = new Closure();
    closure.i = 42;
    FuncInt f = closure;
    f.Invoke();
}

This way, the two allocations are fused, which should make the code more efficient.

Some other notes:

In cases when a single closure is used for multiple lambdas, this would still avoid one delegate allocation (but not the others).
I think this also shouldn't interfere with the multicasting nature of delegates, since the runtime-provided fields would still be there.
Since I believe that unsealing a class is not a breaking change, unsealing framework-provided delegate types (like the Func and Action types) in some future version of the framework should be fine. Existing code would then become more efficient just by using a newer compiler and targeting a newer framework.

What do you think?

brandon942 commented 7 years ago

I assume the Closure holds a reference to the local variable and not a copy. Then it's all about performance vs memory. I'm all for performance.

svick commented 7 years ago

@brandon942

I assume the Closure holds a reference to the local variable and not a copy.

Neither. The closure holds the variable itself. If you want to access the variable, you have to go through the closure.

Then it's all about performance vs memory. I'm all for performance.

I don't see that kind of trade-off here. Doing what I proposed would save some memory and also improve performance (by decreasing GC pressure and probably also by improving cache behavior).

jkotas commented 7 years ago

This optimization has potential to save 16 bytes per closure on 64-bit platforms. The memory bytes is what matters for GC pressure here. Two small objects vs. one larger object is not going to make much difference.

It is expensive optimization to implement because of it changes the .NET type system. The first step should be to collect data about the benefit. E.g. how much of the GC heap in typical ASP.NET app are closures, how much would it save or make things faster.

Also, how would it work when there is more than one closure created in the method?

void M()
{
    int i = 42;
    FuncInt f = () => i;
    FuncInt g = () => i+1;
    f();
    g();
}

svick commented 7 years ago

@jkotas

The first step should be to collect data about the benefit. E.g. how much of the GC heap in typical ASP.NET app are closures, how much would it save or make things faster.

Yeah, that makes sense.

how would it work when there is more than one closure created in the method?

This example has only one closure (containing i), but two delegates using it. In that case, the optimization would have to be applied only to one of the two delegate allocations. I.e. it would go from "1 closure allocation + 2 delegate allocations" to "1 fused delegate-closure allocation + 1 delegate allocation".

I now realize this could increase the amount of used memory if the non-fused delegate lived longer than the fused one (by the size of delegate's fields, which is 4 pointers). I'm not sure how big of a problem this is.

mattwarren commented 7 years ago

The first step should be to collect data about the benefit. E.g. how much of the GC heap in typical ASP.NET app are closures, how much would it save or make things faster.

If anyone is interested in doing this, you might want to have a look at some code I've written that uses CLRMD to analyse heap dumps. It shouldn't be too hard to adapt it, so instead of looking for strings, it looks for instances of delegates.

svick commented 7 years ago

I have attempted to measure how much would this save by running the smoke test suite of the aspnet/MusicStore sample app 25 times. I'm not sure how typical app MusicStore is and this usage pattern definitely isn't typical (for example, it opens the home page as often as changing a user's password), but I think it could be close enough.

I ran the above under the VS memory profiler, exported allocations data into XML and processed the XML. The processed data can be seen here (430 kB).

Here is a screenshot of the profiler summary, so that you can see if it aligns with your idea of what a "typical" app does:

The results are:

All types:
- Types: 3 494
- Allocations: 25 591 080
- Bytes: 1 461 285 878
Closures (DisplayClass):
- Types: 296
- Allocations: 568 065
- Bytes: 17 355 488
Delegates:
- Types: 62
- Allocations: 656 929
- Bytes: 42 043 456

This shows that the number of allocated delegates and closures are not that far apart, which might indicate that many of them could be fused. If all closures were fused, this would save about 570 000 allocations (2.2 % of all allocations) and 8.7 MiB (0.62 % of all allocated bytes).

If this data is sufficiently representative, then I think it means that doing this optimization is not worth it, because the savings are tiny, compared with the significant cost.

On the other hand, this data shows that removing one or more of the 6 pointer-sized fields that are allocated for each delegate could give comparable savings in terms of allocated bytes, with much smaller cost.

RamType0 commented 4 years ago

I thought that the main pros of this feature is reducing number of objects and locality of memory access,not a total size of objects.

I saw that GC performance is affected mainly by number of objects.

jkotas commented 4 years ago

I saw that GC performance is affected mainly by number of objects.

This post is discussing "GC heap scanning performance" that is just one of the multiple components of total GC cost.

BTW: This is not the first post with misleading information that I have seen on stackoverflow. I have tried to correct the misleading information a few times before, but it was always rejected by the moderators. I am not even trying anymore.

@svick 's back of the envelope estimate looks accurate to me. This is unlikely to have material impact on performance of real world apps and there are cheaper ways to achieve savings of similar magnitude.

davidfowl commented 3 years ago

I was looking into this feature as a way to replace the IActionResult interface in ASP.NET Core MVC.

Today the core ASP.NET request processing primitive is RequestDelegate and would like to be able to use it in more places. Instead, MVC has IActionResult is basically the interface equivalent of this (well slightly different but that's the intent). Now I'm looking to unify the request processing and result processing and it would be nice to allow returning a RequestDelegate directly.

public class HomeController
{
     [HttpGet("/hello")]
     public RequestDelegate Hello()
     {
         return context => context.Response.WriteAsync("Hello World");
     }
}

This is workable but I would love to allow polymorphism to enable other code to change the results.

public class JsonResult : RequestDelegate
{
    private object _value;

    public JsonResult(object value) 
    { 
        _value = value;
    }

    public override Task Invoke(HttpContext context) => context.Response.WriteAsJsonAsync(_value);
}

public class HomeController
{
     [HttpGet("/hello")]
     public JsonResult Hello()
     {
         return new JsonResult(new { Name = "David" });
     }
}

This would allow for common scenarios that change the result by inspecting the type and lighting up behavior (mutating state on it before execution)

dotnet / runtime

Allow unsealed delegates to reduce allocations for lambdas #8362