Unnecessary closure allocation when only one variable is captured

svick commented 6 years ago

using System;

static class C
{
    static Func<int[]> M1(int[] xs)
    {
        return () => xs;
    }

    static Func<int[]> M2(int[] xs)
    {
        return xs.Identity;
    }

    static int[] Identity(this int[] xs) => xs;
}

Both M1 and M2 return a delegate that returns the passed-in xs object. The difference is that M1 additionally allocates a closure object, even though M2 doesn't need to do that to achieve the same result.

I think that in such situations, when a lambda uses only one variable from its closure, the lambda should be treated the same as extension method on the closed-over variable, which would avoid the closure allocation.

This won't help if the captured variable is a value type, but I don't think there is a way to avoid the allocation in that case.

For reference, the IL for M1 and M2 is:

.method private hidebysig static class [mscorlib]System.Func`1<int32[]> M1 (
        int32[] xs
    ) cil managed 
{
    .maxstack 8

    IL_0000: newobj instance void C/'<>c__DisplayClass0_0'::.ctor()
    IL_0005: dup
    IL_0006: ldarg.0
    IL_0007: stfld int32[] C/'<>c__DisplayClass0_0'::xs
    IL_000c: ldftn instance int32[] C/'<>c__DisplayClass0_0'::'<M1>b__0'()
    IL_0012: newobj instance void class [mscorlib]System.Func`1<int32[]>::.ctor(object, native int)
    IL_0017: ret
}

.method private hidebysig static class [mscorlib]System.Func`1<int32[]> M2 (
        int32[] xs
    ) cil managed 
{
    .maxstack 8

    IL_0000: ldarg.0
    IL_0001: ldftn int32[] C::Identity(int32[])
    IL_0007: newobj instance void class [mscorlib]System.Func`1<int32[]>::.ctor(object, native int)
    IL_000c: ret
}

gafter commented 6 years ago

@VSadov @agocke How hard do you think it would be to implement this improvement to our code gen strategy?

agocke commented 6 years ago

I think I would want to see perf measurements comparing the two methods.

Basically, the difference here is that the value itself is the receiver type, instead of a class containing the receiver type as a field. I'm not totally convinced the JIT is going to generate great IL for this.

Overall, I'd say it's not a simple optimization, but if it brings big wins it may be worth doing.

agocke commented 6 years ago

Thinking about this more, we would still need to validate that there's no mutation to avoid hoisting. I think this would be dependent on https://github.com/dotnet/roslyn/issues/20777

svick commented 6 years ago

@agocke

I think I would want to see perf measurements comparing the two methods.

I created a simple microbenchmark for this using Benchmark.net. The results are:

Method	Mean	Error	StdDev	Gen 0	Allocated
CreateDelegateLambda	36.74 ns	0.7785 ns	0.9268 ns	0.0280	88 B
CreateDelegateExtensionMethod	18.38 ns	0.5607 ns	1.6532 ns	0.0203	64 B
InvokeDelegateLambda	37.26 ns	0.9130 ns	2.6776 ns	0.0280	88 B
InvokeDelegateExtensionMethod	23.83 ns	0.6927 ns	2.0424 ns	0.0203	64 B

As you can see, when I simulate the proposed optimization by using an extension method, it saves 24 B of allocations, as expected. But it also measurably decreases the time it takes to execute the code. So it seems to me that it's a clear perf win.

agocke commented 6 years ago

Thanks! That's what I was looking for.

dotnet / roslyn

Unnecessary closure allocation when only one variable is captured #24941