Closed djluck closed 4 years ago
After a co-worker asked if this applied to any variables outside the scope of the closure or just to class members, I amended the benchmark to:
public class LambdaTest
{
[Benchmark]
public string ClosureWithGeneric()
{
return Closure_GetStr<string>();
}
private string _aString = "a";
private string Closure_GetStr<T>()
{
//y now holds a reference to _aString, x is still a closure
var y = _aString;
Func<string> x = () => { return y; };
return x();
}
}
Which gave the times:
Method | Job | Jit | Mean | Error | StdDev |
---|---|---|---|---|---|
ClosureWithGeneric | X64-Clr-LegacyJit | LegacyJit | 18.15 ns | 0.5168 ns | 0.7412 ns |
ClosureWithGeneric | X64-Clr-RyuJit | RyuJit | 13.75 ns | 0.8877 ns | 1.2731 ns |
So it seems it is specific to closures that directly reference class members.
Both the legacy JIT and RyuJIT allocate that delegate. In general, none of the existing .NET Framework JITs eliminate delegate allocations.
The problem here seems to be in the delegate constructor, it's much slower when RyuJIT is used.
@dotnet/jit-contrib
An expanded view of the PerfView trace looks like this:
If it's helpful, I can upload the original perfview trace.
Look at the perfview trace of the fast version as well. You'll notice that COMDelegate::DelegateConstruct
doesn't appear in that case.
fgOptimizeDelegateConstructor
is where call to COMDelegate::DelegateConstruct
is supposed to be replaced with the much more efficient equivalent.
fgOptimizeDelegateConstructor is where call to COMDelegate::DelegateConstruct is supposed to be replaced with the much more efficient equivalent.
Yeah, and that one expects the function address to follow a certain pattern (e.g. be GT_FTN_ADDR
) but in this case the function address ends up in a lclvar. What seems strange is that runtime lookup is involved in this despite that neither class nor the delegate target are generic.
ldftn instance string LambdaTest::'<Closure_GetStr>b__2_0'<!!0>()
needs to load instantiating stub.
The fix maybe similar to dotnet/coreclr#10663 (note that this change was CoreRT ABI only): don't depend on pattern matching and always perform the optimization for verifiable delegate creation sequence by construction.
ldftn instance string LambdaTest::'
b__2_0'<!!0>() needs to load instantiating stub.
Right, the delegate target is actually a generic method. I wonder why the C# compiler does that but that's another problem.
@sandreenko FYI
There is an issue for R2R dotnet/runtime#7839 (optimize delegate constructor in R2R for a virtual fp).
I've identified a situation where RyuJIT generates inefficient code that involve closures defined within generic methods.
For example, given this code:
The results of running the above code are:
RyuJIT is about 20x slower than the legacy JIT. I've confirmed that this issue exists in dotnet core 2.0.
The issue seems to be caused by heavy calls to
COMDelegate::DelegateConstruct
, which does not occur when using the legacy JIT:Closures in non-generic methods and Lambdas in generic methods do not seem to be affected. It is only the combination of the two.