Open smoogipoo opened 4 years ago
It's unclear what you're asking for here. Do you believe there is a bug in C# compiler, the JIT, etc ...
It seems to be a JIT issue, though I don't think I'm fully qualified to be able to answer that.
IL-wise they generate very similar code:
.method private hidebysig
instance int32 safeMethod () cil managed
{
// Method begins at RVA 0x2050
// Code size 8 (0x8)
.maxstack 3
.locals init (
[0] class C/'<>c__DisplayClass2_0' 'CS$<>8__locals0'
)
// sequence point: hidden
IL_0000: newobj instance void C/'<>c__DisplayClass2_0'::.ctor()
IL_0005: stloc.0
IL_0006: ldc.i4.0
IL_0007: ret
} // end of method C::safeMethod
.method private hidebysig
instance int32 allocatingMethod () cil managed
{
// Method begins at RVA 0x2064
// Code size 15 (0xf)
.maxstack 3
.locals init (
[0] class C/'<>c__DisplayClass3_0' 'CS$<>8__locals0'
)
// sequence point: hidden
IL_0000: newobj instance void C/'<>c__DisplayClass3_0'::.ctor()
IL_0005: stloc.0
IL_0006: ldloc.0
IL_0007: ldarg.0
IL_0008: stfld class C C/'<>c__DisplayClass3_0'::'<>4__this'
IL_000d: ldc.i4.0
IL_000e: ret
} // end of method C::allocatingMethod
Whereas the JIT ASM is very different:
C.safeMethod()
L0000: xor eax, eax
L0002: ret
C.allocatingMethod()
L0000: push esi
L0001: mov esi, ecx
L0003: mov ecx, 0x9c0cadc
L0008: call 0x05c230cc
L000d: lea edx, [eax+4]
L0010: call 0x0f7fa3b0
L0015: xor eax, eax
L0017: pop esi
L0018: ret
With the only difference being that an instance field is referenced in the allocatingMethod
path.
Moving back to runtime as the issue is about the JIT generation for the pattern emitted by Roslyn, not the Roslyn pattern itself. At least that is my reading at this point. If the discussion comes back to the pattern emitted here we can move back to Roslyn.
Isn't the concern here that the display class is new'd up at the beginning of the method? That's entirely due to the C# compiler's IL code gen and is not something the JIT would change. I don't see how this is a runtime or JIT issue.
(The asm is different because in one case the JIT can actually see that the method is a nop and makes it such.)
@stephentoub I've been reading it as they want the JIT to special case and optimize this pattern. @smoogipoo is acknowledging the IL difference but in the face of that still wants the JIT ASM to be the same.
Seems like incorrect odd IL to me -- for this source
private int allocatingMethod()
{
return 0;
int local = 2;
return list.Count(_ => field > local);
}
CSC produces this IL:
IL_0000 73 08 00 00 06 newobj 0x6000008
IL_0005 0a stloc.0
IL_0006 06 ldloc.0
IL_0007 02 ldarg.0
IL_0008 7d 04 00 00 04 stfld 0x4000004
IL_000d 16 ldc.i4.0
IL_000e 2a ret
Now, given that IL, the jit should arguably be able to optimize away the newobj, but this is blocked because the jit doesn't propagate non-nullness for local 0 fast enough (it could/should happen during local AP in morph), and so the jit thinks the stfld might cause an exception, and that keeps the newobj live long enough that it can't be optimized away.
I've been reading it as they want the JIT to special case and optimize this pattern
Why couldn't Roslyn similarly avoid emitting the dead code?
@smoogipoo I take it this is distilled from some more complex example? If so, do you have examples where Roslyn hoists the closure construction up above more interesting constructs than return 0
?
Eager construction is likely deemed safe as the only observable effect is extra heap allocation, but as you note this allocation is not expected and seemingly should be deferred until the point of plausible use.
Why couldn't Roslyn similarly avoid emitting the dead code?
Roslyn certainly could. Assuming this wasn't done initially because it can complicate code gen a bit and it's a scenario for which the user has received a warning. Generally that gets prioritized fairly low.
Eager construction is likely deemed safe as the only observable effect is extra heap allocation, but as you note this allocation is not expected and seemingly should be deferred until the point of plausible use.
Agree Roslyn would ideally delay the closure allocation until it was actually used. This has come up before and is tracked by issue 20777. There have been a few discussions over the years but historically there has been concern the extra complexity will outweigh the wins. May be worth taking another look at once we get past C# 9.
In terms of resolving this issue. If we want to look at it as a Roslyn request we can transfer back to roslyn and I can dupe it against 20777 + add parts of the scenario. If want to have it track the potential JIT optimization discussed then we can leave it here. Either way is good by me.
@AndyAyersMS The example I posted was hoisted from https://github.com/ppy/osu-framework/blob/c857318bdf8d3bb06b2d806bc957be571bded0ed/osu.Framework/Graphics/Cursor/TooltipContainer.cs#L195-L253.
The code which allocates (L235-252) runs <1% of the time, with majority of exits being done through L214.
The resultant IL: https://gist.github.com/smoogipoo/0ae07bebe7478c6960da22b16983a67b
@jaredpar since you already have an issue for this we can keep this one to cover the codegen aspect. Marking as future.
@AndyAyersMS, but that won't help with the actual example, no? I don't see how the JIT can eliminate the allocation in the real case, unless it's going to stack alloc or something.
The jit is already optimizing away heap allocations (as in safeMethod
) when object construction doesn't have side effects and the object is unreferenced after construction.
removing stmt with no side effects
Removing statement STMT00000 (IL 0x000...0x005)
N005 ( 17, 16) [000003] -AC-----R--- * ASG ref
N004 ( 1, 1) [000002] D------N---- +--* LCL_VAR ref V03 tmp1
N003 ( 17, 16) [000001] --C--------- \--* CALL help ref HELPER.CORINFO_HELP_NEWSFAST
N002 ( 3, 10) [000000] ------------ arg0 in rcx \--* CNS_INT(h) long 0xd1ffab1e method
in BB01 as useless:
resulting code:
; Assembly listing for method BenchmarkMethodDelegateAllocations:safeMethod():int:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;* V00 this [V00 ] ( 0, 0 ) ref -> zero-ref this class-hnd
;* V01 loc0 [V01 ] ( 0, 0 ) ref -> zero-ref class-hnd exact
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [rsp+0x00] "OutgoingArgSpace"
;* V03 tmp1 [V03 ] ( 0, 0 ) ref -> zero-ref class-hnd exact "NewObj constructor temp"
;
; Lcl frame size = 0
G_M42318_IG01:
;; bbWeight=1 PerfScore 0.00
G_M42318_IG02:
xor eax, eax
;; bbWeight=1 PerfScore 0.25
G_M42318_IG03:
ret
;; bbWeight=1 PerfScore 1.00
so the jit issue is to try and recognize these cases more broadly, removing completely dead allocations, and perhaps using code motion to sink allocations when objects are conditionally live when created.
I don't know how common these dead or partially dead allocation are, so a first step would be to run an analysis screen and see how often we see these.
Outside of exceptions and writes to statics, there can be other observable effects of allocation and construction. For allocation, the runtime tells the jit if the associated new helper has side effects so that the jit won't allocate away dead finalizable allocations, and similar cases. For construction, the jit currently must inline the entire constructor path before it can prove construction is side effect free.
We cannot get rid of the object allocation today because we consider it used due to a write to one of its field.
Presumably object stack allocation would be able to kick in and remove the allocation if we allowed stack allocating objects in loops.
I was reading the recent .NET 9 performance improvements blog post (https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-9/#object-stack-allocation) and had the above comment in the back of my mind.
If I go back to my benchmark and add MethodImplOptions.NoInlining
to the methods, then the results are:
| Method | Job | Runtime | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|----------------- |--------- |--------- |---------:|----------:|----------:|------:|--------:|-------:|----------:|------------:|
| SafeMethod | .NET 8.0 | .NET 8.0 | 1.126 us | 0.0205 us | 0.0182 us | 1.00 | 0.00 | - | - | NA |
| AllocatingMethod | .NET 8.0 | .NET 8.0 | 4.745 us | 0.0077 us | 0.0060 us | 4.22 | 0.07 | 5.0964 | 32000 B | NA |
| | | | | | | | | | | |
| SafeMethod | .NET 9.0 | .NET 9.0 | 1.115 us | 0.0094 us | 0.0084 us | 1.00 | 0.00 | - | - | NA |
| AllocatingMethod | .NET 9.0 | .NET 9.0 | 1.111 us | 0.0036 us | 0.0030 us | 1.00 | 0.01 | - | - | NA |
... which is already awesome! This covers probably 95+% of cases where this comes up for us, because we're "looping" at a very high level (global game loop).
Description
We've recently encountered a method allocating even though there was no allocate-y code running. We've since tracked this down to the following:
IL shows the capturing lambda's
DisplayClass
always seems to be "allocated" as soon as the method is entered, but the JIT has the ability to control when the object is initialised. When the lambda captures both a local andthis
, the JIT is suddenly unable to forego this initialisation even if the method terminates execution early.Here's a gist that shows the issue: https://gist.github.com/smoogipoo/810c033962b8a95289dbdabd9d84801b And a SharpLab repro.
Is this a valid and/or known issue, or something we should be wary of going forward?
Configuration + Data
I've also run this on mono, which seems to always initialise the object (see
Gen 0
, Benchmark.Net doesn't seem to have allocation tracking on mono).category:cq theme:dead-code skill-level:expert cost:medium