dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.95k stars 4.65k forks source link

PGO: redundant Tier0-instr promotions #84517

Open EgorBo opened 1 year ago

EgorBo commented 1 year ago
static void Main()
{
    for (int i = 0; i < 100; i++)
    {
        Test();
        var _ = Test2;
        Thread.Sleep(16);
    }
}

[MethodImpl(MethodImplOptions.NoInlining)]
static int Test1()
{
    int sum = 0;
    for (int i = 0; i < 10; i++)
        sum += i;
    return sum;
}

static int Test2 { get; set; }

Actual result:

 147: JIT compiled Prog:Test1():int [Instrumented Tier0, IL size=21, code size=109, hash=0x792ae54b]
 311: JIT compiled Prog:Test1():int [Instrumented Tier0, IL size=21, code size=109, hash=0x792ae54b]
 313: JIT compiled Prog:Test1():int [Tier1 with Dynamic PGO, IL size=21, code size=14, hash=0x792ae54b]

 152: JIT compiled Prog:get_Test2():int [Tier0, IL size=6, code size=12, hash=0xf8310770]
 316: JIT compiled Prog:get_Test2():int [Instrumented Tier0, IL size=6, code size=12, hash=0xf8310770]
 319: JIT compiled Prog:get_Test2():int [Tier1, IL size=6, code size=7, hash=0xf8310770]

Since Test1 has a loop we always instrument it in Tier0, but then it's promoted to Instrumented Tier0 again - it can be omitted.

Also, all Tier0 methods should warn VM in advance that they don't need instrumentation at all (e.g. simple auto properties like get_Test2).

cc @AndyAyersMS

ghost commented 1 year ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak See info in area-owners.md if you want to be subscribed.

Issue Details
```csharp static void Main() { for (int i = 0; i < 100; i++) { Test(); var _ = Test2; Thread.Sleep(16); } } [MethodImpl(MethodImplOptions.NoInlining)] static int Test1() { int sum = 0; for (int i = 0; i < 10; i++) sum += i; return sum; } static int Test2 { get; set; } ``` Actual result: ``` 147: JIT compiled Prog:Test1():int [Instrumented Tier0, IL size=21, code size=109, hash=0x792ae54b] 311: JIT compiled Prog:Test1():int [Instrumented Tier0, IL size=21, code size=109, hash=0x792ae54b] 313: JIT compiled Prog:Test1():int [Tier1 with Dynamic PGO, IL size=21, code size=14, hash=0x792ae54b] 152: JIT compiled Prog:get_Test2():int [Tier0, IL size=6, code size=12, hash=0xf8310770] 316: JIT compiled Prog:get_Test2():int [Instrumented Tier0, IL size=6, code size=12, hash=0xf8310770] 319: JIT compiled Prog:get_Test2():int [Tier1, IL size=6, code size=7, hash=0xf8310770] ``` Since `Test1` has a loop we always instrument it in Tier0, but then it's promoted to Instrumented Tier0 again - it can be omitted. Also, all Tier0 methods should warn VM in advance that they don't need instrumentation at all (e.g. simple auto properties like `get_Test2`). cc @AndyAyersMS
Author: EgorBo
Assignees: -
Labels: `area-CodeGen-coreclr`, `untriaged`
Milestone: -
AndyAyersMS commented 1 year ago

Even more interesting when OSR gets involved:

 832: JIT compiled Benchstone.BenchI.MulMatrix:Inner(int[][],int[][],int[][]) [Instrumented Tier0, IL size=435, code size=2943, hash=0x482bb336]
 833: JIT compiled Benchstone.BenchI.MulMatrix:Inner(int[][],int[][],int[][]) [Tier1-OSR @0x60 with Dynamic PGO, IL size=435, code size=1323, hash=0x482bb336]
 834: JIT compiled Benchstone.BenchI.MulMatrix:Inner(int[][],int[][],int[][]) [Tier1-OSR @0x2a with Dynamic PGO, IL size=435, code size=1529, hash=0x482bb336]
 835: JIT compiled Benchstone.BenchI.MulMatrix:Inner(int[][],int[][],int[][]) [Instrumented Tier0, IL size=435, code size=2943, hash=0x482bb336]
 836: JIT compiled Benchstone.BenchI.MulMatrix:Inner(int[][],int[][],int[][]) [Tier1-OSR @0x60 with Dynamic PGO, IL size=435, code size=1323, hash=0x482bb336]
 837: JIT compiled Benchstone.BenchI.MulMatrix:Inner(int[][],int[][],int[][]) [Tier1-OSR @0x2a with Dynamic PGO, IL size=435, code size=1547, hash=0x482bb336]
 838: JIT compiled Benchstone.BenchI.MulMatrix:Inner(int[][],int[][],int[][]) [Tier1 with Dynamic PGO, IL size=435, code size=1194, hash=0x482bb336]
EgorBo commented 1 year ago

Even more interesting when OSR gets involved:

 832: JIT compiled Benchstone.BenchI.MulMatrix:Inner(int[][],int[][],int[][]) [Instrumented Tier0, IL size=435, code size=2943, hash=0x482bb336]
 833: JIT compiled Benchstone.BenchI.MulMatrix:Inner(int[][],int[][],int[][]) [Tier1-OSR @0x60 with Dynamic PGO, IL size=435, code size=1323, hash=0x482bb336]
 834: JIT compiled Benchstone.BenchI.MulMatrix:Inner(int[][],int[][],int[][]) [Tier1-OSR @0x2a with Dynamic PGO, IL size=435, code size=1529, hash=0x482bb336]
 835: JIT compiled Benchstone.BenchI.MulMatrix:Inner(int[][],int[][],int[][]) [Instrumented Tier0, IL size=435, code size=2943, hash=0x482bb336]
 836: JIT compiled Benchstone.BenchI.MulMatrix:Inner(int[][],int[][],int[][]) [Tier1-OSR @0x60 with Dynamic PGO, IL size=435, code size=1323, hash=0x482bb336]
 837: JIT compiled Benchstone.BenchI.MulMatrix:Inner(int[][],int[][],int[][]) [Tier1-OSR @0x2a with Dynamic PGO, IL size=435, code size=1547, hash=0x482bb336]
 838: JIT compiled Benchstone.BenchI.MulMatrix:Inner(int[][],int[][],int[][]) [Tier1 with Dynamic PGO, IL size=435, code size=1194, hash=0x482bb336]

Heh indeed funny one but presumably the same issue, I'll fix it this week

AndyAyersMS commented 1 year ago

We can also see this when we have a nontrivial intrinsic R2R method; we rejit at tier1-instr but it doesn't actually instrument; then we rejit at tier1 and don't have any dynamic PGO data, so we are jitting twice and probably producing the same code each time.

 841: JIT compiled System.Memory.Span`1[ubyte]:SequenceEqual():bool:this [Tier0, IL size=28, code size=151, hash=0xbffeed85]
**877: JIT compiled System.SpanHelpers:SequenceEqual(byref,byref,ulong):bool [Instrumented Tier1 with Static PGO, IL size=612, code size=302, hash=0xacbc5054]**
 878: JIT compiled System.Memory.Span`1[ubyte]:SequenceEqual():bool:this [Instrumented Tier0, IL size=28, code size=151, hash=0xbffeed85]
 881: JIT compiled System.MemoryExtensions:SequenceEqual[ubyte](System.Span`1[ubyte],System.ReadOnlySpan`1[ubyte]):bool [Instrumented Tier1, IL size=96, code size=42, hash=0x608a246a]
**882: JIT compiled System.SpanHelpers:SequenceEqual(byref,byref,ulong):bool [Tier1 with Static PGO, IL size=612, code size=302, hash=0xacbc5054]**
 883: JIT compiled System.Memory.Span`1[ubyte]:SequenceEqual():bool:this [Tier1, IL size=28, code size=81, hash=0xbffeed85]
 886: JIT compiled System.MemoryExtensions:SequenceEqual[ubyte](System.Span`1[ubyte],System.ReadOnlySpan`1[ubyte]):bool [Tier1, IL size=96, code size=42, hash=0x608a246a]
EgorBo commented 1 year ago

Moving to 9.0 as the VM change was not trivial. We think that this doesn't really affect perf or startup but still nice to fix

EgorBo commented 1 month ago

Moving to post-P7 as it's not a correctness problem