dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Tiered JIT: redundant compilations #76402

Open EgorBo opened 1 year ago

EgorBo commented 1 year ago

To record @AndyAyersMS's thoughts I came up with a quick repro:

using System.Runtime.CompilerServices;
using System.Threading;

public class Program
{
    public static void Main()
    {
        for (int i = 0; i < 100; i++)
        {
            // Promote Test to Tier1
            Test();
            Thread.Sleep(16);
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static int Test()
    {
        return Property;
    }

    private static int Property => 42;
}

Run this code with DOTNET_JitDisasmSummary=1 on .NET 7.0 RC1 and it's going to print:

   ...
   4: JIT compiled Program:Main() [Tier0, IL size=27, code size=94]
   5: JIT compiled Program:Test():int [Tier0, IL size=6, code size=23]
   6: JIT compiled Program:get_Property():int [Tier0, IL size=3, code size=11]
   7: JIT compiled Program:Test():int [Tier1, IL size=6, code size=6]
   8: JIT compiled Program:get_Property():int [Tier1, IL size=3, code size=6]

get_Property was compiled twice (Tier0 and Tier1) even though it's super trivial (like any auto-property, for example): only 3 bytes of IL, and we wasted some time on it.

We should consider allowing inlining of very small methods in Tier0, but only if they're small and don't contain control flow. This might even improve the JIT's throughput (TP), because call IR nodes are slow to process.

category:cq theme:tiering skill-level:expert cost:large impact:medium

ghost commented 1 year ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details
Author: EgorBo
Assignees: -
Labels: `area-CodeGen-coreclr`, `untriaged`
Milestone: -

EgorBo commented 1 year ago

AvaloniaILSpy app, R2R=0, TC=1:

More than 3000 methods with IL size <= 8 bytes made it to Tier1.

EgorBo commented 1 year ago

In fact, most of the Tier1 compilations in that app (~11500k methods made it to Tier1) are quite small:

[image: histogram of Tier1 compilations; X axis is IL size]
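For anyone who wants a rough feel for such a distribution in their own app, here is a minimal sketch that builds an IL-size histogram via reflection. Note the assumptions: it counts the static IL size of every method in one assembly, not just the methods that actually got promoted to Tier1, so it only approximates the data above. `IlSizeHistogram` and `Build` are hypothetical names, not part of any runtime tooling.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical helper: approximates an IL-size histogram for one assembly.
public static class IlSizeHistogram
{
    // Maps IL size in bytes to the number of methods with that size.
    public static SortedDictionary<int, int> Build(Assembly assembly)
    {
        const BindingFlags All =
            BindingFlags.Public | BindingFlags.NonPublic |
            BindingFlags.Instance | BindingFlags.Static |
            BindingFlags.DeclaredOnly;

        var histogram = new SortedDictionary<int, int>();

        foreach (Type type in assembly.GetTypes())
        {
            // Property getters/setters show up here as get_X/set_X methods.
            IEnumerable<MethodBase> methods =
                type.GetMethods(All).Cast<MethodBase>().Concat(type.GetConstructors(All));

            foreach (MethodBase method in methods)
            {
                // Abstract, extern, and runtime-implemented methods have no IL body.
                var il = method.GetMethodBody()?.GetILAsByteArray();
                if (il is null)
                    continue;

                histogram[il.Length] = histogram.TryGetValue(il.Length, out int count) ? count + 1 : 1;
            }
        }

        return histogram;
    }
}
```

Calling `IlSizeHistogram.Build(typeof(Program).Assembly)` and printing the entries with small keys should show how many methods fall under a given IL size. `Assembly.GetTypes()` can throw `ReflectionTypeLoadException` when dependencies are missing, which this sketch does not handle.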

EgorBo commented 1 year ago

Potential easy fix: increase the call-counting threshold for methods below 16 bytes of IL (e.g. 30 -> 100) on the VM side.
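The policy suggested here is simple enough to sketch. This is only an illustration of the idea in C#; the actual call counting lives in the VM (native code), and `CallCountPolicy`, `GetThreshold`, and the constant names are hypothetical. The numbers 16, 30, and 100 are the ones from the comment above.

```csharp
// Hypothetical illustration of a size-dependent call-counting threshold.
public static class CallCountPolicy
{
    // Regular threshold before a method is queued for Tier1 (value from the comment).
    public const int DefaultThreshold = 30;

    // Raised threshold for tiny methods (example value from the comment).
    public const int SmallMethodThreshold = 100;

    // Methods with fewer IL bytes than this are treated as "small".
    public const int SmallMethodIlSize = 16;

    // Returns how many calls a method must receive before promotion is considered.
    public static int GetThreshold(int ilSize) =>
        ilSize < SmallMethodIlSize ? SmallMethodThreshold : DefaultThreshold;
}
```

With this sketch, a 3-byte getter like `get_Property` would need 100 calls before promotion, so it is more likely to have been inlined into a Tier1 caller by then and never need its own Tier1 compilation.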

EgorBo commented 1 year ago

Did a quick prototype:

1) Inline only <=8 bytes of IL
2) Give up if an inlinee has any control flow (branches/switches)
3) Experimented with other limitations like max inlining depth, number of locals
4) Introduced a new VM API to get method IL size quickly (cached via hashtable)

The number of compilations dropped by 3000, but startup time still slightly regressed 😢
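The eligibility checks from the prototype description can be sketched roughly like this. It is not the actual JIT change (the JIT is C++), just the decision logic written in C#. The 8-byte limit and the control-flow rule come from the comment; the depth and locals limits are assumed values, since the comment only says those were experimented with. All names are hypothetical.

```csharp
// Hypothetical sketch of a Tier0 "tiny inlinee" eligibility check.
public static class Tier0InlinePolicy
{
    // From the prototype description: inline only callees with <= 8 bytes of IL.
    public const int MaxIlSize = 8;

    // Assumed values: the prototype experimented with such limits,
    // but the concrete numbers are not given in the thread.
    public const int MaxDepth = 1;
    public const int MaxLocals = 0;

    public static bool ShouldInline(int ilSize, bool hasControlFlow, int localCount, int depth)
    {
        if (ilSize > MaxIlSize) return false;      // too big for Tier0 inlining
        if (hasControlFlow) return false;          // branches/switches: give up
        if (localCount > MaxLocals) return false;  // assumed limit on locals
        if (depth > MaxDepth) return false;        // assumed limit on inlining depth
        return true;
    }
}
```

Under these assumptions, `get_Property` from the repro (3 bytes of IL, no branches, no locals) would qualify when called directly from `Test`.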

EgorBo commented 1 year ago

BingSNR:

[image]

The most common method IL size is 5-6 bytes.

EgorBo commented 1 year ago

So we emit thousands of redundant call-counting stubs, precodes, and methods.

19% of all methods made it to Tier1

EgorBo commented 1 year ago

For BingSNR, my fairly simple prototype lowers the overall number of "jitted functions" from 240k to 200k (and the prototype ignores calls inside simple calls, e.g. a chain of properties).

EgorBo commented 1 year ago

Moving to Future, as my attempt to enable limited inlining in Tier0 slightly regressed startup.