Open stephentoub opened 1 year ago
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.
Author: | stephentoub |
---|---|
Assignees: | - |
Labels: | `area-CodeGen-coreclr` |
Milestone: | - |
Tagging subscribers to this area: @dotnet/area-system-reflection See info in area-owners.md if you want to be subscribed.
Author: | stephentoub |
---|---|
Assignees: | - |
Labels: | `area-System.Reflection`, `untriaged` |
Milestone: | - |
One approach to addressing some of this is to implement https://github.com/dotnet/runtime/issues/75357 which would share previously generated IL if the signature is the same and the method is not virtual.
Otherwise, there are other options to consider such as detecting the startup phase in the runtime, and having reflection respect that so it doesn't generate the invoke stubs.
@stephentoub does the logging here happen because we're enabling DOTNET_JitDisasmSummary
or is it happening all the time? In other words - can we try and avoid the logging codepath during startup?
does the logging here happen because we're enabling DOTNET_JitDisasmSummary
It's not related to DOTNET_JitDisasmSummary; the JIT handles that directly, not managed code. EventSource is unrelated to that.
or is it happening all the time?
Invariably on Windows there's something somewhere on the box that's enabled an ETW session, which then causes every .NET app to trip over this, with NativeRuntimeEventSource being created on the startup path.
Moving to 9.0 however I'll investigate a solution now and we can always backport. The regression occurred in v7 (not v8).
Invariably on Windows there's something somewhere on the box that's enabled an ETW session, which then causes every .NET app to trip over this, with NativeRuntimeEventSource being created on the startup path.
Another approach is to determine if the ETW logging during startup is necessary.
On my machine, this is costing ~10% of time-to-Main
@stephentoub how was this measured? I tried launching corerun against simple console app and didn't see any improvments when only using interpreted invoke. Typical perf result like this: | Method | Mean | Error | StdDev | Median | Min | Max | Allocated |
---|---|---|---|---|---|---|---|---|
AFTER | 5.290 ms | 0.1116 ms | 0.1240 ms | 5.271 ms | 5.100 ms | 5.571 ms | 565 B | |
BEFORE | 5.306 ms | 0.1625 ms | 0.1806 ms | 5.309 ms | 5.031 ms | 5.628 ms | 565 B |
With Jit results showing fewer items: BEFORE
1: JIT compiled System.Guid:FormatGuidVector128Utf8(System.Guid,ubyte) [Tier0, IL size=322, code size=584]
2: JIT compiled System.HexConverter:AsciiToHexVector128(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]) [Tier0, IL size=78, code size=359]
3: JIT compiled System.Runtime.Intrinsics.Vector128:ShuffleUnsafe(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]) [Tier0, IL size=41, code size=50]
4: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Level(System.Object,System.Object,ulong) [FullOpts, IL size=25, code size=27]
5: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Message(System.Object,System.Object,ulong) [FullOpts, IL size=25, code size=28]
6: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Task(System.Object,System.Object,ulong) [FullOpts, IL size=25, code size=27]
7: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Opcode(System.Object,System.Object,ulong) [FullOpts, IL size=25, code size=27]
8: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Version(System.Object,System.Object,ulong) [FullOpts, IL size=25, code size=28]
9: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Keywords(System.Object,System.Object,ulong) [FullOpts, IL size=25, code size=28]
10: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Level(System.Object,System.Span`1[System.Object]) [FullOpts, IL size=36, code size=40]
11: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Message(System.Object,System.Span`1[System.Object]) [FullOpts, IL size=26, code size=37]
12: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Task(System.Object,System.Span`1[System.Object]) [FullOpts, IL size=36, code size=40]
13: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Opcode(System.Object,System.Span`1[System.Object]) [FullOpts, IL size=36, code size=40]
14: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Version(System.Object,System.Span`1[System.Object]) [FullOpts, IL size=36, code size=41]
15: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Keywords(System.Object,System.Span`1[System.Object]) [FullOpts, IL size=36, code size=41]
16: JIT compiled StartupApp.Program:Main(System.String[]) [Tier0, IL size=3, code size=15]
AFTER
1: JIT compiled System.Guid:FormatGuidVector128Utf8(System.Guid,ubyte) [Tier0, IL size=322, code size=584]
2: JIT compiled System.HexConverter:AsciiToHexVector128(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]) [Tier0, IL size=78, code size=359]
3: JIT compiled System.Runtime.Intrinsics.Vector128:ShuffleUnsafe(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]) [Tier0, IL size=41, code size=50]
4: JIT compiled StartupApp.Program:Main(System.String[]) [Tier0, IL size=3, code size=15]
Linking related issue https://github.com/dotnet/runtime/issues/85791
Typical perf result like this:
What are you measuring that's only taking 5ms? That's not coreclr startup.
@stephentoub how was this measured?
IIRC, I built once with main and once with the codegen code path commented out, and then used Measure-Command to time many empty console app invocations.
Possible design that is extensible:
For an empty Main method, and enabling DOTNET_JitDisasmSummary, we see all of these:
EventSource is looking for
[Event]
on a type usingGetCustomAttributes
, which uses MethodInfo.Invoke to populate the attribute, and since there's more than one attribute, the setters all get invoked twice, triggering the MethodInfo.Invoke to use reflection emit and dynamically generate a method to handle each setter.On my machine, this is costing ~10% of time-to-Main. When I disable the dynamic method approach, startup improves by ~5ms.