DrewScoggins / performance-2

This repo contains benchmarks used for testing the performance of all .NET Runtimes
MIT License
0 stars 0 forks source link

[Perf -14%] Microsoft.Extensions.Logging.LoggingOverhead (3) #3249

Open performanceautofiler[bot] opened 3 years ago

performanceautofiler[bot] commented 3 years ago

Run Information

Architecture x64
OS Windows 10.0.18362
Baseline b6f791d1984f998384267b8b6683bd02f64747f2
Compare 7a2e0ececef83f6b5626a6410bd266fbbf1cb071

Regressions in Microsoft.Extensions.Logging.LoggingOverhead

Benchmark Baseline Test Test/Base Baseline IR Compare IR IR Ratio Baseline ETL Compare ETL
[NoArguments](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/Microsoft.Extensions.Logging.LoggingOverhead.NoArguments.html>) 31.18 ns 36.00 ns 1.15 321.089102434843 373.8511616032678 1.1643221734039744 Trace Trace
[NoArguments_DefineMessage](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/Microsoft.Extensions.Logging.LoggingOverhead.NoArguments_DefineMessage.html>) 33.48 ns 39.50 ns 1.18 363.90220302121344 410.7404024333668 1.1287109531717316 Trace Trace
[TwoArguments_DefineMessage](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/Microsoft.Extensions.Logging.LoggingOverhead.TwoArguments_DefineMessage.html>) 47.76 ns 51.93 ns 1.09 510.5725446164502 563.1364961266828 1.1029509950436513 Trace Trace

Related Issue on x64 ubuntu

[Perf 21%] Microsoft.Extensions.Logging.LoggingOverhead (2)

graph graph graph Historical Data in Reporting System

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'Microsoft.Extensions.Logging.LoggingOverhead*'
### Histogram #### Microsoft.Extensions.Logging.LoggingOverhead.NoArguments ```log [30.664 ; 31.779) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [31.779 ; 33.024) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [33.024 ; 34.256) | @@@@@@@@@@@@@@@@@@@@@@@@@@ [34.256 ; 35.409) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@ [35.409 ; 36.583) | @@@@@@@@@@@@@@@@ [36.583 ; 38.067) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [38.067 ; 38.868) | @ [38.868 ; 40.350) | @ [40.350 ; 41.443) | @ ``` #### Microsoft.Extensions.Logging.LoggingOverhead.NoArguments_DefineMessage ```log [32.379 ; 33.671) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [33.671 ; 34.636) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [34.636 ; 35.572) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [35.572 ; 37.027) | @@@@@@@@@@@@@@@@@@@@@@@@@ [37.027 ; 37.836) | @@@@@@@@@@@@@@ [37.836 ; 38.659) | @@@@@ [38.659 ; 39.623) | @@@@ [39.623 ; 40.446) | @@@@@@@@@@@@@@@@@@@@ [40.446 ; 42.220) | @@@@@@@@@@@@@@@@ ``` #### Microsoft.Extensions.Logging.LoggingOverhead.TwoArguments_DefineMessage ```log [45.767 ; 47.102) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [47.102 ; 48.982) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [48.982 ; 50.707) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [50.707 ; 51.838) | @@@@@@@@@@@@@@@@@@@@@@@@@ [51.838 ; 53.213) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@ [53.213 ; 54.924) | @@@@@@@@@@@@@@@@@@@@@@@@@@@ [54.924 ; 57.230) | @@@@@@@@@@@@ [57.230 ; 60.030) | @ ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
DrewScoggins commented 3 years ago

@AndyAyersMS

AndyAyersMS commented 3 years ago

Benchmark-interval only analysis for NoArguments (from ETL) suggests this is something in the core runtime itself.

Baseline

Samples for corerun: 7193 events for Benchmark Intervals
Jitting           : 00.24% 9E+06    samples 1573 methods
  JitInterface    : 00.08% 3E+06    samples
Jit-generated code: 59.63% 2.27E+09 samples
  Jitted code     : 59.63% 2.27E+09 samples
  MinOpts code    : 00.00% 0        samples
  FullOpts code   : 17.42% 6.62E+08 samples
  Tier-0 code     : 00.00% 0        samples
  Tier-1 code     : 42.21% 1.6E+09  samples
  R2R code        : 00.00% 0        samples

02.92%   1.11E+08    ?        Unknown
37.13%   1.411E+09   native   coreclr.dll
16.55%   6.29E+08    FullOpt  [Microsoft.Extensions.Logging]Logger.Log(value class Microsoft.Extensions.Logging.LogLevel,value class Microsoft.Extensions.Logging.EventId,!!0,class System.Exception,class System.Func`3<!!0,class System.Exception,class System.String>)
12.00%   4.56E+08    Tier-1   [Microsoft.Extensions.Logging.Abstractions]LoggerExtensions.Log(class Microsoft.Extensions.Logging.ILogger,value class Microsoft.Extensions.Logging.LogLevel,value class Microsoft.Extensions.Logging.EventId,class System.Exception,class System.String,class System.Object[])
11.08%   4.21E+08    Tier-1   [Microsoft.Extensions.Logging]Logger.<Log>g__LoggerLog|12_0(value class Microsoft.Extensions.Logging.LogLevel,value class Microsoft.Extensions.Logging.EventId,class Microsoft.Extensions.Logging.ILogger,class System.Exception,class System.Func`3<!!0,class System.Exception,class System.String>,class System.Collections.Generic.List`1<class System.Exception>&,!!0&)
07.53%   2.86E+08    Tier-1   [Microsoft.Extensions.Logging.Abstractions]FormattedLogValues..ctor(class System.String,class System.Object[])
06.05%   2.3E+08     Tier-1   [MicroBenchmarks]LoggingOverhead.NoArguments()
04.68%   1.78E+08    Tier-1   [Microsoft.Extensions.Logging]MessageLogger.IsEnabled(value class Microsoft.Extensions.Logging.LogLevel)
00.87%   3.3E+07     FullOpt  [c5630227-85b0-454f-88ac-8126d9409a39]Runnable_0.WorkloadActionUnroll(int64)
00.84%   3.2E+07     Tier-1   [MicroBenchmarks]LoggingBenchmarkBase+NoopLogger.Log(value class Microsoft.Extensions.Logging.LogLevel,value class Microsoft.Extensions.Logging.EventId,!!0,class System.Exception,class System.Func`3<!!0,class System.Exception,class System.String>)
00.16%   6E+06       native   clrjit.dll
00.08%   3E+06       native   ntoskrnl.exe
00.05%   2E+06       native   intelppm.sys

Change

Samples for corerun: 7797 events for Benchmark Intervals
Jitting           : 00.26% 1E+07    samples 1573 methods
  JitInterface    : 00.03% 1E+06    samples
Jit-generated code: 54.03% 2.08E+09 samples
  Jitted code     : 54.03% 2.08E+09 samples
  MinOpts code    : 00.00% 0        samples
  FullOpts code   : 16.07% 6.18E+08 samples
  Tier-0 code     : 00.00% 0        samples
  Tier-1 code     : 37.96% 1.46E+09 samples
  R2R code        : 00.00% 0        samples

01.38%   5.3E+07     ?        Unknown
44.25%   1.702E+09   native   coreclr.dll
15.21%   5.85E+08    FullOpt  [Microsoft.Extensions.Logging]Logger.Log(value class Microsoft.Extensions.Logging.LogLevel,value class Microsoft.Extensions.Logging.EventId,!!0,class System.Exception,class System.Func`3<!!0,class System.Exception,class System.String>)
10.95%   4.21E+08    Tier-1   [Microsoft.Extensions.Logging]Logger.<Log>g__LoggerLog|12_0(value class Microsoft.Extensions.Logging.LogLevel,value class Microsoft.Extensions.Logging.EventId,class Microsoft.Extensions.Logging.ILogger,class System.Exception,class System.Func`3<!!0,class System.Exception,class System.String>,class System.Collections.Generic.List`1<class System.Exception>&,!!0&)
10.69%   4.11E+08    Tier-1   [Microsoft.Extensions.Logging.Abstractions]LoggerExtensions.Log(class Microsoft.Extensions.Logging.ILogger,value class Microsoft.Extensions.Logging.LogLevel,value class Microsoft.Extensions.Logging.EventId,class System.Exception,class System.String,class System.Object[])
06.73%   2.59E+08    Tier-1   [Microsoft.Extensions.Logging.Abstractions]FormattedLogValues..ctor(class System.String,class System.Object[])
04.50%   1.73E+08    Tier-1   [Microsoft.Extensions.Logging]MessageLogger.IsEnabled(value class Microsoft.Extensions.Logging.LogLevel)
04.39%   1.69E+08    Tier-1   [MicroBenchmarks]LoggingOverhead.NoArguments()
00.86%   3.3E+07     FullOpt  [86029241-0098-4ac1-a544-cffed1af7a77]Runnable_0.WorkloadActionUnroll(int64)
00.70%   2.7E+07     Tier-1   [MicroBenchmarks]LoggingBenchmarkBase+NoopLogger.Log(value class Microsoft.Extensions.Logging.LogLevel,value class Microsoft.Extensions.Logging.EventId,!!0,class System.Exception,class System.Func`3<!!0,class System.Exception,class System.String>)
00.23%   9E+06       native   clrjit.dll
00.08%   3E+06       native   ntoskrnl.exe

Benchmark: found 15 intervals; mean interval 255.670ms
AndyAyersMS commented 3 years ago

Perfview can't find the symbols for coreclr for those two traces so we can't drill in further.

AndyAyersMS commented 3 years ago

With symbols, the diff seems to be all in JIT_VirtualFunctionPointer. Source for this doesn't appear to have changed during this time span.

image

AndyAyersMS commented 3 years ago

It could be that this helper is just called more often; the callers are logging methods. Source for those callers hasn't changed either.

Suggests we need to look for recent jit codegen diffs in the key managed methods.

AndyAyersMS commented 3 years ago

I don't see anything in the commit history that jumps out as a likely culprit.

Am building before/after runtimes so I can drill in further.

AndyAyersMS commented 3 years ago

I don't see any jit codegen changes, but still need to make sure I'm looking at the right version of the logging dll.

AndyAyersMS commented 3 years ago

I can repro this locally. Still investigating.

Method Job Toolchain Mean
NoArguments Job-AILDUX \runtime1\artifacts\tests\coreclr\Windows_NT.x64.Release\Tests\Core_Root\corerun.exe 25.08 ns
NoArguments Job-MJIGDO \runtime2\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root\corerun.exe 28.48 ns
AndyAyersMS commented 3 years ago

Seems like perhaps we're just taking longer to look things up in the generic handle cache? Need to check how much of the lookup code gets inlined into JIT_VirtualFunctionPointer.

@VSadov can you think of any recent changes (Oct 29-Nov 4) that might have caused something like this? I don't see anything in jit or runtime.

VSadov commented 3 years ago

Nothing really stands out from the diffs.

There was some change for the JIT sorting, switching from qsort to a stable sort. But that should either affect nothing or affect a lot of scenarios. I can't see how it would impact just this.

JIT_VirtualFunctionPointer is just a hash lookup and I do not see anything changed there in an obvious way. And regression there would probably affect more cases.

AndyAyersMS commented 3 years ago

New hypothesis: perhaps the windows_nt -> windows change broke how PGO is applied to the runtime, and these tests are particularly sensitive to native PGO?

Trying to verify now.

AndyAyersMS commented 3 years ago

Seems plausible it was ? From the build logs I have on hand:

;; hash 0199252802537aa4d8b393c3923b4d4cfe371872 (Oct 29)
3561 of 4076 (87.4%) original invalid call sites were matched.
4655 new call sites were added.
2324 of 33506 (  6.94%) profiled functions will be compiled for speed, and the rest of the functions will be compiled for size
150853 of 336462 inline instances were from dead/cold paths 
33226 of 33887 functions (98.0%) were optimized using profile data, and the rest of the functions were optimized without using profile data
49189162072 of 51438814196 instructions (95.6%) were optimized using profile data, and the rest of the instructions were optimized without using profile data

;; hash  03701ba6d2656c1fb52848744980c7b799aabccf (Nov 6)
3593 of 4111 (87.4%) original invalid call sites were matched.
4678 new call sites were added.
194 of 33461 (  0.58%) profiled functions will be compiled for speed, and the rest of the functions will be compiled for size
140438 of 315786 inline instances were from dead/cold paths 
33173 of 33887 functions (97.9%) were optimized using profile data, and the rest of the functions were optimized without using profile data
49184524871 of 51438814196 instructions (95.6%) were optimized using profile data, and the rest of the instructions were optimized without using profile data

Note how the number of methods compiled for speed dropped dramatically.

Am going to try building at dotnet/runtime@e691753245eb6f2d459c7f02aaaa511077c1e833 (before) and dotnet/runtime@b37f10a504f20faef2558b6d79180411ad0603c3 (after) to verify.

AndyAyersMS commented 3 years ago

Both those hashes show the PGO data is not applied effectively, so the impactful change was sometime before. Am going to keep looking.

Next to scrutinize:

AndyAyersMS commented 3 years ago

Looks like it was dotnet/runtime@8c6a049cdb4bcd958cc3487775174bdd32e5fb64 (dotnet/runtime#41897).

Suspect some of the build flags got changed and this disqualified some methods from leveraging existing PGO data; am going to try and verify by extracting the flags for jithelpers.obj.

AndyAyersMS commented 3 years ago

Only real flag diffs I see are for PCH; that might be enough as the code we inline into JIT_VirtualFunctionPointer likely comes from PCH...

So seems likely that if we regenerate native PGO data we'll get this perf back.

@brianrob do you know when we next expect to see a PGO update for the runtime repo? @jkoritzinsky FYI

brianrob commented 3 years ago

Ownership of PGO in the runtime repo is currently under discussion, so I don't currently have a timeline. Once this gets resolved, I'd expect updates to start coming.