AndyAyersMS opened 2 years ago
When in `--corerun` mode with two coreruns, use the same exact execution strategy for each. Given multiple `corerun`s, run them and list them in the results table in the order specified on the command line. Right now they seem to be listed in the table in alphabetical order, e.g.

```
dotnet run -c Release -f net6.0 -- --filter System.Numerics.Tests.Perf_BitOperations.PopCount_ulong --corerun D:\bugs\osr-perf\main-rel\corerun.exe D:\bugs\osr-perf\osr-rel\corerun.exe D:\bugs\osr-perf\hack-rel\corerun.exe
```

gives the following table:
Method | Job | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|---|---|---|
PopCount_ulong | Job-MTRLJC | \hack-rel\corerun.exe | 464.7 ns | 6.25 ns | 5.84 ns | 465.7 ns | 450.6 ns | 471.7 ns | 1.39 | 0.02 | - | NA |
PopCount_ulong | Job-WAUWEH | \main-rel\corerun.exe | 333.5 ns | 4.70 ns | 4.16 ns | 334.8 ns | 324.5 ns | 339.7 ns | 1.00 | 0.00 | - | NA |
PopCount_ulong | Job-LMIYYB | \osr-rel\corerun.exe | 345.2 ns | 8.60 ns | 9.21 ns | 347.0 ns | 324.6 ns | 360.4 ns | 1.04 | 0.03 | - | NA |
Merged a PR addressing the first point:
https://github.com/dotnet/performance/pull/2314
To use the filter, you can use the format

```
dotnet run -c Release -f net7.0 --filter *Perf_Basic* --parameter-filter SkipValidation:True DataSize:10
```

when doing a command line run from the usual directory.
Nice! Looking forward to using it!
> Ability to specify sets of benchmarks in filters

`--filter` accepts multiple strings. Example: `--filter Bench1.A Perf2.B`
> Friendly names for `--corerun`s in reports

This should be easy to implement once we have an idea how to expose it via command line args. My current idea is:

`--corerun path1 path2 --corerun-names name1 name2`

but it's far from ideal.
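Purely as an illustration of the positional pairing such a flag would imply (BenchmarkDotNet has no `--corerun-names` today; the paths, names, and `pair_names` helper below are made up), a minimal sketch:

```shell
# Hypothetical sketch only: BenchmarkDotNet has no --corerun-names flag.
# This just shows the positional pairing the proposal above implies:
# the Nth name labels the Nth corerun path in the report.
pair_names() {
  paths=$1            # space-separated corerun paths
  set -- $2           # positional parameters become the friendly names
  for p in $paths; do
    echo "$1 -> $p"   # the report would show the friendly name, not the path
    shift
  done
}

pair_names "main-rel/corerun.exe osr-rel/corerun.exe hack-rel/corerun.exe" \
           "main osr hack"
```

Ambiguities remain (e.g. what happens when the two lists have different lengths), which is part of why pairing two separate flags feels far from ideal.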
> Multiple groups of `--envVar` that are treated as different run configs

We could achieve that by introducing some new "separators" to `--envVars`. Currently we have:

```
--envVars ENV_VAR_KEY_1:value_1 ENV_VAR_KEY_2:value_2
```

We could do something like:

```
--envVars ENV_VAR_KEY_1:value_1 $magicSeparator ENV_VAR_KEY_2:value_2
```
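For illustration, a sketch of how such a separator could be parsed, using `--` as a stand-in for the hypothetical `$magicSeparator` (this is not a real BDN feature; `split_env_groups` is a made-up helper):

```shell
# Hypothetical: split the flat --envVars argument list into one group per
# run config, using "--" as a stand-in for the magic separator above.
split_env_groups() {
  group=""
  for arg in "$@"; do
    if [ "$arg" = "--" ]; then
      echo "config:${group}"     # finished one group -> one run config
      group=""
    else
      group="$group $arg"
    fi
  done
  [ -n "$group" ] && echo "config:${group}"
}

split_env_groups ENV_VAR_KEY_1:value_1 -- ENV_VAR_KEY_2:value_2
```

Each emitted group would then become its own job in the run, mirroring how multiple `--corerun` paths produce multiple jobs today.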
> Friendly names for the `--envVar` groups in reports

It's the same as with the `--corerun` friendly names: how should this be exposed at the command line arg level?
> Mix `--corerun` and `--netX.Y` on one command line

Currently `--runtime x` combined with `--corerun y z` means build as `x` and run using `y` and `z`. I remember that we used it a while ago as a workaround for some dotnet/runtime limitation (iirc dotnet/runtime had an old SDK and it could not build new benchmarks from dotnet/performance that were using new APIs). We could change the meaning of it to: build as the current (`-f`) moniker, run as `x`, `y`, and `z`. iirc @stephentoub asked me for that in the past.
> VTune diagnoser or similar that uses VTune API to mark actual measurement intervals

This sounds very interesting. Do you have any links to the VTune API docs?
I am going to transfer this issue to the BDN repo as all the feature requests are BDN feature requests.

> iirc @stephentoub asked me for that in the past.
Would like to see this get done: https://github.com/dotnet/BenchmarkDotNet/issues/1634 :( So useful to have custom names for parameters of complex types.
I also find myself wishing there was simpler/smoother integration with linux `perf`. Exporting to perfview is ok for CPU samples but for HW counters it's not really viable.

Something along these lines:

- a `perf` diagnoser that runs the benchmark subprocess under `perf record` or `perf stat`, allowing me to specify the events of interest (often PMU events)
- `perf inject -j` to add in the mappings for jitted code ranges
- `-p EP` with the above

Right now I am running `perf record` over the entire BDN invocation and either boosting the iteration/invocation counts so that the actual intervals clearly dominate everything else, or slicing and looking at only the last 10% (say) of the recorded data.
> VTune diagnoser or similar that uses VTune API to mark actual measurement intervals
>
> this sounds very interesting. Do you have any links to VTune API docs?
Related: I'd like to see `--join` fixed so multiple filter expression results can all show up in a single results table: https://github.com/dotnet/performance/issues/1855
It would also be nice to have an integrated diagnoser for ETW that is benchmark interval aware. I have a crude start at this in https://github.com/AndyAyersMS/instructionsretiredexplorer; it can post-process the ETW (actual interval aware) and project onto managed method names & tiering variants, e.g.
```
Mining ETL from D:\bugs\r72730\BenchmarkDotNet.Artifacts\LargeRegexTest.Generated-20220725-132937.etl for process corerun
PMC interval now 10000
Found process [9716] corerun: "D:\bugs\r72730\48b85438-13c4-4c73-94b5-b109ce10b9d2\corerun.exe" 150360d1-1148-4e51-8848-28e3c3c32196.dll --benchmarkName LargeRegexTest.Generated --job Toolchain=CoreRun --benchmarkId 0
==> benchmark process is [9716]
Samples for corerun: 16277 events for Benchmark Intervals
Jitting            : 01.44% 1.25E+06 samples 1554 methods
JitInterface       : 00.18% 1.6E+05 samples
Jit-generated code : 96.84% 8.4E+07 samples
Jitted code        : 96.84% 8.4E+07 samples
MinOpts code       : 00.00% 0 samples
FullOpts code      : 00.00% 0 samples
Tier-0 code        : 87.98% 7.63E+07 samples
Tier-1 code        : 08.86% 7.69E+06 samples
R2R code           : 00.00% 0 samples
00.47% 4.1E+05    ?       Unknown
87.98% 7.632E+07  Tier-0  [r72730]<RegexGenerator_g>F7__GetAsmInstructionsRegex_0+RunnerFactory+Runner.TryMatchAtCurrentPosition(value class System.ReadOnlySpan`1<wchar>)
01.38% 1.2E+06    Tier-1  [r72730]LargeRegexTest.Generated()
01.26% 1.09E+06   native  clrjit.dll
01.23% 1.07E+06   native  coreclr.dll
00.91% 7.9E+05    Tier-1  [System.Private.CoreLib]System.ReadOnlySpan`1[System.Char].get_Item(int32)
00.89% 7.7E+05    Tier-1  [System.Text.RegularExpressions]Match.AddMatch(int32,int32,int32)
00.88% 7.6E+05    Tier-1  [System.Private.CoreLib]System.ReadOnlySpan`1[System.Char].Slice(int32)
00.85% 7.4E+05    Tier-1  [System.Private.CoreLib]System.ReadOnlySpan`1[System.Char].get_Length()
00.76% 6.6E+05    Tier-1  [r72730]<RegexGenerator_g>F7__GetAsmInstructionsRegex_0+RunnerFactory+Runner.Scan(value class System.ReadOnlySpan`1<wchar>)
00.69% 6E+05      Tier-1  [System.Text.RegularExpressions]Regex.RunSingleMatch(value class System.Text.RegularExpressions.RegexRunnerMode,int32,class System.String,int32,int32,int32)
00.44% 3.8E+05    Tier-1  [System.Text.RegularExpressions]RegexRunner.Capture(int32,int32,int32)
00.38% 3.3E+05    Tier-1  [System.Text.RegularExpressions]RegexRunner.InitializeForScan(class System.Text.RegularExpressions.Regex,value class System.ReadOnlySpan`1<wchar>,int32,value class System.Text.RegularExpressions.RegexRunnerMode)
00.31% 2.7E+05    Tier-1  [r72730]<RegexGenerator_g>F7__GetAsmInstructionsRegex_0+RunnerFactory+Runner.TryFindNextPossibleStartingPosition(value class System.ReadOnlySpan`1<wchar>)
00.30% 2.6E+05    Tier-1  [System.Text.RegularExpressions]Regex.IsMatch(class System.String)
00.29% 2.5E+05    Tier-1  [System.Private.CoreLib]String.op_Implicit(class System.String)
00.25% 2.2E+05    Tier-1  [System.Text.RegularExpressions]Match.Reset(class System.Text.RegularExpressions.Regex,class System.String,int32,int32,int32)
00.20% 1.7E+05    Tier-1  [System.Private.CoreLib]SpanHelpers.SequenceEqual(unsigned int8&,unsigned int8&,unsigned int)
00.18% 1.6E+05    Tier-1  [System.Private.CoreLib]MemoryExtensions.StartsWith(value class System.ReadOnlySpan`1<!!0>,value class System.ReadOnlySpan`1<!!0>)
00.12% 1E+05      native  ntoskrnl.exe
00.10% 9E+04      Tier-1  [System.Text.RegularExpressions]RegexRunner.InitializeTimeout(value class System.TimeSpan)
00.08% 7E+04      native  ntdll.dll
Benchmark: found 15 intervals; mean interval 570.348ms
```
> I also find myself wishing there was simpler/smoother integration with linux perf

@AndyAyersMS a few days ago I merged https://github.com/dotnet/BenchmarkDotNet/pull/2117, which adds a perf diagnoser that uses `perfcollect` internally. `perfcollect` supports collecting hardware counters: we could take advantage of that and build something on top of it. I won't have the time to do that myself in the near future, but I would be happy to chat and perhaps create an up-for-grabs issue with a very detailed description of what we need and how it could be implemented.
> It would also be nice to have an integrated diagnoser for ETW that is benchmark interval aware.

For that we could definitely extend `ETWProfiler` to always export such a file when hardware counters are enabled. We are already parsing the trace file, so in theory it should be a matter of implementing an exporter.
> Filter on benchmark parameters

This is now built in: #2132
> @AndyAyersMS few days ago I've merged #2117 which adds a perf diagnoser that uses `perfcollect` internally
Somehow I missed seeing this -- will have to try it out soon! Thanks!
@AndyAyersMS in case you are interested in more details: https://adamsitnik.com/PerfCollectProfiler/
- Filter on benchmark parameters (e.g. run just `System.Text.Json.Tests.Perf_Basic.WriteBasicUtf16(Formatted: False, SkipValidation: False, DataSize: 100000)`, not the other 5 flavors)
- Ability to specify sets of benchmarks in filters (e.g. `Bench1.A` and `Perf2.B`)
- Friendly names for `--corerun`s in reports
- Multiple groups of `--envVar` that are treated as different run configs
- Friendly names for the `--envVar` groups in reports
- Mix `--corerun` and `--netX.Y` on one command line (#2002)
- Simpler/smoother integration with linux `perf` (see notes below)