AndyAyersMS opened 2 years ago
When in `--corerun` mode with two coreruns, use the same exact execution strategy for each. Given multiple `corerun`s, run them and list them in the results table in the order specified on the command line. Right now they seem to be listed in the table in alphabetical order, e.g.

```
dotnet run -c Release -f net6.0 -- --filter System.Numerics.Tests.Perf_BitOperations.PopCount_ulong --corerun D:\bugs\osr-perf\main-rel\corerun.exe D:\bugs\osr-perf\osr-rel\corerun.exe D:\bugs\osr-perf\hack-rel\corerun.exe
```

gives the following table:
Method | Job | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|---|---|---|
PopCount_ulong | Job-MTRLJC | \hack-rel\corerun.exe | 464.7 ns | 6.25 ns | 5.84 ns | 465.7 ns | 450.6 ns | 471.7 ns | 1.39 | 0.02 | - | NA |
PopCount_ulong | Job-WAUWEH | \main-rel\corerun.exe | 333.5 ns | 4.70 ns | 4.16 ns | 334.8 ns | 324.5 ns | 339.7 ns | 1.00 | 0.00 | - | NA |
PopCount_ulong | Job-LMIYYB | \osr-rel\corerun.exe | 345.2 ns | 8.60 ns | 9.21 ns | 347.0 ns | 324.6 ns | 360.4 ns | 1.04 | 0.03 | - | NA |
Merged a PR addressing the first point:
https://github.com/dotnet/performance/pull/2314
To use the filter, you can use the format

```
dotnet run -c Release -f net7.0 --filter *Perf_Basic* --parameter-filter SkipValidation:True DataSize:10
```

when doing a command line run from the usual directory.
Nice! Looking forward to using it!
> Ability to specify sets of benchmarks in filters

`--filter` accepts multiple strings. Example: `--filter Bench1.A Perf2.B`
> Friendly names for `--corerun`s in reports

This should be easy to implement once we have an idea how to expose it via command line args. My current idea is:

`--corerun path1 path2 --corerun-names name1 name2`

but it's far from ideal.
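Purely as an illustration of the positional pairing such a flag would imply (BenchmarkDotNet has no `--corerun-names` today; the paths, names, and `pair_names` helper below are made up), a minimal sketch:

```shell
# Hypothetical sketch only: BenchmarkDotNet has no --corerun-names flag.
# This just shows the positional pairing the proposal above implies:
# the Nth name labels the Nth corerun path in the report.
pair_names() {
  paths=$1            # space-separated corerun paths
  set -- $2           # positional parameters become the friendly names
  for p in $paths; do
    echo "$1 -> $p"   # the report would show the friendly name, not the path
    shift
  done
}

pair_names "main-rel/corerun.exe osr-rel/corerun.exe hack-rel/corerun.exe" \
           "main osr hack"
```

Ambiguities remain (e.g. what happens when the two lists have different lengths), which is part of why pairing two separate flags feels far from ideal.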
> Multiple groups of `--envVar` that are treated as different run configs

We could achieve that by introducing some new "separators" to `--envVars`. Currently we have:

```
--envVars ENV_VAR_KEY_1:value_1 ENV_VAR_KEY_2:value_2
```

We could do something like:

```
--envVars ENV_VAR_KEY_1:value_1 $magicSeparator ENV_VAR_KEY_2:value_2
```
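For illustration, a sketch of how such a separator could be parsed, using `--` as a stand-in for the hypothetical `$magicSeparator` (this is not a real BDN feature; `split_env_groups` is a made-up helper):

```shell
# Hypothetical: split the flat --envVars argument list into one group per
# run config, using "--" as a stand-in for the magic separator above.
split_env_groups() {
  group=""
  for arg in "$@"; do
    if [ "$arg" = "--" ]; then
      echo "config:${group}"     # finished one group -> one run config
      group=""
    else
      group="$group $arg"
    fi
  done
  [ -n "$group" ] && echo "config:${group}"
}

split_env_groups ENV_VAR_KEY_1:value_1 -- ENV_VAR_KEY_2:value_2
```

Each emitted group would then become its own job in the run, mirroring how multiple `--corerun` paths produce multiple jobs today.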
> Friendly names for the `--envVar` groups in reports

It's the same as with the `--corerun` friendly names: how should this be exposed at the command line arg level?
> Mix `--corerun` and `--netX.Y` on one command line

Currently `--runtime x` combined with `--corerun y z` means build as `x` and run using `y` and `z`. I remember that we used it a while ago as a workaround for some dotnet/runtime limitation (iirc dotnet/runtime had an old SDK and it could not build new benchmarks from dotnet/performance that were using new APIs). We could change the meaning of it to: build as the current (`-f`) moniker, run as `x`, `y`, and `z`. iirc @stephentoub asked me for that in the past.
> VTune diagnoser or similar that uses VTune API to mark actual measurement intervals

This sounds very interesting. Do you have any links to the VTune API docs?
I am going to transfer this issue to the BDN repo as all the feature requests are BDN feature requests.

> iirc @stephentoub asked me for that in the past.
Would like to see this get done: https://github.com/dotnet/BenchmarkDotNet/issues/1634 :( So useful to have custom names for parameters of complex types.
I also find myself wishing there was simpler/smoother integration with linux `perf`. Exporting to perfview is ok for CPU samples but for HW counters it's not really viable.

Something along these lines:

- a `perf` diagnoser that runs the benchmark subprocess under `perf record` or `perf stat`, allowing me to specify the events of interest (often PMU events)
- `perf inject -j` to add in the mappings for jitted code ranges
- `-p EP` with the above

Right now I am running `perf record` over the entire BDN invocation and either boosting the iteration/invocation counts so that the actual intervals clearly dominate everything else, or slicing and looking at only the last 10% (say) of the recorded data.
> VTune diagnoser or similar that uses VTune API to mark actual measurement intervals
>
> this sounds very interesting. Do you have any links to VTune API docs?
Related: I'd like to see `--join` fixed so multiple filter expression results can all show up in a single results table: https://github.com/dotnet/performance/issues/1855
It would also be nice to have an integrated diagnoser for ETW that is benchmark interval aware. I have a crude start at this in https://github.com/AndyAyersMS/instructionsretiredexplorer; it can post-process the ETW (actual interval aware) and project onto managed method names & tiering variants, e.g.
```
Mining ETL from D:\bugs\r72730\BenchmarkDotNet.Artifacts\LargeRegexTest.Generated-20220725-132937.etl for process corerun
PMC interval now 10000
Found process [9716] corerun: "D:\bugs\r72730\48b85438-13c4-4c73-94b5-b109ce10b9d2\corerun.exe" 150360d1-1148-4e51-8848-28e3c3c32196.dll --benchmarkName LargeRegexTest.Generated --job Toolchain=CoreRun --benchmarkId 0
==> benchmark process is [9716]
Samples for corerun: 16277 events for Benchmark Intervals
Jitting            : 01.44% 1.25E+06 samples 1554 methods
JitInterface       : 00.18% 1.6E+05 samples
Jit-generated code : 96.84% 8.4E+07 samples
Jitted code        : 96.84% 8.4E+07 samples
MinOpts code       : 00.00% 0 samples
FullOpts code      : 00.00% 0 samples
Tier-0 code        : 87.98% 7.63E+07 samples
Tier-1 code        : 08.86% 7.69E+06 samples
R2R code           : 00.00% 0 samples
00.47% 4.1E+05    ?       Unknown
87.98% 7.632E+07  Tier-0  [r72730]<RegexGenerator_g>F7__GetAsmInstructionsRegex_0+RunnerFactory+Runner.TryMatchAtCurrentPosition(value class System.ReadOnlySpan`1<wchar>)
01.38% 1.2E+06    Tier-1  [r72730]LargeRegexTest.Generated()
01.26% 1.09E+06   native  clrjit.dll
01.23% 1.07E+06   native  coreclr.dll
00.91% 7.9E+05    Tier-1  [System.Private.CoreLib]System.ReadOnlySpan`1[System.Char].get_Item(int32)
00.89% 7.7E+05    Tier-1  [System.Text.RegularExpressions]Match.AddMatch(int32,int32,int32)
00.88% 7.6E+05    Tier-1  [System.Private.CoreLib]System.ReadOnlySpan`1[System.Char].Slice(int32)
00.85% 7.4E+05    Tier-1  [System.Private.CoreLib]System.ReadOnlySpan`1[System.Char].get_Length()
00.76% 6.6E+05    Tier-1  [r72730]<RegexGenerator_g>F7__GetAsmInstructionsRegex_0+RunnerFactory+Runner.Scan(value class System.ReadOnlySpan`1<wchar>)
00.69% 6E+05      Tier-1  [System.Text.RegularExpressions]Regex.RunSingleMatch(value class System.Text.RegularExpressions.RegexRunnerMode,int32,class System.String,int32,int32,int32)
00.44% 3.8E+05    Tier-1  [System.Text.RegularExpressions]RegexRunner.Capture(int32,int32,int32)
00.38% 3.3E+05    Tier-1  [System.Text.RegularExpressions]RegexRunner.InitializeForScan(class System.Text.RegularExpressions.Regex,value class System.ReadOnlySpan`1<wchar>,int32,value class System.Text.RegularExpressions.RegexRunnerMode)
00.31% 2.7E+05    Tier-1  [r72730]<RegexGenerator_g>F7__GetAsmInstructionsRegex_0+RunnerFactory+Runner.TryFindNextPossibleStartingPosition(value class System.ReadOnlySpan`1<wchar>)
00.30% 2.6E+05    Tier-1  [System.Text.RegularExpressions]Regex.IsMatch(class System.String)
00.29% 2.5E+05    Tier-1  [System.Private.CoreLib]String.op_Implicit(class System.String)
00.25% 2.2E+05    Tier-1  [System.Text.RegularExpressions]Match.Reset(class System.Text.RegularExpressions.Regex,class System.String,int32,int32,int32)
00.20% 1.7E+05    Tier-1  [System.Private.CoreLib]SpanHelpers.SequenceEqual(unsigned int8&,unsigned int8&,unsigned int)
00.18% 1.6E+05    Tier-1  [System.Private.CoreLib]MemoryExtensions.StartsWith(value class System.ReadOnlySpan`1<!!0>,value class System.ReadOnlySpan`1<!!0>)
00.12% 1E+05      native  ntoskrnl.exe
00.10% 9E+04      Tier-1  [System.Text.RegularExpressions]RegexRunner.InitializeTimeout(value class System.TimeSpan)
00.08% 7E+04      native  ntdll.dll
Benchmark: found 15 intervals; mean interval 570.348ms
```
> I also find myself wishing there was simpler/smoother integration with linux perf

@AndyAyersMS a few days ago I merged https://github.com/dotnet/BenchmarkDotNet/pull/2117, which adds a perf diagnoser that uses `perfcollect` internally. `perfcollect` supports collecting hardware counters: we could take advantage of that and build something on top of it. I won't have the time to do that myself in the near future, but I would be happy to chat and perhaps create an up-for-grabs issue with a very detailed description of what we need and how it could be implemented.
> It would also be nice to have an integrated diagnoser for ETW that is benchmark interval aware.

For that we could definitely extend `ETWProfiler` to always export such a file when hardware counters are enabled. We are already parsing the trace file, so in theory it should be a matter of implementing an exporter.
> Filter on benchmark parameters

This is now built in: #2132
> @AndyAyersMS few days ago I've merged #2117 which adds a perf diagnoser that uses `perfcollect` internally
Somehow I missed seeing this -- will have to try it out soon! Thanks!
@AndyAyersMS in case you are interested in more details: https://adamsitnik.com/PerfCollectProfiler/
- Filter on benchmark parameters (e.g. run just `System.Text.Json.Tests.Perf_Basic.WriteBasicUtf16(Formatted: False, SkipValidation: False, DataSize: 100000)`, not the other 5 flavors)
- Ability to specify sets of benchmarks in filters (e.g. `Bench1.A` and `Perf2.B`)
- Friendly names for `--corerun`s in reports
- Multiple groups of `--envVar` that are treated as different run configs
- Friendly names for the `--envVar` groups in reports
- Mix `--corerun` and `--netX.Y` on one command line (#2002)
- Simpler/smoother integration with linux `perf` (see notes below)