dotnet / BenchmarkDotNet

Powerful .NET library for benchmarking
https://benchmarkdotnet.org
MIT License

Run history difference reporting / exporting #973

Open lahma opened 5 years ago

lahma commented 5 years ago

I'm filing this issue just to check whether it would be a valid feature and reasonable to implement on BenchmarkDotNet's side. I've built a small and ugly helper that produces the difference between two BenchmarkDotNet runs using CSV reports: https://github.com/lahma/BenchmarkDotNet.ResultDiff .

The parsing is ugly and brittle, but having a feature that clearly states the difference between runs in percentages/absolute values seems beneficial to me. I've used it when optimizing the Jint library and I feel it's a great way to easily communicate the difference. I see people pasting two sets of results (before/after) when creating optimization PRs, and if there are more than 3 rows to mentally diff, it gets burdensome (or maybe it's just me).

So what I would suggest is some form of exporter that keeps track of every run in an efficient raw data format, say normal_file_name_yyyy-MM-dd-HH-mm-ss.data, and then diffs the oldest and newest runs (by default), producing output similar to the tool I linked, which shows the actual difference in work done.
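To illustrate the kind of diff I mean, here is a rough Python sketch that compares two CSV reports by method name and prints absolute and percentage changes in the mean. The column names and values are made up for the example, not actual BenchmarkDotNet output:

```python
import csv
import io

# Hypothetical excerpts of two benchmark CSV reports (values are invented).
BASELINE = """Method,Mean [ns],Allocated [B]
Parse,120.5,1024
Execute,980.2,4096
"""

AFTER = """Method,Mean [ns],Allocated [B]
Parse,95.1,1024
Execute,1010.7,2048
"""

def load(report: str) -> dict:
    """Index a CSV report by benchmark method name."""
    return {row["Method"]: row for row in csv.DictReader(io.StringIO(report))}

def diff(baseline: str, after: str):
    """Yield (method, old mean, new mean, percent change) for shared benchmarks."""
    base, new = load(baseline), load(after)
    for method in base.keys() & new.keys():
        old_mean = float(base[method]["Mean [ns]"])
        new_mean = float(new[method]["Mean [ns]"])
        yield method, old_mean, new_mean, (new_mean - old_mean) / old_mean * 100

for method, old, new, pct in sorted(diff(BASELINE, AFTER)):
    print(f"{method}: {old:.1f} ns -> {new:.1f} ns ({pct:+.1f}%)")
```

A real exporter would of course read the timestamped data files described above instead of inline strings.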

adamsitnik commented 5 years ago

Hi @lahma

I like this idea and I have even implemented a similar tool in the dotnet/performance repo https://github.com/dotnet/performance/tree/master/src/tools/ResultsComparer

Maybe we should add it as a new, global tool similar to https://github.com/dotnet/BenchmarkDotNet/pull/1006 ?

@AndreyAkinshin what do you think about adding such command line tools to the BDN repo?

CodeTherapist commented 5 years ago

@adamsitnik I would like to implement that as well.

Design suggestion

Implement it in the existing BenchmarkDotNet.Tool project using the sub-command concept.

The subcommand run would execute benchmarks: dotnet benchmarkdotnet run [arguments] [options]

The subcommand diff would get the difference between a baseline report and another report: dotnet benchmarkdotnet diff [arguments] [options]

This design is similar to other well-known dotnet tools (e.g. the tooling for EF Core). In the following example, database is a sub-command of the ef command: dotnet ef database [arguments]

Advantages

What do you think about that?

AndreyAkinshin commented 5 years ago
  1. I think it's a good idea to reuse BenchmarkDotNet.Tool for all kinds of commands. We don't need many different NuGet packages for different commands.
  2. If we want to have another kind of the summary table, we should have a corresponding exporter which allows getting this new "diff" table for the current benchmarking session.
  3. Currently, we don't have a proper serialization format for benchmark results (see #305). I have some design notes for this format, but I still haven't come up with a good specification. I think the best way is to finish the specification and then introduce commands for post-processing. That way, we will be able to apply any kind of exporter to benchmark sessions that have already finished. We could also introduce a special kind of exporter that consumes two different reports (we could call it a Differ).
  4. The current CSV export format is a temporary hack which I needed for the RPlotExporter. It wasn't designed for other kinds of post-processing (e.g., it doesn't include the environment data: we can't form a proper header for the diff summary table automatically).
lahma commented 5 years ago

I love all the possibilities listed here. I'd also like to point out the case that is important to me. I usually have a baseline run that tests the system (multiple methods, sub-systems), and I try to see how the results have been affected by a change. So in my case it's more of an "overall rps change after I fine-tuned data-structure allocation patterns" than "which of the two methods is faster". I usually run multiple benchmarks stressing the library from the top and check that no regressions are introduced when I tweak some particular case.

So in short I'll run same benchmarks and I usually want to see allocation and duration changes for the same benchmark over time.

adamsitnik commented 5 years ago

I think it's a good idea to reuse BenchmarkDotNet.Tool for all kinds of commands. We don't need many different NuGet packages for different commands.

@AndreyAkinshin personally I would prefer a dedicated tool for every command. It would give us a better overview of what our users are using (nuget stats) and cleaner commands, more Unix-like?

Commands with a single tool:

dotnet benchmark run abc.dll
dotnet benchmark compare x.json y.json

With dedicated tools:

dotnet benchmark abc.dll
dotnet compare x.json y.json

Also in the future I would like to move some of our code to stand-alone tools. Examples: disassembler (could be reused by others) and profilers (could also be reused)

dotnet disassembler --processId 1234 --method My.Program.Main --depth 3 
dotnet profiler start --type ETW
dotnet profiler stop --type ETW

@AndreyAkinshin what do you think about this idea in general?

Speaking of the files: as of today, every run overwrites the previous result. I think we should change that (maybe include a timestamp in the file name or something like that?). Also, I'd prefer JSON over CSV. It's more "type safe" to me ;p

AndreyAkinshin commented 5 years ago

I think that option "install one package and get all of the command line out of the box" is better than forcing users to install a separate NuGet package for each command.

Also, I don't like this command line:

dotnet compare x.json y.json

It's OK for us to reserve the dotnet benchmark keyword because BenchmarkDotNet is the most popular benchmarking library. However, I don't want to reserve dotnet compare for comparing BenchmarkDotNet-specific files (the same goes for disassembler and profiler). Maybe we can resolve it via arguments like this:

dotnet benchmark --info
dotnet benchmark --version
dotnet benchmark --compare x.json y.json
dotnet benchmark abc.dll
adamsitnik commented 5 years ago

@AndreyAkinshin You are right.

BTW if we switch to System.CommandLine (#1016) it should be easier to write a single global tool that handles everything we want (it was designed for global tools, including support of auto-complete for the argument names!)

paule96 commented 5 years ago

This would be super nice. Currently, I'm trying to compare the results in Azure DevOps but can't find a good way to do it.

Wildenhaus commented 5 years ago

It would be nice to have additional metrics reported with each summary, such as the net increase or decrease in each benchmark's mean execution time compared to the previous run.

For example, something like:

Method |     Mean |     Error |    StdDev |  Delta Mean | Note
------ |---------:|----------:|----------:|------------:|-----
 TestA | 1.617 ns | 0.0583 ns | 0.0924 ns |  ▼ 0.143 ns | Increase in performance
 TestB | 1.383 ns | 0.0218 ns | 0.0204 ns |  ▲ 0.522 ns | Decrease in performance
 TestC | 4.288 ns | 0.1344 ns | 0.0819 ns |         n/a | No records to compare to
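A Delta Mean cell like the ones above could be produced by a small helper along these lines (a Python sketch with hypothetical names; the "previous" means are back-computed from the example numbers):

```python
def format_delta(current_ns, previous_ns):
    """Render a Delta Mean cell: ▼ = mean went down (faster), ▲ = mean went up (slower)."""
    if previous_ns is None:
        return "n/a"  # no earlier run to compare against
    delta = current_ns - previous_ns
    arrow = "▼" if delta < 0 else "▲"
    return f"{arrow} {abs(delta):.3f} ns"

print(format_delta(1.617, 1.760))  # TestA
print(format_delta(4.288, None))   # TestC
```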

Adding on to that idea, being able to plot changes in performance over time (even if it meant opening a file in a third-party program) would also be awesome and help greatly with development.

MarcoRossignoli commented 5 years ago

@adamsitnik asked me to follow up here after https://github.com/dotnet/performance/pull/314#pullrequestreview-206180237

I don't know if the new comparer tool will be the same as BenchmarkDotNet.Tool, but as I said, it could be useful:

dominikjeske commented 4 years ago

This feature looks promising - is there any progress on this?

svengeance commented 4 years ago

I was also looking for something like this. It would be wonderful if I could easily run BDN with a comparison CLI argument as part of a PR cycle, and have PR submissions report the net increase (or decrease) in performance.

rymeskar commented 3 years ago

@adamsitnik and @AndreyAkinshin is there any news around 'run history difference reporting / exporting'?

gioce90 commented 2 years ago

> This feature looks promising - is there any progress on this?

Same question

lloydjatkinson commented 8 months ago

One of the biggest pain points I'm finding with BDN so far is that there's no convenient way of comparing a before and after. What I'm having to do so far is write a MyClass and a MyClass2 that are identical except for the performance change I'm trying out. This is a pretty grim methodology. It would be really good if it could compare across git commits.

Tarun047 commented 8 months ago

+1 for this. Please let me know if anyone is taking this up; if not, I can, as this would be of huge benefit to the many people trying to use this in CI/CD pipelines.

AndreyAkinshin commented 8 months ago

@Tarun047 I'm working on it right now. A huge refactoring is coming with a new serialization format + a lot of new features including various reports.

Enterprize1 commented 2 months ago

@lloydjatkinson I had the same thought and created https://www.nuget.org/packages/Enterprize1.BenchmarkDotNet.GitCompare , which allows running BenchmarkDotNet jobs for different commits/branches (or just HEAD, to see the difference the current changes make). I hope it works as well for you as it does for me.