dotnet / BenchmarkDotNet

Powerful .NET library for benchmarking
https://benchmarkdotnet.org
MIT License

Summary Save/Load/Combine #305

Open AndreyAkinshin opened 7 years ago

AndreyAkinshin commented 7 years ago

Now BenchmarkDotNet makes it possible to create a nice set of jobs for different environments. However, sometimes it's impossible to get all the desired results with a single run. Examples:

So, I suggest adding Save/Load/Combine methods, which will help solve these problems. An example:

// Part 1: Windows
var windowsRun = BenchmarkRunner.Run<MyBench>();
windowsRun.Save("windows.json");
// Part 2: Linux
var linuxRun = BenchmarkRunner.Run<MyBench>();
linuxRun.Save("linux.json");
// Part 3: Common summary table
var summaryWindows = Summary.Load("windows.json");
var summaryLinux   = Summary.Load("linux.json");
var summary = Summary.Combine(summaryWindows, summaryLinux);
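One possible reading of `Combine`, sketched under the assumption that each saved summary boils down to (method, mean) rows: keep every row and tag it with the environment it came from, rather than merging the numbers. `Combine` here is a hypothetical helper over plain tuples, not an existing `Summary` API, and the timings are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows standing in for Summary.Load("windows.json") / Summary.Load("linux.json").
var windowsRows = new List<(string Method, double MeanNs)> { ("A", 120.0), ("B", 80.0) };
var linuxRows = new List<(string Method, double MeanNs)> { ("A", 110.0), ("B", 95.0) };

var combined = Combine(("Windows", windowsRows), ("Linux", linuxRows));
foreach (var row in combined)
    Console.WriteLine($"{row.Method,-4} {row.Environment,-8} {row.MeanNs,8:F1} ns");

// Keeps every row and tags it with its source environment instead of merging
// numbers, then sorts so the same method from different runs sits together.
static List<(string Environment, string Method, double MeanNs)> Combine(
    params (string Environment, List<(string Method, double MeanNs)> Rows)[] sources) =>
    sources
        .SelectMany(s => s.Rows.Select(r => (s.Environment, r.Method, r.MeanNs)))
        .OrderBy(r => r.Method, StringComparer.Ordinal)
        .ThenBy(r => r.Environment, StringComparer.Ordinal)
        .ToList();
```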

Some additional thoughts / points for discussion.

@adamsitnik, @mattwarren, @ig-sinicyn, @terrajobst, what do you think?

ig-sinicyn commented 7 years ago

@AndreyAkinshin, please, not JSON. There's a standard format for storing and exchanging data: XML. It has good conventions for value escaping and date representation. It has a strict specification without silly logical gaps such as undefined behavior for duplicate keys. And there's an API for it out of the box, no additional dependencies required.
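As an illustration of the "API out of the box" point: `System.Xml.Linq` ships with .NET, escapes special characters in attribute values, and formats `double` values through `XmlConvert`, which is culture-invariant. A minimal round-trip sketch, assuming results are flattened to hypothetical (method, job, mean ns) rows; `SaveToXml`/`LoadFromXml` are illustrative names, not BenchmarkDotNet APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical result rows: (method, job, mean time in ns). "Sum<int>" exercises escaping.
var results = new List<(string Method, string Job, double MeanNs)>
{
    ("A", "PC1", 1000.0),
    ("Sum<int>", "PC1", 100.5),
};

string xml = SaveToXml(results);
var loaded = LoadFromXml(xml);
Console.WriteLine(xml);

// XAttribute escapes '<', '&', quotes, etc., and converts doubles via XmlConvert.
static string SaveToXml(IEnumerable<(string Method, string Job, double MeanNs)> rows) =>
    new XDocument(
        new XElement("Results",
            rows.Select(r => new XElement("Result",
                new XAttribute("Method", r.Method),
                new XAttribute("Job", r.Job),
                new XAttribute("MeanNs", r.MeanNs))))).ToString();

// The explicit (string)/(double) conversions on XAttribute undo the escaping and parse the numbers.
static List<(string Method, string Job, double MeanNs)> LoadFromXml(string xml) =>
    XDocument.Parse(xml).Root!
        .Elements("Result")
        .Select(e => ((string)e.Attribute("Method")!, (string)e.Attribute("Job")!, (double)e.Attribute("MeanNs")!))
        .ToList();
```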

Why JSON? About API: why not expose it as an Exporter?

Oh, and now comes the worst part: how should Combine() work at all? Averaging? No way.

Case 1.

Imagine two benchmark methods, A and B

On PC1:
`A` - 1000 ns (`Vector.IsHardwareAccelerated` is false); `B` - 100 ns

On PC2:
`A` - 150 ns (`Vector.IsHardwareAccelerated` is true); `B` - 300 ns

Now what? The bad news: we don't need to imagine it. It's a real case we faced a year ago.

Case 2.

On PC1:
`A` - 200 ns

On PC2:
`B` - 150 ns

How should these be merged at all? There's no common baseline, so we have no idea what the actual ratio is. It could be

On PC2:
`B` - 150 ns; `A` - 50 ns
-or-
`B` - 150 ns; `A` - 2000 ns

The only way to compare them is to calculate relative-to-baseline times. Then the entire summary can be safely shortened to

<CompetitionBenchmarks>
    <Competition Target="CodeJam.Examples.SimplePerfTest, CodeJam.PerfTests-Tests.NUnit">
        <Candidate Target="Baseline" Baseline="true" />
        <Candidate Target="SlowerX3" MinRatio="2.91" MaxRatio="3.09" />
        <Candidate Target="SlowerX5" MinRatio="4.85" MaxRatio="5.15" />
        <Candidate Target="SlowerX7" MinRatio="6.79" MaxRatio="7.21" />
    </Competition>
</CompetitionBenchmarks>

And yep, I've done it and it works :)
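To make the ratio argument concrete: in Case 1 the A/B ratio is 10 on PC1 and 0.5 on PC2, so averaging the two runs tells you nothing. Below is a sketch of the relative-to-baseline approach, assuming per-machine timings are available as name-to-nanoseconds maps. `Ratio` and `WithinBounds` are hypothetical helpers (not CodeJam.PerfTests internals), and the SlowerX3 numbers are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// Case 1 from above (ns); B is used as the baseline on each machine.
var pc1 = new Dictionary<string, double> { ["A"] = 1000, ["B"] = 100 };
var pc2 = new Dictionary<string, double> { ["A"] = 150, ["B"] = 300 };
double ratioPc1 = Ratio(pc1, "A", "B");
double ratioPc2 = Ratio(pc2, "A", "B");
Console.WriteLine($"A/B on PC1 = {ratioPc1}, on PC2 = {ratioPc2}");

// Hypothetical SlowerX3 timings on a slow and a fast machine: absolute numbers differ,
// but each ratio is computed against a baseline measured on the same machine.
var slowPc = new Dictionary<string, double> { ["Baseline"] = 100, ["SlowerX3"] = 300 };
var fastPc = new Dictionary<string, double> { ["Baseline"] = 40, ["SlowerX3"] = 121 };
bool slowOk = WithinBounds(Ratio(slowPc, "SlowerX3", "Baseline"), 2.91, 3.09);
bool fastOk = WithinBounds(Ratio(fastPc, "SlowerX3", "Baseline"), 2.91, 3.09);

static double Ratio(IReadOnlyDictionary<string, double> timingsNs, string candidate, string baseline) =>
    timingsNs[candidate] / timingsNs[baseline];

// Mirrors the MinRatio/MaxRatio check expressed by the <Candidate> elements above.
static bool WithinBounds(double ratio, double minRatio, double maxRatio) =>
    ratio >= minRatio && ratio <= maxRatio;
```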

AndreyAkinshin commented 7 years ago

please, not JSON.

Ok, let's use XML. =)

About API: why not expose it as an Exporter?

Sounds good to me. Thus, we also have to define an Importer.

Oh, and now comes the worst part: how should Combine() work at all?

All of the examples in the issue are about a single PC. You are absolutely right, it's really hard to compare performance numbers across different computers. Probably, we should check the combined environments and print some warnings.
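A possible shape for that environment check, assuming each saved summary carries a flat key/value description of its environment (the keys and values below are hypothetical). It doesn't try to decide whether the results are comparable; it just lists every property that differs so the combined table can print it as a warning.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical environment descriptions stored with two summaries that are about to be combined.
var windows = new Dictionary<string, string>
{
    ["OS"] = "Windows",
    ["Runtime"] = ".NET Core",
    ["IsHardwareAccelerated"] = "True",
};
var linux = new Dictionary<string, string>
{
    ["OS"] = "Linux",
    ["Runtime"] = ".NET Core",
    ["IsHardwareAccelerated"] = "False",
};

var warnings = CompareEnvironments(windows, linux);
foreach (var warning in warnings)
    Console.WriteLine($"// WARNING: {warning}");

// Emits one warning per key whose value differs or is missing on one side.
static List<string> CompareEnvironments(
    IReadOnlyDictionary<string, string> left, IReadOnlyDictionary<string, string> right)
{
    var keys = new SortedSet<string>(left.Keys, StringComparer.Ordinal);
    keys.UnionWith(right.Keys);
    var result = new List<string>();
    foreach (var key in keys)
    {
        left.TryGetValue(key, out var l);
        right.TryGetValue(key, out var r);
        if (l != r)
            result.Add($"{key} differs: '{l ?? "<missing>"}' vs '{r ?? "<missing>"}'");
    }
    return result;
}
```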

ig-sinicyn commented 7 years ago

Thus, we also have to define an Importer.

Well, maybe :)

All of the examples in the issue are about a single PC.

The same issues apply to the single-PC case. Timings may change due to IO latency, a FW upgrade, or an upgrade of BDN itself (future Roslyn upgrades may bring some optimizations like this). There's no point in keeping absolute timings unless you preserve the context too. And then it's better to store them as `.csv` or any other tabular format and ETL them into a data-mining service.

The summary is the following:

Actually, I've done both (there's a CsvTimingsExporter, but I'm thinking about switching to something like SQLite one day).
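A minimal sketch of the "keep the context with the timings" idea as CSV: every row repeats the context columns, so it stays meaningful after being ETL'd somewhere else. This is not CodeJam's CsvTimingsExporter; `ToCsv`, `Escape`, and the context keys are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical run context; in practice this would describe machine, runtime, BDN version, etc.
var context = new Dictionary<string, string>
{
    ["Machine"] = "PC1",
    ["Os"] = "Windows",
    ["Runtime"] = ".NET Core, x64",
};
var timings = new List<(string Method, double MeanNs)> { ("A", 200.0), ("B", 150.0) };

string csv = ToCsv(context, timings);
Console.Write(csv);

// Writes a header plus one row per measurement; every row repeats the context columns.
static string ToCsv(IReadOnlyDictionary<string, string> ctx, IEnumerable<(string Method, double MeanNs)> rows)
{
    var keys = ctx.Keys.OrderBy(k => k, StringComparer.Ordinal).ToList();
    var sb = new StringBuilder();
    sb.Append(string.Join(",", keys.Concat(new[] { "Method", "MeanNs" }).Select(Escape))).Append('\n');
    foreach (var (method, meanNs) in rows)
    {
        var fields = keys.Select(k => ctx[k])
            .Append(method)
            .Append(meanNs.ToString(CultureInfo.InvariantCulture));
        sb.Append(string.Join(",", fields.Select(Escape))).Append('\n');
    }
    return sb.ToString();
}

// RFC 4180-style quoting for fields that contain a comma, quote, or line break.
static string Escape(string field) =>
    field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) >= 0
        ? "\"" + field.Replace("\"", "\"\"") + "\""
        : field;
```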

adamsitnik commented 7 years ago

Personally, what I would like to add is the ability to compare a few different versions of a NuGet package.

Something like:

.Add(Job.WithNuget("System.Slices", version: "1.10.0"))
.Add(Job.WithNuget("System.Slices", version: "1.20.0"))

and

.Add(Job.WithNuget("System.Slices", version: Version.Latest))

so people could write unit tests that guard against performance drops.
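A rough sketch of the core check such a unit test might perform, assuming the means for both package versions are already available (the numbers and the 5% tolerance are arbitrary assumptions). `IsRegression` is a hypothetical helper.

```csharp
using System;

// Hypothetical mean timings (ns) for the same benchmark against two package versions.
double oldMeanNs = 120.0; // e.g. System.Slices 1.10.0
double newMeanNs = 130.0; // e.g. System.Slices 1.20.0

// Allow up to a 5% slowdown before treating the change as a regression (arbitrary threshold).
bool regressed = IsRegression(oldMeanNs, newMeanNs, tolerance: 0.05);
Console.WriteLine(regressed ? "Performance regression detected" : "OK");

static bool IsRegression(double baselineNs, double candidateNs, double tolerance) =>
    candidateNs > baselineNs * (1 + tolerance);
```

Comparing two means ignores measurement noise; a real check would probably compare confidence intervals or use a statistical test rather than a fixed threshold.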

As for the format, we would have to provide a mechanism like sinks in Serilog: we provide an interface (IExporter/IImporter) and people implement it for XML, MS SQL, SQLite, RavenDB, etc.

ig-sinicyn commented 7 years ago

@adamsitnik

Same-time runs, as in

.Add(Job.WithNuget("System.Slices", version: "1.10.0"))
.Add(Job.WithNuget("System.Slices", version: "1.20.0"))

do not require export/import at all, as BenchmarkConverter.TypeToBenchmarks(type, config) produces a separate benchmark for every combination of job and parameters.
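To illustrate why no import is needed here: when both versions are jobs in one config, the benchmark cases are the cross product of methods, jobs, and parameter values, so both versions end up in the same summary. A simplified sketch of that combinatorics with hypothetical method names and a `[Params]`-style size list; it is not the actual `BenchmarkConverter.TypeToBenchmarks` implementation.

```csharp
using System;
using System.Linq;

// Hypothetical inputs: two jobs (one per package version) and one [Params]-style value set.
var jobs = new[] { "1.10.0", "1.20.0" };
var sizes = new[] { 16, 1024 };
var methods = new[] { "Slice", "CopyTo" };

// One benchmark case per (method, job, parameter) combination.
var cases =
    (from method in methods
     from job in jobs
     from size in sizes
     select (Method: method, Job: job, Size: size)).ToList();

foreach (var c in cases)
    Console.WriteLine($"{c.Method} [Job={c.Job}, Size={c.Size}]");
```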

adamsitnik commented 7 years ago

do not require export

Yes, exactly. That way we eliminate the problems with storage and different PC conditions, but on the other hand, it will take twice as long ;)

forki commented 6 years ago

So is there a way to store and load?

AndreyAkinshin commented 6 years ago

@forki, you can export results, but there is no way to import them for now. I hope this feature will be implemented in the near future.

Lonli-Lokli commented 4 years ago

As an addition to the request, there is a package that stores information about tests: https://github.com/approvals/ApprovalTests.Net

All tests are stored as text tables.

workgroupengineering commented 2 years ago

Hi, I need this feature, but I don't want to make any breaking changes. I would like to create a SnapshotToolchain that implements IToolchain and a SnapshotExporter that implements IExporter.

I would like to use it like this:

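using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

namespace BenchmarkDotNet.Samples
{
    [Config(typeof(Config))]
    [XmlSnapshotExporter] // proposed attribute, does not exist yet
    public class TheClassWithBenchmarks
    {
        private class Config : ManualConfig
        {
            public Config()
            {
                AddJob(Job.MediumRun);
                AddJob(Job.MediumRun
                    .WithToolchain(SnapshotToolchain.FromXml("path")) // proposed toolchain
                    .WithId("Snapshot"));
            }
        }

        ...
    }
}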

Do you think it would work?