skleanthous opened this issue 6 years ago
I take it, as you have mentioned it yourself, that you are already aware of `GlobalSetup` and `IterationSetup`.
Do you also realise that each test is run in isolation, in a separate application generated by BenchmarkDotNet?
So by the sounds of things, what you really need to do is just refactor your code, as BenchmarkDotNet already does what you are asking for.
Just do the following:
```csharp
[MemoryDiagnoser]
public class DependencyBenchmark
{
    private IManageSessionOf<MyTempState> session;
    private object myEvent;

    public DependencyBenchmark()
        => myEvent = new
        {
            Count = 2,
            Name = "Hey"
        };

    [GlobalSetup]
    public void GlobalSetup()
    {
        PreCallRequirements();
    }

    [Benchmark] // Default. If no RunXXX properties are set, behaviour defaults to running before.
    public void PreCallRequirements()
    {
        session = SessionFor<MyTempState>.StartSession();
    }

    [Benchmark]
    public void PostCallRequirements()
    {
        session.Dispose();
    }

    [Benchmark]
    public void AddEventBenchmark()
    {
        session.AddEvent(myEvent);
    }

    [GlobalCleanup]
    public void GlobalCleanup()
    {
        PostCallRequirements();
    }
}
```
Hello @Matthew-Bonner, and thank you for your reply.
What you posted doesn't quite work for benchmarking `AddEventBenchmark`: in the case of an in-memory collection, the results will be skewed, since each iteration is affected by the events already present in the session (given that the setup there is global).
So, in summary, what I need is a per-test/per-invocation setup, and I suggested one way in which the inaccuracy introduced by such a setup could be "subtracted" out. To compensate for its absence, I am currently doing this:
```csharp
[Benchmark]
public IManageSessionOf<MyTempState> Overhead()
{
    session = SessionFor<MyTempState>.StartSession();
    session.Dispose();
    return session;
}

[Benchmark]
public IManageSessionOf<MyTempState> AddEventBenchmark()
{
    session = SessionFor<MyTempState>.StartSession();
    session.AddEvent(myEvent);
    session.Dispose();
    return session;
}
```
And I manually subtract one from the other. I would really like not to pollute my report with two unnecessary results.
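For reference, the manual post-processing amounts to something like this (a minimal sketch of my own arithmetic, not BenchmarkDotNet code; it assumes the two benchmarks are measured independently, so the standard error of the difference combines in quadrature):

```csharp
using System;

static class ManualSubtraction
{
    // Subtract the overhead benchmark from the full benchmark.
    // Assumes independent measurements: the standard error of a
    // difference is sqrt(seFull^2 + seOverhead^2).
    public static (double Mean, double StdErr) Subtract(
        (double Mean, double StdErr) full,
        (double Mean, double StdErr) overhead)
    {
        double mean = Math.Max(full.Mean - overhead.Mean, 0); // clamp negatives to zero
        double stdErr = Math.Sqrt(full.StdErr * full.StdErr +
                                  overhead.StdErr * overhead.StdErr);
        return (mean, stdErr);
    }
}
```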
Hello,
@Matthew-Bonner or @AndreyAkinshin: sorry for @-mentioning you guys, but if you could just let me know whether something like this is OK from your side, and whether this feature can/will be considered if I create a PR, I am more than keen to do all the work and create a PR ASAP. I also have some knowledge and a bit of experience of measurement theory (although both from some years ago), so I'll probably be able to find the proper way to calculate the standard deviation and error in these cases.
I've made a minor contribution to the project, so I was merely checking whether what I suggested was an option, to try and be helpful. :)
@skleanthous, @Matthew-Bonner sorry for the late reply.
I don't like the idea with `[Overhead(RunBefore = true)]`. If you have a macrobenchmark, `[IterationSetup]` should work for you. If you have a microbenchmark, you should try to design it without side effects. I don't believe that math magic with different statistics for setup/benchmark/cleanup will work. If we are going to implement something like that, we need an example which shows that such an approach provides better results than the classic one. We are not adding such features without proof which demonstrates that we really need them.
On the other hand, the last example by @skleanthous (with the `Overhead` method which includes `StartSession` and `Dispose`) looks pretty nice. I guess it's one of the best ways to solve your problem.
> And I manually subtract one from the other.

We already have an issue for this problem: https://github.com/dotnet/BenchmarkDotNet/issues/431 (Option to Normalize Results by an Additive Baseline). It would be great if you created a PR which solves this problem. Before you start implementing this feature, we should discuss the following two topics:
I have some idea of how we could implement that:
Instead of generating the empty `Idle` method which measures the overhead of BDN, we could tell the Engine to use the specified method as `Idle` (in BDN, the reported result is the raw measurement minus the measured `Idle` overhead). It should be a very simple change.
```csharp
[Overhead(Target = nameof(AddEventBenchmark))]
public IManageSessionOf<MyTempState> Overhead()
{
    session = SessionFor<MyTempState>.StartSession();
    session.Dispose();
    return session;
}

[Benchmark]
public IManageSessionOf<MyTempState> AddEventBenchmark()
{
    session = SessionFor<MyTempState>.StartSession();
    session.AddEvent(myEvent);
    session.Dispose();
    return session;
}
```
@AndreyAkinshin what do you think?
Hey @AndreyAkinshin & @adamsitnik, thank you for your replies :)
Basically, as I understand things (please correct me if I'm wrong), jobs/classes that contain benchmarks are designed to benchmark cohesive behaviour with shared setup & teardown. Because of this, I personally think it is best to have overhead methods apply to all benchmarks defined in a class/job by default, although I do agree that also being able to specify a target (as in @adamsitnik's example) would be highly beneficial.
My recommendation would be to define it as such:
```csharp
public class JobA
{
    [Overhead]
    public IManageSessionOf<MyTempState> Overhead()
    {
        session = SessionFor<MyTempState>.StartSession();
        session.Dispose();
        return session;
    }

    [Benchmark]
    public IManageSessionOf<MyTempState> AddEventBenchmark()
    {
        session = SessionFor<MyTempState>.StartSession();
        session.AddEvent(myEvent);
        session.Dispose();
        return session;
    }
}

public class JobB
{
    [Benchmark]
    public IManageSessionOf<MyTempState> FullAddEventBenchmark()
    {
        session = SessionFor<MyTempState>.StartSession();
        session.AddEvent(myEvent);
        session.Dispose();
        return session;
    }
}
```
With the report showing:

Method                | Mean     | With Overhead | Error    | StdDev   |
----------------------|---------:|--------------:|---------:|---------:|
AddEventBenchmark     | 995.9 ns | 1500 us       | 51.03 ns | 52.40 ns |
FullAddEventBenchmark | 2500 ns  | 0 us          | 51.03 ns | 52.40 ns |
`With Overhead` would only appear if there is an overhead method defined. This way, reports remain easily understandable.
@skleanthous our `IterationSetup` and `IterationCleanup` attributes allow you to set a `Target`. If `Target == null`, the attribute applies to all benchmarks in the given class. More in our docs.
@adamsitnik maybe I didn't write it very clearly, but what I wanted to say is that I agree with having a target, but that it shouldn't be required. (It was unclear to me from your message.)

> I wanted to say is that I agree with having a target, but that it shouldn't be required.
@skleanthous so we both agree, and now we just need @AndreyAkinshin's confirmation of my `Idle` idea. Andrey is getting back from the MVP summit; I am not sure when he is going to respond.
@adamsitnik, @skleanthous, this idea sounds pretty interesting. However, it will not be so easy to implement. Right now we apply special kinds of rules to the `Idle` methods. Here are a few problems:

- The `Pilot` stage is used only for the `Main` method; `Idle` skips the `Pilot` stage and uses the same amount of invocations as the `Main` method. I'm not sure that it's an acceptable trick for non-empty methods.
- The `Idle` method has known statistical properties. This allows getting the `Result` iteration by `Result[i] = Main[i] - Average(Idle[1..N])`. But we can't use the average value for a non-empty method with custom user logic. Now we have to work with two complex distributions, each of which could have a tricky form. We can calculate basic metrics by known formulas, but I don't know how to calculate some "advanced" metrics like `P99` or `Kurtosis` (see the sketch below).
- Memory statistics (like GC0/1/2) can also be spoiled. An example: both the `Overhead` and `Main` methods allocate memory; the `Main` method allocates a little bit more, but we see the same GC0/1/2 stats because of the allocation quantum; if we get the same GC0/1/2 stats for both methods and just subtract one value from the other, we get zero stats, which is not a correct result.
- What if the `Overhead` method takes more time than the `Main` method? Print a warning? Use our usual `Max(value, 0)` rule?
- We have issue https://github.com/dotnet/BenchmarkDotNet/issues/601 which assumes that "impossible" values (less than 1 CPU clock cycle) like `0.0088 ns` should be replaced by `0 ns`. Should we disable this trick for custom overhead methods?

Thus, there are two main disadvantages:
I still think that it's better to just introduce a few additional columns without any changes in the execution infrastructure.
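To illustrate the statistical point above with a self-contained toy simulation (my own sketch with invented uniform distributions, not BDN code): the mean of a difference is the difference of the means, but the same is not true for percentiles.

```csharp
using System;
using System.Linq;

class PercentileDemo
{
    // Nearest-rank percentile on a sorted sample.
    static double Percentile(double[] sorted, double p)
    {
        int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length) - 1;
        return sorted[Math.Max(0, Math.Min(rank, sorted.Length - 1))];
    }

    static void Main()
    {
        var rng = new Random(42);
        // Two invented, independent "timing" distributions (in ns).
        double[] main = Enumerable.Range(0, 100_000)
            .Select(_ => 100 + 10 * rng.NextDouble()).ToArray();
        double[] overhead = Enumerable.Range(0, 100_000)
            .Select(_ => 40 + 10 * rng.NextDouble()).ToArray();

        double[] diff = main.Zip(overhead, (m, o) => m - o).ToArray();

        Array.Sort(main); Array.Sort(overhead); Array.Sort(diff);

        // Means subtract cleanly (~105 - ~45 = ~60); percentiles do not:
        Console.WriteLine(Percentile(main, 99) - Percentile(overhead, 99)); // ~60.0
        Console.WriteLine(Percentile(diff, 99));                            // ~68.6
    }
}
```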
@AndreyAkinshin @adamsitnik so, from what I understand, in order to avoid these issues the overhead methods would need to run as if they were normal benchmarks, but their mean would be deducted from the mean of the normal benchmarks, with a column added as per my reply here (the With Overhead column at the bottom of the message): https://github.com/dotnet/BenchmarkDotNet/issues/654#issuecomment-372300435. Can't this be handled at the report-generation level?

> Can't this be handled at the report-generation level?
Yes, we should just introduce an additional attribute and an additional column in the summary table.
Probably we should replace basic columns like `Mean` (or other columns which we can calculate for the difference between two distributions). In order to avoid any confusion, we can denote it as `Mean*` or `Mean^`, or as `Mean` with another additional symbol (and explain it in the legend section).
I'm also not sure about the `[Overhead]` name. Probably it makes sense to extend the `Baseline` feature set. In https://github.com/dotnet/BenchmarkDotNet/issues/299#issuecomment-261061598, I suggested a few possible options for extending baseline features and column replacement/hiding policies.
The first thing which we should solve (before actually starting to implement this feature) is what the API should look like and what the summary table should look like. We have too many different users, and each of them has their own needs. It's really hard to design an API which satisfies everyone.
@adamsitnik, what do you think?
Personally, I feel the `[Overhead]` name is very good because it conveys a lot of information both about what it is and about what is to be expected of it (functionally). I can understand that the relationship between this new attribute and `Baseline` is going to be an is-a (as in: overhead as proposed is a type of baseline, specifically `DiffAbsolute`), but naming things for what they do over what they are results in better readability, imo.
Obviously, the above is just my opinion. I understand that there are already established standards in place and I will be grateful to see this functionality eventually deployed regardless of naming :)
@AndreyAkinshin @skleanthous If the implementation can be simple (not changing the engine, just an extra column/few math operations before printing the results) then I am ok with it.
@adamsitnik & @AndreyAkinshin so is this agreed? I would like to work on this feature if it is.
@skleanthous 👍
@AndreyAkinshin ?
@skleanthous, 👍
I'm still not sure about the naming, but let's try your idea with `[Overhead]`.
Awesome! Thank you. I'll start ASAP. Considering that I have tons of work at my day job, my own open source project that I maintain and (most importantly) a family, any pointers on where to start/focus would be greatly appreciated.
Just assign me this ticket and I'll start immediately.
@skleanthous,
1. `[Benchmark, Overhead]`: I think that we should keep the `[Benchmark]` attribute on such methods because we are actually going to benchmark them (otherwise, it can be confusing). You can inherit `[Overhead]` from `TargetedAttribute`, so it will be possible to specify `Target` (meaning that we are going to "apply" this overhead to a specific method).
2. `BenchmarkConverter`: you should find benchmark methods with the `[Overhead]` attribute and pass them (with the target value) to our "intermediate benchmark representation".
3. Columns: add the new column (e.g. `Mean^`) and implement the math (use `BaselineScaledColumn` as a sample).

Let me know if you have any questions.
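For illustration, the attribute from point 1 could look roughly like this (a purely hypothetical sketch; `OverheadAttribute` does not exist in BenchmarkDotNet, and a plain `Attribute` base is used here instead of the real `TargetedAttribute`):

```csharp
using System;

// Hypothetical sketch only; not part of the BenchmarkDotNet API.
[AttributeUsage(AttributeTargets.Method, AllowMultiple = false)]
public class OverheadAttribute : Attribute
{
    // Name of the [Benchmark] method this overhead applies to;
    // null means it applies to every benchmark in the class.
    public string Target { get; set; }
}
```

Usage would then mirror the earlier examples, e.g. `[Benchmark, Overhead(Target = nameof(AddEventBenchmark))]`.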
@AndreyAkinshin thank you for the info. I'll check it out over the weekend and reply here if I need anything.
Did anyone ever do a PR for this? I was just reading some documentation, @adamsitnik, and imagined an `[Overhead]` attribute as well, and was happy to find it had already been discussed in depth. :)
I'm having some overhead due to method calls + dictionary lookups. Judging from the date of this issue and of #431, I guess there hasn't been any progress on them, but I feel that this additional logic could help generate more meaningful/correct statistics. Has anyone ever started working on it? If so, please let me know; I would like to help make this feature a reality.
I think the notion by @adamsitnik is still very sound.
```csharp
[Overhead(Target = nameof(AddEventBenchmark))]
public IManageSessionOf<MyTempState> Overhead()
{
    session = SessionFor<MyTempState>.StartSession();
    session.Dispose();
    return session;
}

[Benchmark]
public IManageSessionOf<MyTempState> AddEventBenchmark()
{
    session = SessionFor<MyTempState>.StartSession();
    session.AddEvent(myEvent);
    session.Dispose();
    return session;
}
```
What are the current thoughts on `[Benchmark, Overhead]` vs. `[Overhead]`? I think both notations are fine, so let me know which you prefer. If no one else wants to grab the issue, I would work on it after Christmas :)
Introduction
I'm trying to benchmark some larger operations in my library that unfortunately cannot be isolated in a way that gives any confidence that the behaviour will be the same in production. This is similar to benchmarking an "AddToCollection" operation, which would need a setup that clears the collection before every operation.
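As a toy illustration of the problem (my own invented example, not from the library in question): without a per-invocation setup, state leaks across invocations and every measurement sees a different collection size.

```csharp
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

public class AddToCollectionBenchmark
{
    private readonly List<int> items = new List<int>();

    // The list keeps growing across invocations, so each Add operates on a
    // different list size, and occasional internal resizes leak into the
    // measurement. [IterationSetup] can clear per iteration, but not per
    // invocation, which is what this issue asks for.
    [Benchmark]
    public void AddToCollection() => items.Add(42);
}
```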
Programmability proposal
What I would love is to be able to do the following in my benchmark:
Possible process
This would be conceptually similar to a setup for method invocation, I guess. However, I think it can provide consistent and accurate results with the following process:
1. Run benchmarks on the overhead methods as if they carried the `Benchmark` attribute. In the example above, BenchmarkDotNet would run benchmarks on both the `PreCallRequirements` and `PostCallRequirements` methods independently. These would result in benchmarkPre and benchmarkPost.
2. Run `AddEventBenchmark` normally, calling the two overhead methods as method invocation setup and teardown. Let's call this benchmarkMain.
3. Calculate the reported result by subtracting benchmarkPre and benchmarkPost from benchmarkMain.
4. `GlobalSetup` and `IterationSetup` methods would still be called normally in all of the above cases.
Benefits
Possible problems