catchorg / Catch2

A modern, C++-native, test framework for unit-tests, TDD and BDD - using C++14, C++17 and later (C++11 support is in v2.x branch, and C++03 on the Catch1.x branch)
https://discord.gg/4CWS9zD
Boost Software License 1.0

`--benchmark-samples` does not seem to work as expected #2813

Open Megaloblastt opened 9 months ago

Megaloblastt commented 9 months ago

Hi all,

I have some functions I'm trying to benchmark, and some of them can only be called once (internal mechanisms prevent a user from calling them twice). The benchmarked functions cannot be modified, for obvious reasons, and there is no way to stop them from checking that they are called only once.

After some searching, I found the `--benchmark-samples` option, which allows setting the number of times the code within the `BENCHMARK` macro is called. Unfortunately, it does not seem to act as intended. I even used every available option for controlling benchmarks: `--benchmark-samples 1 --benchmark-resamples 1 --benchmark-warmup-time 0 --benchmark-no-analysis`

Nothing changed. What am I doing wrong? What should I do instead?

Here is a sample code that produces the issue.

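#include <catch2/catch_test_macros.hpp>
#include <catch2/benchmark/catch_benchmark.hpp>
#include <cstdint>
#include <iostream>
#include <stdexcept>

static std::uint8_t calls = 0;
static void function_callable_only_once() {
    // std::runtime_error, unlike std::exception, has a message constructor.
    if (calls > 0) throw std::runtime_error("Only one call is allowed.");
    calls += 1;
}

TEST_CASE("function_callable_only_once") {  // BENCHMARK must sit inside a TEST_CASE
    BENCHMARK("generate_setup_commitments") {
        std::cout << "********** THIS IS SPARTA *************" << std::endl;
        REQUIRE_NOTHROW(function_callable_only_once());
    };
}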

The output is:

benchmark name                            samples    iterations          mean
-------------------------------------------------------------------------------
generate_setup_commitments           ********** THIS IS SPARTA *************
            1             1 ********** THIS IS SPARTA *************

stupid_test.cpp:25: FAILED:
  REQUIRE_NOTHROW(function_callable_only_once())
due to unexpected exception with message:
  Only one call is allowed.

As you can see, the string `********** THIS IS SPARTA *************` is printed more than once. Please help/advise.

Thanks.

horenmar commented 8 months ago

You can't do this with Catch2, because the benchmark will always run the function at least twice: once for the sample you ask for and, before that, at least once during warmup. `--benchmark-warmup-time` sets the minimum time spent in warmup, but even setting it to 0 does not skip it.

`--benchmark-resamples 1` does nothing when combined with `--benchmark-no-analysis`: both options affect the bootstrapping analysis of the collected samples, the former controlling how it is done and the latter whether it happens at all.
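To make the bootstrapping point concrete, here is a generic percentile-bootstrap sketch. It is not Catch2's implementation (Catch2's analysis does more than this), and `bootstrap_means` is a hypothetical helper. Each resample draws as many values as there are samples, with replacement, and records their mean; the spread of those means is what a confidence interval is built from.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Generic bootstrap sketch (NOT Catch2's code). Resamples `samples` with
// replacement `resamples` times and returns the sorted resample means.
// Precondition: `samples` is non-empty.
std::vector<double> bootstrap_means(const std::vector<double>& samples,
                                    std::size_t resamples,
                                    unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, samples.size() - 1);
    std::vector<double> means;
    means.reserve(resamples);
    for (std::size_t r = 0; r < resamples; ++r) {
        double sum = 0.0;
        // Draw samples.size() values with replacement.
        for (std::size_t i = 0; i < samples.size(); ++i) {
            sum += samples[pick(rng)];
        }
        means.push_back(sum / static_cast<double>(samples.size()));
    }
    std::sort(means.begin(), means.end());
    return means;
}
```

With `samples = {5.0}`, every resample mean is exactly 5.0, so the spread is zero no matter how many resamples are requested: a single sample gives the analysis nothing to work with.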

Megaloblastt commented 8 months ago

Wouldn't it make sense to check specifically whether `--benchmark-warmup-time 0` was set and, if so, simply skip the warmup?

horenmar commented 8 months ago

That could be done, but it is a very specific, and frankly weird, use case. Catch2 primarily targets microbenchmarking and tries to provide high-quality statistical output. A single sample, especially without warmup, cannot provide that, and the only advantage of using `BENCHMARK` over `auto t1 = std::chrono::steady_clock::now(); ... ; auto t2 =` is that the output goes through the reporter.
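For a function that must run exactly once, the manual `std::chrono::steady_clock` approach mentioned above looks roughly like the sketch below. It assumes you are fine printing or logging the result yourself, since nothing here goes through Catch2's reporter; `time_single_call` is a hypothetical helper name, and the one-shot function is the stand-in from the original report.

```cpp
#include <chrono>
#include <cstdint>
#include <stdexcept>

// Stand-in for the one-shot function from the issue.
static std::uint8_t calls = 0;
static void function_callable_only_once() {
    if (calls > 0) throw std::runtime_error("Only one call is allowed.");
    calls += 1;
}

// Times exactly one call with a monotonic clock: no warmup, no repeated
// samples, no statistical analysis.
template <typename F>
std::chrono::nanoseconds time_single_call(F&& f) {
    const auto t1 = std::chrono::steady_clock::now();
    f();
    const auto t2 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1);
}
```

The trade-off is exactly the one described: one measurement, with no estimate of how noisy it is.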

Furthermore, if this is a function that can only be called once in the program's lifetime (e.g. `libfoo_init`), calling it inside a benchmark or test is generally a bad idea, because test/benchmark selection makes it easy to skip the call or to make it multiple times. If, on the other hand, the function can only be called once per setup (e.g. a member function that steals a class's internal state and checks that it was not called multiple times), you should instead provide the necessary setup through the advanced benchmarking facilities (the same approach used, e.g., for benchmarks of destructive algorithms).
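The "once per setup" case can be illustrated in plain C++ so the sketch compiles without Catch2. `OneShot` and `mean_consume_ns` are hypothetical names: one fresh object per run is built outside the timed region, and only the `consume()` calls are timed, so no object is ever consumed twice.

```cpp
#include <chrono>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical type whose consume() may be called only once per object,
// modelling "a member function that steals internal state".
class OneShot {
public:
    int consume() {
        if (used_) throw std::logic_error("consume() called twice");
        used_ = true;
        return 42;
    }

private:
    bool used_ = false;
};

// Per-run setup sketch: build `runs` fresh objects (not timed), then time
// only the consume() calls. Returns mean ns per call (0.0 when runs == 0).
double mean_consume_ns(std::size_t runs) {
    if (runs == 0) return 0.0;
    std::vector<OneShot> objects(runs);   // setup: outside the timed region
    long long sink = 0;                   // accumulates results so they are used
    const auto t1 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < runs; ++i) {
        sink += objects[i].consume();     // each object is consumed exactly once
    }
    const auto t2 = std::chrono::steady_clock::now();
    volatile long long keep = sink;       // discourages the optimizer from dropping the loop
    (void)keep;
    const std::chrono::duration<double, std::nano> elapsed = t2 - t1;
    return elapsed.count() / static_cast<double>(runs);
}
```

In Catch2 the same structure would be written with `BENCHMARK_ADVANCED("consume")(Catch::Benchmark::Chronometer meter)`: size `objects` with `meter.runs()` and pass `[&](int i) { return objects[i].consume(); }` to `meter.measure`. Catch2 invokes the whole body, setup included, for each measurement, so every object is still consumed only once. This does not help when the restriction is global to the process, as in the `libfoo_init` case or the `static` counter in the original report.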

Megaloblastt commented 8 months ago

Without going into the details of my use case, it consists of a few steps that run in a given order, each of them internally protected against replay attacks. I want to benchmark each step and have the benchmark results included in the report (so they must go through the reporter). If there is a way to do this (maybe using `BENCHMARK_ADVANCED`, but I don't see how), please let me know.

But most importantly, and independently of my use case, I would normally expect `--benchmark-warmup-time 0` to skip the warmup. Alternatively, it could accept a negative number if, internally, the code does something like

while (current_warmup_time <= max_warmup_time) {
    // do something
}

so that `--benchmark-warmup-time -1` would actually skip it.
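The loop proposed above can be modelled directly. This mirrors the pseudocode in the comment, not Catch2's actual warmup code, and `warmup_runs` is a hypothetical name. Because elapsed time starts at zero and the `<=` check runs before the first call, a budget of 0 still permits at least one call, whereas a negative budget fails the check immediately.

```cpp
#include <chrono>

// Simplified model of the hypothesised warmup loop (NOT Catch2's code).
// Returns how many times `f` ran before the time budget was exceeded.
template <typename F>
int warmup_runs(std::chrono::milliseconds max_warmup_time, F&& f) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    std::chrono::nanoseconds current_warmup_time{0};
    int runs = 0;
    // 0ns <= 0ms is true, so a zero budget still enters the loop once;
    // 0ns <= -1ms is false, so a negative budget never enters it.
    while (current_warmup_time <= max_warmup_time) {
        f();
        ++runs;
        current_warmup_time =
            std::chrono::duration_cast<std::chrono::nanoseconds>(clock::now() - start);
    }
    return runs;
}
```

An explicit guard such as `if (max_warmup_time.count() == 0) return 0;` before the loop would instead give the skip-on-zero behaviour suggested earlier in the thread.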