Quuxplusone / LLVMBugzillaTest


Add automated C++ library benchmark tests #30057

Open Quuxplusone opened 7 years ago

Quuxplusone commented 7 years ago
Bugzilla Link PR31084
Status CONFIRMED
Importance P normal
Reported by Renato Golin (rengolin@gmail.com)
Reported on 2016-11-21 03:41:52 -0800
Last modified on 2018-02-18 14:23:04 -0800
Version unspecified
Hardware PC Linux
CC adhemerval.zanella@linaro.org, chandlerc@gmail.com, diana.picus@linaro.org, eric@efcs.ca, hfinkel@anl.gov, james@jamesmolloy.co.uk, llvm-bugs@lists.llvm.org, mclow.lists@gmail.com, smeenai@fb.com, smithp352@googlemail.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
We have plenty of C++ library tests in the libcxx repo, but no benchmarks at
all. During the "Libcxx performance" BoF at the US LLVM 2016, we discussed that
adding such a benchmark to the test-suite would be a great idea. However, I
have no idea how to even start.

First, there was some talk about integrating Google Benchmark [1] into
the test-suite and (maybe) converting the current benchmarks to use it. Regardless
of a potential move, since we're creating a whole new benchmark, I'd assume
using a tested tool is better than using gettimeofday()...

Second, what kinds of tests do we want? I can think of two approaches:

1. We test the "number of iterations" that algorithms and containers perform on
small to medium datasets and make sure they scale as documented. I'm not
sure how to do that other than instrumenting libcxx and enabling the
instrumentation only in the benchmark. Other users could then enable it in their
own programs, for debugging.

2. We test the actual "wall time" on a single core (threading is a separate
problem), and make sure not only that we don't regress from past runs, but also
that the actual time spent scales with the documented guarantees.

What we cannot do is compare with other standard libraries, unless we want
to include their sources (and build them) in the test-suite. Relying on whatever
is in the system will never work across the board.

Finally, the "complicated set": threading, atomics and localisation tend to be
hard to test and depend heavily on the architecture and OS. We already have
tests for those, but I fear the run time of those tests, if measured, will have
a large deviation, even on the same architecture/OS combination. Sub-
architecture and OS configurations will play a heavy role in them.

So, how do we start? Who wants to help?

cheers,
--renato

[1] https://github.com/google/benchmark
Quuxplusone commented 7 years ago

Eric seems to have added something already to libcxx:

https://reviews.llvm.org/D8107

I think that's an excellent start!

Also, googlebenchmark is awesome! It lets you have multiple ranges, and even check the complexity directly, no additional code necessary.

However, googlebenchmark seems to be actively maintained, and has a list of bugs [1] on GitHub. I wonder which of those we should fix (or not) before taking a snapshot into the test-suite.

We discussed this earlier, and having it as a sub-module won't work, because then tests can fail due to differences in API or behaviour. This means that googlebenchmark will need a proper release process, which will involve at least running the LLVM test-suite on all supported architectures before being declared stable.

[1] https://github.com/google/benchmark/issues

Quuxplusone commented 7 years ago
(In reply to comment #1)
> Eric seems to have added something already to libcxx:
>
> https://reviews.llvm.org/D8107

Sorry but that patch is long dead.

> We have plenty of C++ library tests in the libcxx repo, but no benchmarks at
> all.

That hasn't been true for a number of months now. Libc++ contains benchmarks
under `libcxx/benchmarks`. These benchmarks are written using Google Benchmark
which is checked into the libc++ tree under `libcxx/utils/google-benchmark`.

>
> Also, googlebenchmark is awesome! It lets you have multiple ranges, and even
> check the complexity directly, no additional code necessary.
>
> However, googlebenchmark seens to be actively maintained, and has a list of
> bugs [1] on GitHub. I wonder which one of those we should fix (or not) to
> get a snapshot into the test-suite.

Good news! I'm the primary maintainer of the library. So all serious bugs have
already been fixed.

> We discussed this earlier, and having it as a sub-module won't work, because
> then tests can fail due to differences in API or behaviour.

Assuming you mean git submodules, that shouldn't be an issue: submodules are
updated separately, so we can simply leave the submodule on the same commit
until we are ready to upgrade.

Additionally Google Benchmark has just started formally versioning releases,
and the submodule could simply be set to one of those tags.

-----------

> What we cannot do is to compare with other standard libraries, unless we
> want to include their sources (and build) in the test-suite.
> Relying on whatever is in the system will never work across the board.

True, but supporting easy comparisons with the system STL is quite handy.
Libc++ already allows this, and it has helped find performance problems on
Linux. However, as you mention, the results of the system STL obviously cannot
be used as a stable baseline.

> Second, what kinds of tests do we want? I can think of two approaches:
>
> 1. [...]
>
> 2. We test the actual "wall time" on a single core (threading is a separate
> problem), and make sure that not only we don't regress from past runs, but that
> the actual time spent *also* scales with the guarantees.

I think #2 is the easiest solution, and is already somewhat supported by the
Google Benchmark library. Google Benchmark supports comparing both the wall
time and CPU execution time of benchmarks against previous results.

The main issue with this approach is finding stable hardware to set up a bot on.
The benchmarks need to be the only thing running, to avoid test failures caused
by differences in CPU load. I think this will require changing Zorg to have
nightly benchmark builders that run at off-peak hours (say, 3 AM) and block all
other bots from running at the same time.
Quuxplusone commented 7 years ago

@Renato Can I re-bin this to libc++ or did you want google-benchmark as a part of the LLVM repository?

Quuxplusone commented 7 years ago
Hi Eric, feel free to move it around.

Cheers,
Renato
Quuxplusone commented 7 years ago

OK. re-binning as libc++ specific.

Quuxplusone commented 7 years ago

re-titling to better represent the current state of the bug vs libc++.