alpaka-group / alpaka

Abstraction Library for Parallel Kernel Acceleration :llama:
https://alpaka.readthedocs.io
Mozilla Public License 2.0

Create benchmarks directory and move babelstream into it #2237

Closed mehmetyusufoglu closed 3 months ago

mehmetyusufoglu commented 5 months ago

A simple PR. A directory called "benchmarks" is created and the babelstream example is copied into it. There is a new CMake flag, alpaka_BUILD_BENCHMARKS. If this flag is ON, then alpaka_ACC_CPU_B_SEQ_T_SEQ_ENABLE is turned ON as well (like the alpaka_BUILD_EXAMPLES flag).

The code under the benchmarks directory is compiled, but the babelstream example is not run in the CI, just as when it was in the examples directory before.

SimeonEhrig commented 5 months ago

Maybe we should add a CMake target all_benchmarks where we can register and execute all benchmarks.

The benchmarks need to be built in the CI. I see no reason why we should not enable the benchmarks for all builds. Therefore we can add the CMake argument here: https://github.com/alpaka-group/alpaka/blob/dcc87b3edc8493796a748e5e551c319c875774c5/script/run_generate.sh#L80-L94

bernhardmgruber commented 5 months ago

I generally like the idea of separating benchmarks and examples, but could you please elaborate a bit on your motivation for doing this? Specifically, are you going to add more benchmarks? Are you planning to build the benchmarks differently than examples? Thx!

mehmetyusufoglu commented 5 months ago

In my opinion, examples can be designed for any reason: pedagogical, showing the implementation of a new feature, etc. Benchmarks will mainly focus on performance, and visualising their change over time in the CI will show the general performance effects of each merged PR.

bernhardmgruber commented 5 months ago

Benchmarks will mainly focus on performance, and visualising their change over time in the CI will show the general performance effects of each merged PR.

Alright, so you are preparing for some kind of performance CI? Here is a ticket for that: #1264

SimeonEhrig commented 5 months ago

Benchmarks will mainly focus on performance, and visualising their change over time in the CI will show the general performance effects of each merged PR.

Alright, so you are preparing for some kind of performance CI? Here is a ticket for that: #1264

We discussed it last week. At the moment, a performance CI is not possible because of a lack of resources. But we want to have the benchmarks so that regression benchmarks can be run locally on laptops, workstations, or servers.

For example, we thought about using mdspan for tensors in kernels. This makes the usage easier than raw pointers. But maybe the performance overhead is too high, which would mean we also need to implement an interface with raw pointers.
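To make that trade-off concrete, here is a minimal sketch of the kind of comparison such a benchmark could make, assuming C++23 std::mdspan (or the Kokkos reference implementation of std::experimental::mdspan) and plain host-side loops; the alpaka accelerator and kernel boilerplate is left out, and the function names are only illustrative:

```cpp
// Sketch only: the same 2D "tensor" update written once with a raw pointer and
// manual index arithmetic, and once with std::mdspan. A regression benchmark
// would time both variants to see whether the mdspan abstraction costs anything.
#include <cstddef>
#include <mdspan> // C++23; <experimental/mdspan> with the reference implementation

// Raw-pointer interface: the kernel author does the row-major indexing by hand.
void scaleRaw(float* data, std::size_t rows, std::size_t cols, float factor)
{
    for(std::size_t r = 0; r < rows; ++r)
        for(std::size_t c = 0; c < cols; ++c)
            data[r * cols + c] *= factor;
}

// mdspan interface: extents and layout travel with the view, so the indexing
// is expressed directly instead of being computed by hand.
void scaleMdspan(std::mdspan<float, std::dextents<std::size_t, 2>> data, float factor)
{
    for(std::size_t r = 0; r < data.extent(0); ++r)
        for(std::size_t c = 0; c < data.extent(1); ++c)
            data[r, c] *= factor;
}
```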

sliwowitz commented 5 months ago

There's also https://github.com/alpaka-group/alpaka/pull/1723, which I've just rebased on top of the current develop. It uses Catch2 for the benchmarking infrastructure (and is thus integrated with e.g. ctest). I tried to implement a generic fixture for benchmarking kernels that would allow us to write simple benchmarks for basic features, but I didn't implement any use case other than the random generator, so I didn't know what the actual, sensible requirements for such a fixture would be.

Using Catch2 to handle the benchmarks is IMHO still a good idea since we're already using it to handle tests.
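For orientation, a minimal sketch of what a Catch2-based benchmark could look like, assuming Catch2 v3 headers and macros; runTriad is only a hypothetical placeholder for whatever kernel launch the fixture would wrap, not an existing alpaka function:

```cpp
// Sketch only: a Catch2 v3 benchmark case. Registering it as a (hidden) test
// case makes it reachable through ctest alongside the existing unit tests.
#include <catch2/benchmark/catch_benchmark.hpp>
#include <catch2/catch_test_macros.hpp>

#include <cstddef>
#include <vector>

namespace
{
    // Placeholder for the operation to be measured, e.g. a babelstream-style triad.
    void runTriad(std::vector<float>& a, std::vector<float> const& b, std::vector<float> const& c)
    {
        for(std::size_t i = 0; i < a.size(); ++i)
            a[i] = b[i] + 0.4f * c[i];
    }
} // namespace

TEST_CASE("babelstream triad", "[!benchmark]")
{
    std::vector<float> a(1u << 20), b(1u << 20, 1.0f), c(1u << 20, 2.0f);

    // Catch2 runs the block repeatedly and reports timing statistics.
    BENCHMARK("triad 1M elements")
    {
        runTriad(a, b, c);
    };
}
```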