alpaka-group / alpaka

Abstraction Library for Parallel Kernel Acceleration :llama:
https://alpaka.readthedocs.io
Mozilla Public License 2.0

Performance test suite #1264

Open bernhardmgruber opened 3 years ago

bernhardmgruber commented 3 years ago

Alpaka's "aim is to provide performance portability across accelerators through the abstraction (not hiding!) of the underlying levels of parallelism." We should therefore also pay attention to possible performance impacts of changes applied to this library. In order to better assess such performance impacts I want to propose the creation of a performance test suite.

Performance tests are similar to the existing unit tests, with some differences:

When we have a few of these tests, we should set up a CI pipeline that can visualize how runtime performance changes with commits and PRs.
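To make the proposal concrete, here is a minimal sketch of what such a performance test could look like. It uses plain std::chrono and a hypothetical saxpy loop standing in for a real alpaka kernel launch; all names are illustrative, not alpaka API:

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical workload standing in for an alpaka kernel launch.
void saxpy(float a, std::vector<float> const& x, std::vector<float>& y)
{
    for(std::size_t i = 0; i < x.size(); ++i)
        y[i] = a * x[i] + y[i];
}

int main()
{
    constexpr std::size_t n = 1 << 24;
    std::vector<float> x(n, 1.0f);
    std::vector<float> y(n, 2.0f);

    constexpr int runs = 10;
    std::vector<double> seconds(runs);
    for(int r = 0; r < runs; ++r)
    {
        auto const start = std::chrono::steady_clock::now();
        saxpy(3.0f, x, y);
        auto const stop = std::chrono::steady_clock::now();
        seconds[r] = std::chrono::duration<double>(stop - start).count();
    }

    // Report the median, which is robust against outliers on a noisy system.
    std::nth_element(seconds.begin(), seconds.begin() + runs / 2, seconds.end());
    std::printf("saxpy median: %f s\n", seconds[runs / 2]);
}
```

Reporting the median of several runs is one common way to keep a single noisy run from skewing the result.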

psychocoderHPC commented 3 years ago

IMO the performance tests should always check for correct results. Such tests should also be executable without a CI; in that case, someone might be testing different code versions without running all unit tests. Additionally, if you write e.g. a forEach as your performance test and one implementation has a logical programming mistake that lets the compiler optimize the hot code away entirely, you would not realize the mistake if the performance difference is small.
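For illustration, a minimal sketch of this failure mode (hypothetical code, not alpaka API): if the benchmark's result is never used, dead-code elimination may remove the hot loop, and the measured time becomes meaningless.

```cpp
#include <cstdint>
#include <cstdio>

// BROKEN benchmark: sum is never observed, so an optimizing compiler is free
// to delete the entire loop; the test then reports a near-zero time.
void brokenHotLoop()
{
    std::uint64_t sum = 0;
    for(std::uint64_t i = 0; i < 100'000'000; ++i)
        sum += i * i;
    // sum is dead here
}

// Making the result observable forces the compiler to keep the computation,
// and verifying it would additionally catch logical mistakes.
std::uint64_t checkedHotLoop()
{
    std::uint64_t sum = 0;
    for(std::uint64_t i = 0; i < 100'000'000; ++i)
        sum += i * i;
    return sum;
}

int main()
{
    brokenHotLoop();
    std::printf("%llu\n", static_cast<unsigned long long>(checkedHotLoop()));
}
```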

bernhardmgruber commented 3 years ago

Such tests should also be executable without a CI; in that case, someone might be testing different code versions without running all unit tests.

Absolutely!

Additionally, if you write e.g. a forEach as your performance test and one implementation has a logical programming mistake that lets the compiler optimize the hot code away entirely, you would not realize the mistake if the performance difference is small.

You would not notice it when running the performance test suite, yes. But you would notice it when running the unit tests. And I think this is fine, because you also run your unit tests either before committing or at least later as part of the CI build before merging. Adding extra code to verify the results just makes the performance tests more complicated.

Example: filling buffers of a few GB with random numbers. Would you want to run the number generator again on the host to verify that all random numbers were generated correctly? Or that the random number distribution is fine? That could easily double or triple the test runtime.

Furthermore, if the performance test consists largely of hot code that you want to benchmark, you can also easily perf it without modifying the code (i.e. without first disabling the computation of reference results and comparisons).
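Relatedly, a common pattern is to keep any verification outside the timed region, so it cannot distort the reported number even when it stays in the test. A minimal sketch (hypothetical workload, not alpaka API):

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <numeric>
#include <vector>

int main()
{
    std::vector<int> data(1 << 20, 1);

    // Timed region: only the code under test.
    auto const start = std::chrono::steady_clock::now();
    long long const sum = std::accumulate(data.begin(), data.end(), 0LL);
    auto const stop = std::chrono::steady_clock::now();

    // Verification after the timed region: it can be arbitrarily slow
    // without affecting the reported measurement.
    assert(sum == static_cast<long long>(data.size()));

    std::printf("%f s (sum=%lld)\n",
        std::chrono::duration<double>(stop - start).count(), sum);
}
```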

psychocoderHPC commented 3 years ago

Example: filling a few GB buffers with random numbers. Would you want to run the number generator again on the host to verify that all random numbers are generated correctly? Or that the random number distribution is fine? That could easily double or triple the test runtime.

Result verification does not have to be a point-wise comparison of values. What counts as correct or wrong depends on the test. For random numbers, you would most likely check the standard deviation, ...
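For example, such a cheap distribution check on a uniform [0, 1) buffer could compare mean and standard deviation against their analytic values. A host-side sketch, with std::mt19937 standing in for the device generator; names and tolerances are illustrative:

```cpp
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

int main()
{
    // Host-side stand-in for a device-generated buffer of uniform [0, 1) numbers.
    std::mt19937 engine{42};
    std::uniform_real_distribution<double> dist{0.0, 1.0};
    std::vector<double> values(1'000'000);
    for(auto& v : values)
        v = dist(engine);

    double sum = 0.0;
    double sumSq = 0.0;
    for(double const v : values)
    {
        sum += v;
        sumSq += v * v;
    }
    double const mean = sum / values.size();
    double const stddev = std::sqrt(sumSq / values.size() - mean * mean);

    // Uniform [0, 1): expected mean 0.5, expected stddev 1/sqrt(12) ~ 0.2887.
    bool const ok = std::abs(mean - 0.5) < 1e-2 && std::abs(stddev - 0.2887) < 1e-2;
    std::printf("mean=%f stddev=%f -> %s\n", mean, stddev, ok ? "OK" : "FAIL");
}
```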

You would never know whether your benchmark implementation is correct if you do not add verification, so you need to do it anyway.

bernhardmgruber commented 3 years ago

Result verification does not have to be a point-wise comparison of values. What counts as correct or wrong depends on the test. For random numbers, you would most likely check the standard deviation, ...

That depends. The alpaka random number generator currently being developed claims to be deterministic across accelerators. So you would need several point-wise comparisons across all accelerators.

You would never know whether your benchmark implementation is correct if you do not add verification, so you need to do it anyway.

But probably a very small check suffices: just enough to know that the performance test itself is correct, not that the called alpaka functionality is correct. E.g. alpaka has a random number library, and the correct generation of random numbers should already be extensively verified by unit tests. The performance test could just check that you got "something".
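Such a "something" check could be as small as verifying that the buffer was actually written, without re-validating the generator itself. A hypothetical sketch:

```cpp
#include <algorithm>
#include <cstdio>
#include <functional>
#include <random>
#include <vector>

// Cheap sanity check: the buffer contains some variation, i.e. it was not
// left untouched or filled with a constant. This does not re-validate the
// generator; that is the unit tests' job.
bool looksGenerated(std::vector<float> const& buf)
{
    return !buf.empty()
        && std::adjacent_find(buf.begin(), buf.end(), std::not_equal_to<>{}) != buf.end();
}

int main()
{
    std::vector<float> buf(1024);
    std::mt19937 engine{1};
    std::uniform_real_distribution<float> dist{0.0f, 1.0f};
    for(auto& v : buf)
        v = dist(engine); // stand-in for the device-side generator
    std::printf("%s\n", looksGenerated(buf) ? "OK" : "FAIL");
}
```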