Add performance measurements of the GPU-accelerated functions in StackedTraces, as well as of their CPU counterparts on unstacked traces.
Have deterministically generated pseudorandom data.
Have a configurable number of traces and samples.
Have a configurable data type (uint/float, 8/16).
Potentially have a configurable data distribution (uniform/normal?), just to see whether it has an effect.
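The data requirements above could be covered by a single generator function. The following is a minimal sketch, not existing project code: the name `generate_traces` and all of its parameters are hypothetical, and NumPy is assumed as the array library. NumPy has no 8-bit float type, so the float case is only exercised with `float16` here.

```python
import numpy as np


def generate_traces(num_traces, num_samples, dtype=np.uint8,
                    distribution="uniform", seed=0):
    """Hypothetical helper: reproducible pseudorandom traces.

    Returns an array of shape (num_traces, num_samples) with the given dtype.
    The same arguments always yield the same data because the generator is
    seeded explicitly.
    """
    rng = np.random.default_rng(seed)
    dtype = np.dtype(dtype)
    is_int = np.issubdtype(dtype, np.integer)
    if is_int:
        info = np.iinfo(dtype)
        low, high = info.min, info.max
    else:
        # Assumption: float traces are kept in [0, 1].
        low, high = 0.0, 1.0
    shape = (num_traces, num_samples)

    if distribution == "uniform":
        if is_int:
            # endpoint=True makes the upper bound inclusive.
            return rng.integers(low, high, size=shape, dtype=dtype,
                                endpoint=True)
        return rng.uniform(low, high, size=shape).astype(dtype)

    if distribution == "normal":
        # Assumption: center the bell in the value range and clip the tails.
        mean = (low + high) / 2
        std = (high - low) / 6
        data = rng.normal(mean, std, size=shape)
        if is_int:
            data = np.rint(data)
        return np.clip(data, low, high).astype(dtype)

    raise ValueError(f"unknown distribution: {distribution!r}")


traces = generate_traces(100, 1000, dtype=np.uint16, distribution="normal", seed=42)
```

Seeding a local `Generator` rather than the global NumPy state keeps runs comparable even if other code draws random numbers in between.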
Also measure how long the stacking itself takes.
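Timing the stacking step could look roughly like the sketch below. It is an illustration under assumptions: `time_call` is a hypothetical helper, and `np.stack` merely stands in for whatever StackedTraces actually does to combine individual traces into one array.

```python
import time

import numpy as np


def time_call(func, *args, repeats=5):
    """Hypothetical helper: run func(*args) several times.

    Returns (best elapsed time in seconds, result of the last call).
    """
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


# Unstacked input: a list of independent 1-D traces with a fixed seed.
rng = np.random.default_rng(0)
unstacked = [rng.integers(0, 256, size=1000, dtype=np.uint8) for _ in range(100)]

# np.stack is a stand-in for the real stacking operation.
stack_time, stacked = time_call(np.stack, unstacked)
print(f"stacking {len(unstacked)} traces took {stack_time:.6f} s")
```

Taking the best of several repeats reduces noise from the first-call overhead; reporting the mean or median instead is equally defensible.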
On the GPU side, ideally measure each step of the computation separately (i.e., getting the traces into GPU memory, computing on them, and getting the result back).
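One way to time the three GPU phases separately is to pass each phase in as a callable and synchronize around it, since GPU kernel launches are typically asynchronous and would otherwise appear to finish immediately. The sketch below is backend-agnostic and hypothetical: `measure_steps` and its parameters are not part of the project. If the GPU code is built on Numba's CUDA support (an assumption), the callables could wrap `cuda.to_device`, the kernel launch, `copy_to_host`, and `cuda.synchronize`. NumPy stand-ins are used here so the example runs without a GPU.

```python
import time

import numpy as np


def measure_steps(host_data, to_device, compute, to_host,
                  synchronize=lambda: None):
    """Hypothetical helper: time transfer in, compute, and transfer out.

    Each phase is bracketed by synchronize() so that asynchronous work from
    one phase is not attributed to the next. Returns (result, timings),
    where timings maps phase names to elapsed seconds.
    """
    timings = {}

    def timed(name, func, arg):
        synchronize()
        start = time.perf_counter()
        out = func(arg)
        synchronize()
        timings[name] = time.perf_counter() - start
        return out

    device_data = timed("host_to_device", to_device, host_data)
    device_result = timed("compute", compute, device_data)
    result = timed("device_to_host", to_host, device_result)
    return result, timings


# CPU stand-ins: np.copy plays the role of both transfers, and a
# per-sample average over traces plays the role of the GPU computation.
traces = np.arange(12, dtype=np.float32).reshape(3, 4)
result, timings = measure_steps(
    traces,
    to_device=np.copy,
    compute=lambda data: data.mean(axis=0),
    to_host=np.copy,
)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```

With a just-in-time compiled backend, a warm-up call before measuring would keep compilation time out of the `compute` figure.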
Take inspiration from the perf code in the tests directory, but do not use the profiler defined there.