bluenote10 opened 5 years ago
The idea is to allocate the seq externally and then pass it to the measuring function. You could write an iterator that does so and then pass it to `measure`, in order to perform the measurement on only the wanted code.
There is also the general overhead.

That's expected, and it is also expected to have a very low variance, so that comparing different runs gives meaningful values.
It would just be very convenient to have a `timed` block.

I think even with an iterator the data is generated only once, and there is another loop which repeatedly uses that data, right?
Yes, for each element of the iterator N samples are collected. If you want to measure how the function behaves for M different inputs, just generate them beforehand with an iterator (or whatever); changing the data in between runs is bound to generate garbage results and would defeat the whole point of this benchmarking scheme.
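That scheme can be sketched in plain Nim; `inputs`, `N`, and the hand-rolled sampling loop below are illustrative stand-ins, not the library's actual API:

```nim
import std/[monotimes, times]

# Each element of the iterator is one input, generated exactly once;
# the inner loop then collects N samples against that fixed input.
iterator inputs(): seq[int] =
  for n in [100, 1_000, 10_000]:
    var s = newSeq[int](n)
    for i in 0 ..< n:
      s[i] = i
    yield s

const N = 50              # samples per input
var sink = 0              # keeps the summation observable
for data in inputs():
  var total: Duration
  for _ in 0 ..< N:       # the data is NOT regenerated between samples
    let start = getMonoTime()
    for x in data:
      sink += x
    total += getMonoTime() - start
  echo data.len, " elements: ", total div N, " per sample"
```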
Isn't there an unknown overhead due to Nim potentially copying the data that is passed to the function? I assume that with a large seq I would include the time to copy the seq in the measurement, and that's what I would like to avoid.
Once the data is ready a closure is allocated internally and then passed to the "real" benchmark function. The only overhead you'll see is a few cycles due to the extra indirection (if the C compiler didn't inline everything).
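A rough sketch of that wrapping, with `runBenchmark` as a hypothetical stand-in for the real benchmark function:

```nim
# The prepared data is captured by the closure, so each invocation pays only
# for the indirect call, not for copying the seq.
proc runBenchmark(f: proc () {.closure.}) =
  for _ in 0 ..< 10:
    f()

var data = newSeq[int](1_000)   # prepared once, before benchmarking
for i in 0 ..< data.len:
  data[i] = i

var sink = 0
runBenchmark(proc () =
  for x in data:                # closure refers to `data`; no per-call copy
    sink += x)
echo sink
```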
With the timed block one wouldn't have to worry about cases where the post-measurement-assertion has non-negligible complexity.
The `blackbox` function should be enough to stop the compiler from optimizing away everything while keeping a very low overhead.
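One common way to implement such a `blackbox` (a sketch of the general idea, not necessarily how this library does it) is a volatile round-trip:

```nim
proc blackBox[T](x: T): T {.inline.} =
  # Writing through and reading back a volatile location forces the
  # optimizer to materialize `x`, so the work producing it can't be elided.
  var tmp {.volatile.}: T = x
  result = tmp

# Usage: feed results through blackBox so the loop isn't optimized away.
var acc = 0
for i in 0 ..< 100:
  acc += blackBox(i)
echo acc
```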
Currently it is difficult to isolate setup code from the part of the code that one really wants to measure. Silly example:
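The original snippet isn't preserved in this copy of the issue; a minimal stand-in with the same shape, assuming nothing about the library, would be:

```nim
import std/[monotimes, times]

# Illustrative only: the allocation ends up inside the timed region,
# so it dominates whatever tiny operation we actually wanted to measure.
proc benchDominatedBySetup(n: int): Duration =
  let start = getMonoTime()
  let s = newSeq[int](n)   # setup we would like to exclude...
  discard s.len            # ...and the trivial work we actually care about
  result = getMonoTime() - start

echo benchDominatedBySetup(1_000_000)
```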
The measurement is obviously completely dominated by the setup code `let s = newSeq[int](n)`. A work-around is to initialize the data in the surrounding scope and close over the data variables. But this prevents parametrizing the benchmark, and there seems to be an overhead from copying the data to the benchmark. There is also the general overhead, which is ~4 cycles for me.
What about the following API:
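The proposed snippet is missing from this copy; based on the description below it, a self-contained sketch of such an API (hypothetical names, built on std/monotimes rather than cycle counters) might look like:

```nim
import std/[monotimes, times]

# Hypothetical `timed` template: measures only its body and accumulates
# the elapsed time into the given accumulator.
template timed(acc: var Duration, body: untyped) =
  let start = getMonoTime()
  body
  acc += getMonoTime() - start

proc benchSum(n: int): Duration =
  let s = newSeq[int](n)    # setup: not measured
  var total = 0
  timed(result):
    for x in s:
      total += x            # only this loop is measured
  doAssert total == 0       # post-measurement assertion: also not measured

echo benchSum(100_000)
```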
Here `timed` is a template that internally measures the time/cycles before and after executing its `body` and updates `result` with the corresponding time/cycle deltas. This would allow separating the setup and optimization-prevention code from the actual benchmark code.