It would be great to exclude setup/teardown code from benchmark measurements.
Using callgrind rather than cachegrind looks like it allows you to zero and dump the counters at particular points in the program. This can be done on the command line with --zero-before=my_fn and --dump-after=my_fn. It looks like collection can be turned on and off programmatically as well, through the client request macros in valgrind/callgrind.h (for example CALLGRIND_TOGGLE_COLLECT).
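As a rough sketch of what that command line could look like, here is a small Python helper that only builds the argument list (it does not run valgrind). The helper name callgrind_command, the binary ./my_bench and the function name my_fn are placeholders of mine; the valgrind flags themselves are the ones named above, plus --cache-sim=yes for cache simulation.

```python
import shlex


def callgrind_command(binary, function, cache_sim=False, extra_args=()):
    """Build an argv list that asks callgrind to measure only `function`.

    `binary` and `function` are placeholders supplied by the caller.
    """
    argv = [
        "valgrind",
        "--tool=callgrind",
        f"--zero-before={function}",  # zero the counters when entering the function
        f"--dump-after={function}",   # dump the counters when leaving the function
    ]
    if cache_sim:
        argv.append("--cache-sim=yes")  # enable callgrind's cache simulation
    argv.append(binary)
    argv.extend(extra_args)
    return argv


# Hypothetical benchmark binary and function name.
cmd = callgrind_command("./my_bench", "my_fn", cache_sim=True)
print(shlex.join(cmd))
```

The list can be passed straight to subprocess.run. I have not checked how the function name has to be spelled for mangled symbols, so treat my_fn as illustrative only.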
callgrind also supports cache simulation (--cache-sim=yes) with the same cache options as cachegrind, and its output file format is similar; in particular, it also has summary and events lines.
It looked to me like this would allow you to skip setup and teardown code.
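To illustrate the events and summary lines mentioned above, here is a minimal sketch of pairing them up into a name-to-count mapping. The function name parse_callgrind_summary and the sample text are made up for illustration; a real callgrind output file contains many more lines, which this sketch simply ignores.

```python
def parse_callgrind_summary(text):
    """Return {event_name: count} built from the `events:` and `summary:` lines."""
    events = None
    counts = None
    for line in text.splitlines():
        if line.startswith("events:"):
            # Event names, in the same order as the numbers on the summary line.
            events = line.split(":", 1)[1].split()
        elif line.startswith("summary:"):
            counts = [int(n) for n in line.split(":", 1)[1].split()]
    if events is None or counts is None:
        raise ValueError("missing 'events:' or 'summary:' line")
    return dict(zip(events, counts))


# Made-up sample, not real measurements.
sample = """\
events: Ir Dr Dw
summary: 1200 300 150
"""
print(parse_callgrind_summary(sample))
```

Since the summary line has the same shape in both tools' output, the same few lines should in principle work on a cachegrind file too, though I have only sketched it here rather than verified it against real output.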
Another strategy I'm experimenting with is running two versions: one that performs only the setup, and one that performs the setup and the run. I hypothesise that the delta between the two is attributable to the run.
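The delta idea amounts to a per-event subtraction. A minimal sketch, assuming the counts for each version have already been collected into dictionaries and that the setup does the same work in both versions; the function name run_only_cost and all numbers below are made up:

```python
def run_only_cost(setup_only, setup_and_run):
    """Subtract the setup-only counts from the setup-and-run counts, per event."""
    return {
        event: total - setup_only.get(event, 0)
        for event, total in setup_and_run.items()
    }


# Made-up counts, not real measurements.
setup_only = {"Ir": 1000, "Dr": 250, "Dw": 100}
setup_and_run = {"Ir": 1800, "Dr": 400, "Dw": 160}

delta = run_only_cost(setup_only, setup_and_run)
print(delta)
```

This only holds up if the setup is deterministic. Cache miss counts may be less cleanly additive than instruction counts, because the run starts from whatever cache state the setup left behind.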