Open sleeepyjack opened 2 years ago
Note that using `exec_tag::sync` isn't really reliable for CPU-only benchmarks because it still uses CUDA events for timing. This works, but it's a little hacky.
The main things an `exec_tag::host` would need to do:
- Use host timers (`std::chrono`) instead of CUDA events

Can this one get a bump given Grace is a common use case now?
NVBench currently does not support benchmarking CPU-only code natively. Although adding `nvbench::exec_tag::sync` gives plausible measurements for cold runs, there is no mechanism for batch measurements. We could enable this feature by, e.g., adding a distinct exec tag `nvbench::exec_tag::host`.
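To make the cold vs. batch distinction concrete, here is a minimal sketch of host-side timing using only the standard library. The helper names `time_cold_seconds` and `time_batch_seconds` are hypothetical and not part of NVBench; an actual `exec_tag::host` would presumably also need to hook into NVBench's existing stopping criteria and reporting, which this sketch does not attempt.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper (not an NVBench API): time a single invocation of `fn`
// with a monotonic host clock and return the elapsed time in seconds.
template <typename F>
double time_cold_seconds(F &&fn)
{
  const auto start = std::chrono::steady_clock::now();
  fn();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper (not an NVBench API): time `n` back-to-back invocations
// of `fn` as one batch and return the mean time per invocation in seconds.
// Timing the whole batch amortizes the clock overhead across all calls.
template <typename F>
double time_batch_seconds(F &&fn, std::size_t n)
{
  assert(n > 0 && "batch size must be positive");
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < n; ++i)
  {
    fn();
  }
  const auto stop = std::chrono::steady_clock::now();
  const double total = std::chrono::duration<double>(stop - start).count();
  return total / static_cast<double>(n);
}
```

Calling `time_cold_seconds(work)` measures one isolated run, while `time_batch_seconds(work, 1000)` gives a per-call average that is less sensitive to timer overhead for very short workloads. `std::chrono::steady_clock` is used here because it is monotonic; whether that is the right clock for NVBench is an open design choice.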