NVIDIA / nvbench

CUDA Kernel Benchmarking Library
Apache License 2.0

How to properly extend more metric collections of CUPTI #91

Closed 4mod3 closed 2 years ago

4mod3 commented 2 years ago

Thanks for your great work!

I'm wondering how to properly add more CUPTI metric collections.

I've already seen the comments in nvbench/cupti_profiler.cuh. Could you explain them a bit further? For example, where should I construct and instantiate cupti_profiler?

Thank you!

gevtushenko commented 2 years ago

Hello, @4mod3!

The comments in nvbench/cupti_profiler.cuh were intended for nvbench developers rather than for nvbench users. That is, the CUPTI facilities in that header can't be used outside of nvbench. Currently, we don't expose a way to extend the set of metrics. If you list the metrics you are missing, we might add them to the nvbench API, alongside the existing toggles:

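#include <nvbench/nvbench.cuh>

void bench(nvbench::state &state)
{
  // Example problem size; this value is an assumption for illustration
  const std::size_t elements = 1 << 20;

  // Record how many elements each run processes, shown as "Elements"
  state.add_element_count(elements, "Elements");

  // Toggle collection of the supported metrics
  state.collect_dram_throughput();
  state.collect_l1_hit_rates();
  state.collect_l2_hit_rates();
  state.collect_loads_efficiency();
  state.collect_stores_efficiency();

  // Placeholder for a metric added on request; not an existing nvbench function
  state.collect_new_metric(); // ?
}

4mod3 commented 2 years ago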

Thanks for your reply!

The metric I actually want is branch_efficiency, which corresponds to smsp__sass_average_branch_targets_threads_uniform.pct in Perfworks.
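
For readers unfamiliar with the metric, here is a rough host-side model of what branch efficiency expresses. It assumes the usual reading: the percentage of executed branches for which all 32 threads of a warp take the same direction. This is only a sketch; the names `warp_branch`, `simulate_branch` and `branch_efficiency_pct` are made up for illustration and are not part of nvbench, CUPTI or Perfworks, and the real counter is measured by the hardware rather than computed like this.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Assumed warp width (32 threads on current NVIDIA GPUs).
constexpr std::size_t warp_size = 32;

// One executed branch for one warp: entry i is the direction taken by lane i.
using warp_branch = std::array<bool, warp_size>;

// A branch is uniform when every lane of the warp takes the same direction.
inline bool is_uniform(const warp_branch &taken)
{
  for (bool lane : taken)
  {
    if (lane != taken[0])
    {
      return false;
    }
  }
  return true;
}

// Hypothetical helper: evaluate `pred(tid)` once per thread for `warps`
// consecutive warps, as if each warp executed `if (pred(tid))` one time.
template <typename Pred>
std::vector<warp_branch> simulate_branch(std::size_t warps, Pred pred)
{
  std::vector<warp_branch> result(warps);
  for (std::size_t w = 0; w < warps; ++w)
  {
    for (std::size_t lane = 0; lane < warp_size; ++lane)
    {
      result[w][lane] = pred(w * warp_size + lane);
    }
  }
  return result;
}

// Percentage of uniform branches among all executed branches.
inline double branch_efficiency_pct(const std::vector<warp_branch> &branches)
{
  if (branches.empty())
  {
    return 100.0; // no branches executed, so nothing diverged
  }
  std::size_t uniform = 0;
  for (const warp_branch &b : branches)
  {
    uniform += is_uniform(b) ? 1 : 0;
  }
  return 100.0 * static_cast<double>(uniform) /
         static_cast<double>(branches.size());
}
```

With four warps, a predicate that depends only on the warp index, such as `(tid / warp_size) % 2 == 0`, gives 100 in this model; `tid % 2 == 0` diverges in every warp and gives 0; and `tid < 48` diverges only in the second warp and gives 75.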

gevtushenko commented 2 years ago

I think it would be a good idea to add one. I'll create a PR soon.