Describe the suggestion
Use the rocprofiler API interface instead of doing \
Justification
The idea here is that we can be much more selective about which kernels we want to profile. For instance, we could offer users a mode that attempts to avoid replaying the application at all, e.g., by cycling through different sets of counters across successive launches of the 'same' kernel. This lines up with some of the stuff we've been talking about internally re: kernel selection / cutting down replays.
Implementation
Hook into the rocprofiler API so that we can cycle through counter sets per instance of the same kernel, reducing the need for replay. This would probably be an opt-in mode.
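To make the cycling idea concrete, here is a minimal sketch of the selection logic only. It does not call rocprofiler: `CounterSetCycler`, `next_set`, and `fully_covered` are hypothetical names, and the counter names and grouping are illustrative placeholders. The assumption is that a dispatch callback would ask the cycler which set to collect for each launch.

```python
from collections import defaultdict


class CounterSetCycler:
    """Round-robin over counter sets, tracked separately per kernel name.

    Hypothetical helper: it only decides WHICH set to collect for a given
    dispatch; wiring it into a rocprofiler callback is left out on purpose.
    """

    def __init__(self, counter_sets):
        if not counter_sets:
            raise ValueError("need at least one counter set")
        self._sets = [tuple(s) for s in counter_sets]
        self._launches = defaultdict(int)  # kernel name -> dispatches seen

    def next_set(self, kernel_name):
        """Return the counter set to collect for this dispatch of kernel_name."""
        index = self._launches[kernel_name] % len(self._sets)
        self._launches[kernel_name] += 1
        return self._sets[index]

    def fully_covered(self, kernel_name):
        """True once every counter set has been collected at least once."""
        return self._launches.get(kernel_name, 0) >= len(self._sets)


if __name__ == "__main__":
    # Illustrative counter names; a real grouping depends on hardware limits.
    cycler = CounterSetCycler([
        ["SQ_WAVES", "SQ_INSTS_VALU"],
        ["GRBM_GUI_ACTIVE"],
    ])
    for kernel in ["gemm", "gemm", "reduce", "gemm"]:
        print(kernel, cycler.next_set(kernel), cycler.fully_covered(kernel))
```

A kernel launched fewer times than there are counter sets never reaches full coverage in a single run, which is one reason this would need to be opt-in with a fallback to replay.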
Originally posted by @arghdos in https://github.com/AMDResearch/omniperf/discussions/153#discussioncomment-6503094