Is your feature request related to a problem? Please describe.
This request concerns benchmarks that cannot be cut down to a smaller representative problem. For these, collecting counters in parallel would significantly reduce profiling time.
Describe the solution you'd like
An argument to `omniperf profile` that runs several counter collections at once, spread across multiple identical GPUs.
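To illustrate the idea, here is a minimal sketch of how counter-collection passes might be spread across GPUs. It is not omniperf's implementation: `assign_passes`, `run_on_gpu`, `profile_parallel`, and the `make_cmd` callback are all hypothetical names, and the sketch assumes each pass can be launched as an independent process that is pinned to one device through the `HIP_VISIBLE_DEVICES` environment variable.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def assign_passes(passes, gpu_ids):
    """Distribute counter-collection passes round-robin over the GPUs."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    plan = {gpu: [] for gpu in gpu_ids}
    for i, p in enumerate(passes):
        plan[gpu_ids[i % len(gpu_ids)]].append(p)
    return plan


def run_on_gpu(gpu, passes, make_cmd):
    """Run one GPU's passes back to back, with only that GPU visible."""
    # Assumption: the profiled application honors HIP_VISIBLE_DEVICES.
    env = dict(os.environ, HIP_VISIBLE_DEVICES=str(gpu))
    return [
        subprocess.run(make_cmd(p), env=env, check=True).returncode
        for p in passes
    ]


def profile_parallel(passes, gpu_ids, make_cmd):
    """Run every GPU's share of the passes concurrently.

    make_cmd is a hypothetical callback that turns one pass description
    into the argv list for a single collection run.
    """
    plan = assign_passes(passes, gpu_ids)
    with ThreadPoolExecutor(max_workers=len(gpu_ids)) as pool:
        futures = {
            gpu: pool.submit(run_on_gpu, gpu, plan[gpu], make_cmd)
            for gpu in gpu_ids
        }
        return {gpu: f.result() for gpu, f in futures.items()}
```

With N identical GPUs, wall-clock time would ideally approach 1/N of the serial time, since each GPU handles only its share of the passes. This only holds if the application run is deterministic enough that counters gathered on different devices can be merged afterwards.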
Describe alternatives you've considered
Additional context
Suggested by Shane Fogerty at the SNL hackathon