ROCm / rocprofiler-compute

Advanced Profiling and Analytics for AMD Hardware
https://rocm.docs.amd.com/projects/omniperf/en/latest/
MIT License

Add a batch mode that parallelizes counter collection over multiple identical GPUs #323

Open IanBogle opened 7 months ago

IanBogle commented 7 months ago

Is your feature request related to a problem? Please describe.
This feature is aimed at benchmarks that may not be reducible to a smaller representative problem. Collecting counters in parallel would significantly cut down the profiling time for them.

Describe the solution you'd like
An argument to omniperf profile that allows running several counter collections at a time across multiple identical GPUs.
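
To make the request concrete, here is a minimal sketch of how the scheduling side of such a batch mode might look. It is not omniperf's implementation. The function names and the pass labels are hypothetical, and it assumes each counter-collection pass can run as an independent process pinned to one GPU through the HIP runtime's `HIP_VISIBLE_DEVICES` environment variable.

```python
def assign_passes_to_gpus(passes, gpu_ids):
    """Distribute counter-collection passes over GPUs round-robin.

    `passes` is a list of opaque pass labels (hypothetical here);
    `gpu_ids` is a list of device indices for identical GPUs.
    Returns a dict mapping each GPU id to the passes it should run.
    """
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    plan = {gpu: [] for gpu in gpu_ids}
    for index, counter_pass in enumerate(passes):
        gpu = gpu_ids[index % len(gpu_ids)]
        plan[gpu].append(counter_pass)
    return plan


def worker_env(base_env, gpu_id):
    """Copy an environment and restrict the process to a single GPU.

    Assumption: the profiled application uses the HIP runtime, which
    honors HIP_VISIBLE_DEVICES.
    """
    env = dict(base_env)
    env["HIP_VISIBLE_DEVICES"] = str(gpu_id)
    return env


if __name__ == "__main__":
    # Hypothetical pass labels; a real tool would derive these from
    # its own counter configuration.
    plan = assign_passes_to_gpus(["pass_0", "pass_1", "pass_2"], [0, 1])
    for gpu, gpu_passes in plan.items():
        print(gpu, gpu_passes)
```

Each GPU's list could then be run by a separate worker process launched with the environment from `worker_env`. This only helps when the application itself uses a single GPU per run, and the results are only comparable if the GPUs really are identical.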

Describe alternatives you've considered

Additional context
Suggested by Shane Fogerty at the SNL hackathon.