ROCm / rocprofiler-compute

Advanced Profiling and Analytics for AMD Hardware
https://rocm.docs.amd.com/projects/omniperf/en/latest/
MIT License

Delimiters to profile blocks of application code #181

Open coleramos425 opened 1 year ago

coleramos425 commented 1 year ago

Describe the suggestion: Allow breakpoints or delimiters that mark "blocks" of application source code for profiling.

Justification: This came up in the context of training ML models, where users want to target specific stages of their training pipeline. In these multi-stage codes, it would be helpful for users to understand performance and bottlenecks in different phases of execution.

Implementation: There are a few ways this could be done, but the first that comes to mind is leveraging rocscope. A modified version of the rocomni plugin could be used to gather counters for these user-defined blocks.

See internal planning page for more info...

Additional Notes: This would also lend itself nicely to an eventual VS Code extension.

Originally posted by @coleramos425 in https://github.com/AMDResearch/omniperf/discussions/153#discussioncomment-6846892

jrmadsen commented 1 year ago

This would be easy to support with working hipProfilerStart() and hipProfilerStop() functions. I'll make sure those functions are handled properly in rocprofiler v2.
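
For illustration, here is a minimal sketch of what delimiting a block with hipProfilerStart() and hipProfilerStop() might look like in application code. It assumes the profiler in use honors these calls, which is the behavior discussed above rather than something this sketch guarantees. The ProfiledRegion helper, the USE_HIP macro, the call counters, and the load_batch / forward_backward stages are hypothetical names introduced only for this example; the stages are host-only placeholders standing in for real training work.

```cpp
#include <vector>

#ifdef USE_HIP  // hypothetical build switch, defined only when compiling with hipcc
#include <hip/hip_runtime_api.h>
#endif

// Counters so the sketch can be checked on a machine without ROCm installed.
static int g_starts = 0;
static int g_stops = 0;

inline void region_begin() {
    ++g_starts;
#ifdef USE_HIP
    (void)hipProfilerStart();  // real HIP runtime call, returns hipError_t
#endif
}

inline void region_end() {
    ++g_stops;
#ifdef USE_HIP
    (void)hipProfilerStop();   // real HIP runtime call, returns hipError_t
#endif
}

// Hypothetical RAII delimiter: profiling is requested for the lifetime of the
// object, so the region is closed on every exit path, including exceptions.
class ProfiledRegion {
public:
    ProfiledRegion() { region_begin(); }
    ~ProfiledRegion() { region_end(); }
    ProfiledRegion(const ProfiledRegion&) = delete;
    ProfiledRegion& operator=(const ProfiledRegion&) = delete;
};

// Placeholder pipeline stages (assumptions, not part of any real framework).
void load_batch(std::vector<float>& batch) {
    for (float& x : batch) x = 1.0f;
}

float forward_backward(const std::vector<float>& batch) {
    float loss = 0.0f;
    for (float x : batch) loss += x;
    return loss;
}

// One training step in which only the compute stage is delimited.
float train_one_step() {
    std::vector<float> batch(1024);
    load_batch(batch);                   // outside the delimited block
    float loss = 0.0f;
    {
        ProfiledRegion region;           // block begins here
        loss = forward_backward(batch);  // stage of interest
    }                                    // block ends here
    return loss;
}
```

Calling train_one_step() once opens and closes exactly one region. Wrapping the calls in a scoped object is a design choice made for this sketch so that a start is always paired with a stop; calling the two HIP functions directly around the stage of interest would express the same idea.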