ROCm / rocprofiler-compute

Advanced Profiling and Analytics for AMD Hardware
https://rocm.docs.amd.com/projects/omniperf/en/latest/
MIT License

Allow dumping of computed metrics per-kernel for further analysis #163

Open skyreflectedinmirrors opened 1 year ago

skyreflectedinmirrors commented 1 year ago

Is your feature request related to a problem? Please describe.

This is a proposed extension of the current --save-dfs mechanism. Today, when using --save-dfs, Omniperf computes the metrics, applies the aggregations (min/max/avg, etc.), and then saves the aggregated result to a file.

In some cases (e.g., plotting or further data analysis), it's more useful to get each of the metrics per kernel launch. For instance, one could load the data-frame and run a kNN (or whatever) to look for correlations between kernel runtime and metric "outliers", or plot the metrics over multiple invocations.
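To make the use case concrete, here is a minimal sketch of the kind of downstream analysis a per-dispatch dump would enable. Everything in it is an assumption for illustration: the column names (`runtime_ns`, `l2_hit_rate`), the values, and the helper functions are hypothetical and are not Omniperf output or API. It uses only the Python standard library; a real workflow would more likely load the saved file into pandas.

```python
import math
from statistics import mean, pstdev

# Hypothetical per-dispatch rows: one record per kernel launch.
# Column names and values are invented for illustration only.
rows = [
    {"dispatch": 0, "runtime_ns": 100.0, "l2_hit_rate": 90.0},
    {"dispatch": 1, "runtime_ns": 102.0, "l2_hit_rate": 91.0},
    {"dispatch": 2, "runtime_ns": 98.0, "l2_hit_rate": 89.0},
    {"dispatch": 3, "runtime_ns": 101.0, "l2_hit_rate": 90.0},
    {"dispatch": 4, "runtime_ns": 99.0, "l2_hit_rate": 90.0},
    {"dispatch": 5, "runtime_ns": 250.0, "l2_hit_rate": 40.0},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def outliers(values, threshold=2.0):
    """Indices whose z-score (population std. dev.) exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


runtimes = [r["runtime_ns"] for r in rows]
hit_rates = [r["l2_hit_rate"] for r in rows]

slow_dispatches = outliers(runtimes)        # launches with unusual runtime
correlation = pearson(runtimes, hit_rates)  # runtime vs. metric relationship
```

None of this is possible with the aggregated output alone, since min/max/avg discard which launch produced which value.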

Describe the solution you'd like

Provide a mode that computes the metrics for each dispatch and saves that result to a file. Filtering by dispatch and by block should still work as normal; the only difference is that the min/max/avg aggregation steps are skipped.
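A rough sketch of the requested behaviour, under stated assumptions: the counter names (`cache_hits`, `cache_requests`), the `hit_rate` metric, and all function names below are hypothetical and do not reflect Omniperf's internals. The point is only that the metric is evaluated once per dispatch, filtering is applied as usual, and aggregation becomes a separate, optional step.

```python
import csv
import io
from statistics import mean

# Hypothetical raw counters, one row per dispatch (names are invented).
dispatches = [
    {"dispatch_id": 0, "kernel": "kernel_a", "cache_hits": 80, "cache_requests": 100},
    {"dispatch_id": 1, "kernel": "kernel_a", "cache_hits": 50, "cache_requests": 100},
    {"dispatch_id": 2, "kernel": "kernel_b", "cache_hits": 30, "cache_requests": 40},
]


def hit_rate(row):
    """Example metric expression evaluated on a single dispatch."""
    return 100.0 * row["cache_hits"] / row["cache_requests"]


def per_dispatch_metrics(rows, dispatch_filter=None):
    """Evaluate the metric on every (optionally filtered) dispatch."""
    out = []
    for row in rows:
        if dispatch_filter is not None and row["dispatch_id"] not in dispatch_filter:
            continue
        out.append({
            "dispatch_id": row["dispatch_id"],
            "kernel": row["kernel"],
            "hit_rate_pct": hit_rate(row),
        })
    return out


def aggregate(metric_rows, name):
    """The min/max/avg step that the proposed mode would skip."""
    values = [r[name] for r in metric_rows]
    return {"min": min(values), "max": max(values), "avg": mean(values)}


def to_csv(metric_rows):
    """Serialize the per-dispatch table as CSV text."""
    if not metric_rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(metric_rows[0].keys()))
    writer.writeheader()
    writer.writerows(metric_rows)
    return buf.getvalue()


table = per_dispatch_metrics(dispatches)             # proposed output
filtered = per_dispatch_metrics(dispatches, {0, 2})  # dispatch filtering still applies
summary = aggregate(table, "hit_rate_pct")           # what is saved today
```

Here `table` keeps one row per launch, while `summary` collapses them into three numbers; the request is for a way to save the former.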

Describe alternatives you've considered

One can walk through each dispatch, use --dispatch <X> to filter the data-frame to just that dispatch, and dump something like 8000 separate data-frames, but that is quite slow, particularly with the current metric parsing overheads.

Additional context

None