ROCm / rocprofiler-compute

Advanced Profiling and Analytics for AMD Hardware
https://rocm.docs.amd.com/projects/omniperf/en/latest/
MIT License

In 2.x branch "--list-kernels" argument not recognized #322

Closed seanofthemillers closed 7 months ago

seanofthemillers commented 7 months ago

Describe the bug

The help text describes an option that lists the top 10 (or so) kernels over which we can run statistics. Attempting to use this option causes omniperf to exit with an error. The argument is --list-kernels.

To Reproduce

After generating a profile using omniperf -n test ..., I have a workloads directory that I want to analyze. My goal is to get an average metric over all kernels of a given name. The help output shows:

> omniperf analyze -p workloads/test/MI200/ -h
usage:

omniperf analyze --path <workload_path> [analyze options]

...
  -k  [ ...], --kernel  [ ...]                  Specify kernel id(s) from --list-kernels for filtering.
...

However, if I try to get the list of kernels using the --list-kernels argument I get:

> omniperf analyze -p workloads/test/MI200/ --list-kernels
usage: omniperf [mode] [options]
tool: error: unrecognized arguments: --list-kernels

Expected behavior

The --list-kernels option should list the kernels available in the profile.

IanBogle commented 7 months ago

As Cole Ramos has pointed out, this argument has been renamed to --list-stats. The help text still references --list-kernels, so it should be updated.
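The "unrecognized arguments" message quoted in the report has the shape of an argparse error, so the effect of a renamed flag can be reproduced with a small stand-alone parser. The sketch below is only an illustration under that assumption: build_parser and try_parse are hypothetical names, and the parser is a stand-in, not Omniperf's actual code. It registers --list-stats and -k/--kernel, then shows that the old spelling --list-kernels is rejected with exit status 2.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for an analyze-mode parser; NOT Omniperf's real code.
    parser = argparse.ArgumentParser(prog="tool", allow_abbrev=False)
    parser.add_argument("-p", "--path", required=True)
    # The renamed flag: only this spelling is registered.
    parser.add_argument("--list-stats", action="store_true")
    # One or more kernel ids, mirroring the "-k [ ...]" usage line.
    parser.add_argument("-k", "--kernel", nargs="+", type=int, default=[])
    return parser


def try_parse(argv):
    """Return (namespace, None) on success or (None, exit_code) on failure."""
    try:
        return build_parser().parse_args(argv), None
    except SystemExit as exc:
        # argparse prints its usage/error text to stderr, then exits.
        return None, exc.code


ok, _ = try_parse(["-p", "workloads/test/MI200/", "--list-stats"])
_, code = try_parse(["-p", "workloads/test/MI200/", "--list-kernels"])
print(ok.list_stats, code)  # prints: True 2
```

Because the old name is simply absent from the parser, the only visible symptom is the generic usage error, which is why stale help text is the misleading part here.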

coleramos425 commented 7 months ago

IanBogle wrote: "The help text still references --list-kernels, so this should be updated."

The help should no longer reference --list-kernels. We do, however, need to update the indices on --list-stats.

coleramos425 commented 7 months ago

I circled back to review the output of --list-stats. The latest 2.0.0-RC1 shows indices in the output. Please upgrade your Omniperf version and re-open the issue if there are any follow-up questions.

$ omniperf analyze -p workloads/mix/MI200/ --list-stats

  ___                  _                  __ 
 / _ \ _ __ ___  _ __ (_)_ __   ___ _ __ / _|
| | | | '_ ` _ \| '_ \| | '_ \ / _ \ '__| |_ 
| |_| | | | | | | | | | | |_) |  __/ |  |  _|
 \___/|_| |_| |_|_| |_|_| .__/ \___|_|  |_|  
                        |_|                  

Analysis mode = cli
[analysis] deriving Omniperf metrics...

--------------------------------------------------------------------------------
Detected Kernels (sorted decending by duration)
╒═════╤═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╕
│     │ Kernel_Name                                                                                                                         │
╞═════╪═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│   0 │ void benchmark_func<int, 256, 8u, 512u>(int, int*) [clone .kd]                                                                      │
├─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│   1 │ void benchmark_func<HIP_vector_type<float, 2u>, 256, 8u, 512u>(HIP_vector_type<float, 2u>, HIP_vector_type<float, 2u>*) [clone .kd] │
├─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│   2 │ void benchmark_func<double, 256, 8u, 512u>(double, double*) [clone .kd]                                                             │
├─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│   3 │ void benchmark_func<int, 256, 8u, 256u>(int, int*) [clone .kd]                                                                      │
├─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│   4 │ void benchmark_func<__half2, 256, 8u, 512u>(__half2, __half2*) [clone .kd]                                                          │
├─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│   5 │ void benchmark_func<float, 256, 8u, 512u>(float, float*) [clone .kd]                                                                │
├─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│   6 │ void benchmark_func<HIP_vector_type<float, 2u>, 256, 8u, 256u>(HIP_vector_type<float, 2u>, HIP_vector_type<float, 2u>*) [clone .kd] │
├─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│   7 │ void benchmark_func<double, 256, 8u, 256u>(double, double*) [clone .kd]                                                             │
├─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│   8 │ void benchmark_func<int, 256, 8u, 128u>(int, int*) [clone .kd]                                                                      │
├─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│   9 │ void benchmark_func<__half2, 256, 8u, 256u>(__half2, __half2*) [clone .kd]                                                          │
├─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│  10 │ void benchmark_func<float, 256, 8u, 256u>(float, float*) [clone .kd]                                                                │
├─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
...
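Since the original goal was to analyze all kernels that share a name, one way to use this listing is to capture the --list-stats output and pull out the indices of the matching rows. The following is a minimal sketch, assuming the table layout shown above (box-drawing "│" separators, index in the first column). parse_kernel_table and ids_matching are hypothetical helpers, and passing several ids to -k is inferred only from the "kernel id(s)" wording in the help text.

```python
import re

# Matches a data row such as "│   0 │ void f<int>(int*) [clone .kd]   │".
# The header row has no digits in its first column, so it is skipped.
ROW = re.compile(r"^│\s*(\d+)\s*│\s*(.*?)\s*│\s*$")


def parse_kernel_table(text):
    """Map each kernel index to its name from captured --list-stats output."""
    kernels = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            kernels[int(match.group(1))] = match.group(2)
    return kernels


def ids_matching(kernels, needle):
    """Return the sorted indices of kernels whose name contains `needle`."""
    return sorted(i for i, name in kernels.items() if needle in name)


# A few rows copied from the listing above (separator rows omitted).
sample = """\
│     │ Kernel_Name                                                             │
│   0 │ void benchmark_func<int, 256, 8u, 512u>(int, int*) [clone .kd]          │
│   2 │ void benchmark_func<double, 256, 8u, 512u>(double, double*) [clone .kd] │
│   3 │ void benchmark_func<int, 256, 8u, 256u>(int, int*) [clone .kd]          │
"""

ids = ids_matching(parse_kernel_table(sample), "benchmark_func<int,")
print("-k " + " ".join(map(str, ids)))  # prints: -k 0 3
```

The indices are only meaningful for the workload they were listed from, so the listing should be regenerated after re-profiling or upgrading.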