ROCm / rocprofiler-compute

Advanced Profiling and Analytics for AMD Hardware
https://rocm.docs.amd.com/projects/omniperf/en/latest/
MIT License

Omniperf is not displaying VGPRs in results. #117

Closed: skyreflectedinmirrors closed this issue 1 year ago

skyreflectedinmirrors commented 1 year ago

This was reported at a recent customer hackathon, and I've noticed it as well. The most likely cause is that rocprof now (correctly) reports arch_vgprs and accum_vgprs as separate fields (see: https://github.com/ROCm-Developer-Tools/rocprofiler/commit/5fd1c7e8e3fb35ccff9aa72baf7bbeb668f3711d). Omniperf likely just needs to read the new fields and report them accordingly, as VGPRs and AGPRs respectively.
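To illustrate what "read the new fields and report them" could look like, here is a minimal sketch, not Omniperf's actual code. It assumes the rocprof output is a CSV whose columns carry the field names mentioned above (`arch_vgprs`, `accum_vgprs`); the exact header spelling may differ in your rocprof version, and `register_stats` and the sample rows are hypothetical.

```python
import csv
import io
from statistics import mean

# Assumed column names, taken from the field names mentioned in this issue.
# Check them against the header of your own rocprof CSV before relying on them.
METRIC_COLUMNS = {"VGPRs": "arch_vgprs", "AGPRs": "accum_vgprs"}


def register_stats(csv_text):
    """Return {metric: (avg, min, max)} for each register column that has data."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    stats = {}
    for metric, column in METRIC_COLUMNS.items():
        # Skip rows where the column is absent or empty instead of failing.
        values = [float(r[column]) for r in rows if r.get(column) not in (None, "")]
        if values:
            stats[metric] = (mean(values), min(values), max(values))
    return stats


# Made-up sample data, only to show the shape of the input.
sample = "KernelName,arch_vgprs,accum_vgprs,sgpr\nvecCopy,8,0,24\nvecCopy,8,0,24\n"
print(register_stats(sample))
```

A metric whose column is missing simply does not appear in the result, which corresponds to an empty cell in the Wavefront Launch Stats table.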

coleramos425 commented 1 year ago

It looks like this counter was updated in ROCm 5.3.

Note that analysis of ROCm 5.2.x workload data will now yield missing values. Conditional counter naming based on the detected ROCm version gets a bit tricky, so eventually we'll bump the minimum ROCm version up to 5.3.

(base) [omniperf-pub]$ ./src/omniperf analyze -p workloads/vcopy_rocm_5_2_3/mi200/ -b 7.1

--------
Analyze
--------

--------------------------------------------------------------------------------
0. Top Stat
╒════╤══════════════════════════════════════════╤═════════╤═══════════╤════════════╤══════════════╤════════╕
│    │ KernelName                               │   Count │   Sum(ns) │   Mean(ns) │   Median(ns) │    Pct │
╞════╪══════════════════════════════════════════╪═════════╪═══════════╪════════════╪══════════════╪════════╡
│  0 │ vecCopy(double*, double*, double*, int,  │    1.00 │  26400.00 │   26400.00 │     26400.00 │ 100.00 │
│    │ int) [clone .kd]                         │         │           │            │              │        │
╘════╧══════════════════════════════════════════╧═════════╧═══════════╧════════════╧══════════════╧════════╛

--------------------------------------------------------------------------------
7. Wavefront
7.1 Wavefront Launch Stats
╒═════════╤═════════════════════╤═══════════╤═══════════╤═══════════╤════════════╕
│ Index   │ Metric              │ Avg       │ Min       │ Max       │ Unit       │
╞═════════╪═════════════════════╪═══════════╪═══════════╪═══════════╪════════════╡
│ 7.1.0   │ Grid Size           │ 1048576.0 │ 1048576.0 │ 1048576.0 │ Work items │
├─────────┼─────────────────────┼───────────┼───────────┼───────────┼────────────┤
│ 7.1.1   │ Workgroup Size      │ 256.0     │ 256.0     │ 256.0     │ Work items │
├─────────┼─────────────────────┼───────────┼───────────┼───────────┼────────────┤
│ 7.1.2   │ Total Wavefronts    │ 16384.0   │ 16384.0   │ 16384.0   │ Wavefronts │
├─────────┼─────────────────────┼───────────┼───────────┼───────────┼────────────┤
│ 7.1.3   │ Saved Wavefronts    │ 0.0       │ 0.0       │ 0.0       │ Wavefronts │
├─────────┼─────────────────────┼───────────┼───────────┼───────────┼────────────┤
│ 7.1.4   │ Restored Wavefronts │ 0.0       │ 0.0       │ 0.0       │ Wavefronts │
├─────────┼─────────────────────┼───────────┼───────────┼───────────┼────────────┤
│ 7.1.5   │ VGPRs               │           │           │           │ Registers  │
├─────────┼─────────────────────┼───────────┼───────────┼───────────┼────────────┤
│ 7.1.6   │ AGPRs               │           │           │           │ Registers  │
├─────────┼─────────────────────┼───────────┼───────────┼───────────┼────────────┤
│ 7.1.7   │ SGPRs               │ 24.0      │ 24.0      │ 24.0      │ Registers  │
├─────────┼─────────────────────┼───────────┼───────────┼───────────┼────────────┤
│ 7.1.8   │ LDS Allocation      │ 0.0       │ 0.0       │ 0.0       │ Bytes      │
├─────────┼─────────────────────┼───────────┼───────────┼───────────┼────────────┤
│ 7.1.9   │ Scratch Allocation  │ 0.0       │ 0.0       │ 0.0       │ Bytes      │
╘═════════╧═════════════════════╧═══════════╧═══════════╧═══════════╧════════════╛
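For readers wondering what the conditional counter naming mentioned above might involve, one possible approach is to choose the column by inspecting the CSV header rather than by detecting the ROCm version. The sketch below is only an illustration under stated assumptions and is not what Omniperf does: `resolve_columns` is a hypothetical helper, the new column names come from this thread, and the legacy `vgpr` name is an assumption.

```python
# Candidate column names per metric, newest first.
# "arch_vgprs" / "accum_vgprs" are the names used in this thread;
# the legacy "vgpr" name is an assumption about pre-5.3 output.
CANDIDATES = {
    "VGPRs": ["arch_vgprs", "vgpr"],
    "AGPRs": ["accum_vgprs"],
}


def resolve_columns(header):
    """Map each metric to the first candidate column found in the header.

    A metric maps to None when no candidate is present, so the caller
    can leave that table cell blank instead of raising an error.
    """
    available = set(header)
    return {
        metric: next((name for name in names if name in available), None)
        for metric, names in CANDIDATES.items()
    }


# Header resembling older output: only the legacy VGPR column exists.
print(resolve_columns(["KernelName", "vgpr", "sgpr"]))
# Header resembling newer output: both new columns exist.
print(resolve_columns(["KernelName", "arch_vgprs", "accum_vgprs", "sgpr"]))
```

Detecting by header avoids tying the logic to version strings, at the cost of maintaining a candidate list per metric; raising the minimum ROCm version, as proposed above, removes the need for either.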