ROCm / rocprofiler-compute

Advanced Profiling and Analytics for AMD Hardware
https://rocm.docs.amd.com/projects/omniperf/en/latest/
MIT License

Use MI300 chip_id instead of model to detect XCD count #448

Closed benrichard-amd closed 2 weeks ago

benrichard-amd commented 2 weeks ago

In a previous change we began using "MI300" for gpu_model instead of the full "MI300X_A0" or "MI300X_A1", etc.

The XCD detection code was still receiving gpu_model but expected the full name, so it reported an XCD count of 1 and several metrics (e.g. VALU utilization, wavefront occupancy) were off by a factor of 8.

Passing chip_id instead of gpu_model fixes the issue.
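
To illustrate the failure mode and the fix, here is a minimal sketch, not the actual rocprofiler-compute code. The table names, the function names, and the chip_id values are hypothetical placeholders. The per-model XCD counts (8 for MI300X, 6 for MI300A) are the published hardware figures. The sketch ignores compute partition modes.

```python
# Hypothetical chip_id -> full model name table.
# The IDs below are placeholders for illustration, NOT real chip IDs.
CHIP_ID_TO_MODEL = {
    "0x0001": "MI300X_A1",
    "0x0002": "MI300A_A1",
}

# XCDs per full model family (MI300X has 8 XCDs, MI300A has 6).
XCDS_PER_MODEL = {"MI300X": 8, "MI300A": 6}


def xcd_count_from_model(gpu_model: str) -> int:
    """Match on the full model name; fall back to 1 XCD if nothing matches."""
    for prefix, xcds in XCDS_PER_MODEL.items():
        if gpu_model.upper().startswith(prefix):
            return xcds
    return 1


def xcd_count_from_chip_id(chip_id: str) -> int:
    """Resolve the full model name from chip_id first, then look up XCDs."""
    model = CHIP_ID_TO_MODEL.get(chip_id)
    if model is None:
        return 1
    return xcd_count_from_model(model)


# The shortened name no longer matches any full model, so the count is 1.
short_name_count = xcd_count_from_model("MI300")
# Going through chip_id recovers the full name and the right count.
chip_id_count = xcd_count_from_chip_id("0x0001")
```

In this sketch the shortened "MI300" string matches neither "MI300X" nor "MI300A", so the lookup falls back to 1. That matches the factor-of-8 error described above for any metric scaled by the XCD count on an MI300X.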