ROCm / rocprofiler-compute

Advanced Profiling and Analytics for AMD Hardware
https://rocm.docs.amd.com/projects/omniperf/en/latest/
MIT License

what is cur_sclk? #149

Closed: hsadasiv closed this 8 months ago

hsadasiv commented 1 year ago

Hello,

It seems that on gfx90a, cur_sclk in the Omniperf web report is 800 MHz, whereas rocm-smi shows sclk going up to 1700 MHz (only while the program runs). PS: I forced the clock using "rocm-smi -d 0 --setperfdeterminism 1700" and made sure to run the process on GPU ID 0.

feizheng10 commented 1 year ago

Thanks for reporting this. Unfortunately, it's a complicated question to answer: it all depends on the VBIOS version and the rocm-smi version. I believe Omniperf simply retrieves cur_sclk from rocm-smi. On gfx90a with the latest VBIOS, you might not be able to set or fix sclk. You could try: rocm-smi --setperflevel high

Alternatively, Omniperf should allow the specs/config to be supplied manually.

hsadasiv commented 1 year ago

Thank you for your response. Does cur_sclk mean the current system clock? If so, shouldn't Omniperf measure the clock once it goes up to 1700 MHz? rocm-smi picks it up as soon as it rises.

skyreflectedinmirrors commented 1 year ago

I kind of agree with this. Ideally, for a lot of the PoP (percent-of-peak) metrics, we should be actively sampling the clock while the kernels execute. (Really ideally, the profiler would report this info for us, or would support clock locking like ... other profilers.) But we could fairly easily spin up a background process that samples clock rates over the lifetime of the app and reports back an average once all runs have completed?
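
A minimal sketch of that sampler idea, in Python. `ClockSampler` and the `read_sclk_mhz` callable are hypothetical names, not part of Omniperf; the reader is assumed to return the current sclk in MHz (for example, by parsing rocm-smi output), and the sampler just averages whatever it reads until the app finishes. It uses a background thread rather than a separate process, only to keep the sketch short.

```python
import statistics
import threading
from typing import Callable, List


class ClockSampler:
    """Samples a clock reading periodically in a background thread.

    read_sclk_mhz is a hypothetical, caller-supplied function assumed to
    return the current shader clock in MHz. It is not an Omniperf API.
    """

    def __init__(self, read_sclk_mhz: Callable[[], float], interval_s: float = 0.05):
        self._read = read_sclk_mhz
        self._interval = interval_s
        self._samples: List[float] = []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Take at least one sample, then keep sampling until stop is requested.
        # Event.wait() returns True as soon as the stop flag is set.
        while True:
            self._samples.append(self._read())
            if self._stop.wait(self._interval):
                break

    def __enter__(self) -> "ClockSampler":
        self._thread.start()
        return self

    def __exit__(self, *exc) -> None:
        self._stop.set()
        self._thread.join()

    @property
    def samples(self) -> List[float]:
        return list(self._samples)

    def average_mhz(self) -> float:
        """Mean of all samples collected over the sampler's lifetime."""
        if not self._samples:
            raise ValueError("no clock samples collected")
        return statistics.fmean(self._samples)
```

Usage would be `with ClockSampler(my_reader) as s: run_app()`, then `s.average_mhz()` once all runs have completed.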

Right now, it looks like we read the clock on a cold (idle) GPU when gathering the specs?

https://github.com/AMDResearch/omniperf/blob/ed31b8a988b0fde6a5e11bc949417c82c6db1abc/src/utils/specs.py#L239

which is bound to give interesting answers if, e.g., an app kicks the clock up significantly and then spams FLOPs.

skyreflectedinmirrors commented 1 year ago

Aha, I was wrong: we take $sclk from here:

https://github.com/AMDResearch/omniperf/blob/ed31b8a988b0fde6a5e11bc949417c82c6db1abc/src/omniperf_analyze/utils/parser.py#L492

i.e., not from cur_sclk; $sclk ends up being pulled from the max sclk here:

https://github.com/AMDResearch/omniperf/blob/ed31b8a988b0fde6a5e11bc949417c82c6db1abc/src/utils/specs.py#L113

from rocminfo.
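
For reference, the max clock that rocminfo reports can be pulled out of its text output roughly as follows. This is a hedged sketch, not the code in specs.py: the function name is hypothetical, and it assumes each agent block prints "Device Type:" before "Max Clock Freq. (MHz):", keeping only GPU agents.

```python
import re
from typing import List, Optional

_DEVICE_TYPE = re.compile(r"^\s*Device Type:\s*(\S+)")
_MAX_CLOCK = re.compile(r"^\s*Max Clock Freq\. \(MHz\):\s*(\d+)")


def gpu_max_sclk_mhz(rocminfo_text: str) -> List[int]:
    """Return the "Max Clock Freq. (MHz)" value of each GPU agent, in order.

    Assumes "Device Type:" appears before "Max Clock Freq. (MHz):" within
    each agent block of the rocminfo output.
    """
    clocks: List[int] = []
    device_type: Optional[str] = None
    for line in rocminfo_text.splitlines():
        m = _DEVICE_TYPE.match(line)
        if m:
            device_type = m.group(1)
            continue
        m = _MAX_CLOCK.match(line)
        if m and device_type == "GPU":
            clocks.append(int(m.group(1)))
            device_type = None  # consumed; the next agent sets it again
    return clocks
```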

This is ... better, at least, because it gives a theoretical maximum for FLOPs and the like that you could achieve; it just doesn't take the clock your kernel actually achieved into account (I think we might actually be able to get that via GRBM_GUI_ACTIVE... maybe).
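
If GRBM_GUI_ACTIVE does count busy GPU cycles over a kernel's lifetime (an assumption, per the "maybe" above), the achieved clock would just be cycles divided by elapsed time. A sketch with a hypothetical function name; if the counter turned out to be summed across several blocks or dies, the result would need to be divided accordingly.

```python
def achieved_sclk_mhz(grbm_gui_active_cycles: int, kernel_duration_ns: float) -> float:
    """Effective clock = busy cycles / elapsed time.

    Assumes the counter reports sclk cycles for this kernel only.
    Cycles per nanosecond is GHz, so scale by 1e3 to get MHz.
    """
    if kernel_duration_ns <= 0:
        raise ValueError("kernel duration must be positive")
    return grbm_gui_active_cycles * 1e3 / kernel_duration_ns
```

For example, 1,700,000 busy cycles over a 1 ms (1,000,000 ns) kernel gives 1700 MHz.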

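This suggests that if we ever do switch from sclk to cur_sclk, we're in for a lot of "PoP exceeds 100%" reports, though: a cur_sclk read on an idle GPU (e.g., 800 MHz) can sit well below the clock a kernel actually runs at (e.g., 1700 MHz), which underestimates the peak. cc: @coleramos425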
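
A worked example of that failure mode, with made-up numbers (100 CUs and 128 FLOPs per CU per clock, not gfx90a specs): if a kernel really runs at 1700 MHz and reaches 90% of the peak at that clock, computing the peak from an idle 800 MHz reading reports about 191% of peak. The helper names are hypothetical, and Omniperf's actual PoP formulas may differ.

```python
def peak_flops(num_cus: int, flops_per_cu_per_clock: int, sclk_mhz: float) -> float:
    """Theoretical peak FLOP/s = CUs x FLOPs per CU per cycle x clock (Hz)."""
    return num_cus * flops_per_cu_per_clock * sclk_mhz * 1e6


def pct_of_peak(achieved_flops: float, peak: float) -> float:
    """Percent of peak (PoP) for a measured FLOP/s value."""
    return 100.0 * achieved_flops / peak


# Hypothetical device: 100 CUs, 128 FLOPs per CU per clock.
peak_at_run_clock = peak_flops(100, 128, 1700.0)  # clock the kernel really ran at
peak_at_idle = peak_flops(100, 128, 800.0)        # clock sampled on an idle GPU
achieved = 0.9 * peak_at_run_clock                # kernel hits 90% of the true peak

print(f"PoP vs 1700 MHz peak: {pct_of_peak(achieved, peak_at_run_clock):.2f}%")  # 90.00%
print(f"PoP vs 800 MHz peak:  {pct_of_peak(achieved, peak_at_idle):.2f}%")       # 191.25%
```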

coleramos425 commented 8 months ago

This is related to issue #245. Closing this ticket and linking PR #246, which closes that issue.