vstempen closed this 3 weeks ago.
Thanks for the change, @vstempen!
Just FYI: all our changes go through our internal Gerrit and are then published to GitHub. GitHub PRs are OK but may be less visible.
Merged internally; it should reach the develop branch within the next day. @bill-shuzhou-liu asks: "Is this only applied to CU, or also to SDMA and VRAM?"
@dmitrii-galantsev Is this fix available in the latest ROCm 6.1.1? Thanks!
merged in 677433b
@ppanchad-amd Not sure. Please check your rocm-smi version with rocm-smi --version and see whether the corresponding commit is ahead of the one linked above.
I still see this error on ROCm 6.2.
@yx-lamini would you be able to provide more details regarding your system configuration so we can reproduce the issue? Thanks!
Yes, of course. What do you need? I am running rocm-smi on an 8-GPU MI300 server with the vanilla ROCm 6.2.0 runtime installed.
@yx-lamini I saw your comment here: https://github.com/ROCm/ROCm/issues/2595. Is the problem you are experiencing related to that issue? (If so, I will close this PR and track the problem on the other issue.) Thanks!
Yes, that works. Sorry for posting the same thing in multiple places.
On some systems, rocm-smi --showpids reports:

get_compute_process_info_by_pid, Not supported on the given system
[PID] [PROCESS NAME] 1 UNKNOWN UNKNOWN UNKNOWN
get_compute_process_info_by_pid fails because, by design, the cu_occupancy debugfs method is not provided on some graphics cards and GFX revisions.
Proposing a change to return a success status when only the cu_occupancy debugfs method is not found, and to set cu_occupancy to an invalidation value so that only this parameter is marked as UNKNOWN.
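The proposed behavior can be sketched as a small Python model. This is not the rocm_smi_lib code (which is C/C++). The reader callables, the ProcessInfo fields, the status strings and the CU_OCCUPANCY_INVALID sentinel value (0xFFFFFFFF) are all hypothetical. The sketch also assumes that a missing cu_occupancy source surfaces as FileNotFoundError, while any other read error still fails the whole query.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical sentinel meaning "cu_occupancy could not be read".
CU_OCCUPANCY_INVALID = 0xFFFFFFFF

# Hypothetical status values standing in for the library's status codes.
STATUS_SUCCESS = "SUCCESS"
STATUS_NOT_SUPPORTED = "NOT_SUPPORTED"

# A reader takes a PID and returns one per-process counter.
Reader = Callable[[int], int]


@dataclass
class ProcessInfo:
    pid: int
    vram_usage: int
    sdma_usage: int
    cu_occupancy: int


def get_process_info(pid: int, read_vram: Reader, read_sdma: Reader,
                     read_cu: Reader) -> Tuple[str, Optional[ProcessInfo]]:
    """Query per-process counters; only a missing cu_occupancy is tolerated."""
    try:
        vram = read_vram(pid)
        sdma = read_sdma(pid)
    except OSError:
        # VRAM/SDMA failures still make the whole query unsupported.
        return STATUS_NOT_SUPPORTED, None
    try:
        cu = read_cu(pid)
    except FileNotFoundError:
        # The cu_occupancy source does not exist on this GPU/GFX revision:
        # keep the other values and mark only this field as invalid.
        cu = CU_OCCUPANCY_INVALID
    except OSError:
        # Any other cu_occupancy read error is still a hard failure.
        return STATUS_NOT_SUPPORTED, None
    return STATUS_SUCCESS, ProcessInfo(pid, vram, sdma, cu)


def format_row(info: ProcessInfo) -> str:
    """Render one output row, printing UNKNOWN only for an invalid cu_occupancy."""
    cu = ("UNKNOWN" if info.cu_occupancy == CU_OCCUPANCY_INVALID
          else str(info.cu_occupancy))
    return f"{info.pid} {info.vram_usage} {info.sdma_usage} {cu}"


if __name__ == "__main__":
    def missing_cu(pid: int) -> int:
        raise FileNotFoundError(f"cu_occupancy not available for pid {pid}")

    status, info = get_process_info(1234, lambda p: 4096, lambda p: 0, missing_cu)
    print(status, format_row(info))  # SUCCESS 1234 4096 0 UNKNOWN
```

In this model only the cu_occupancy read tolerates a missing source, so VRAM and SDMA still report real values instead of the whole row collapsing to UNKNOWN.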