ROCm / rocminfo

ROCm Application for Reporting System Info

rocminfo and rocm_agent_enumerator print inconsistent result #54

Closed zjin-lcf closed 2 years ago

zjin-lcf commented 2 years ago

I installed rocminfo from source. However, the results from rocminfo and from the Python script rocm_agent_enumerator are not consistent for the same device:

rocm_agent_enumerator shows the agent is gfx902.

rocminfo shows the agent is gfx90c.

Thank you for your answer.

jlgreathouse commented 2 years ago

Could you show me the output of lspci -n | grep 1002?

I suspect the problem is somehow related to the fact that we incorrectly reported Renoir APUs as gfx902 in the Thunk for quite a while (https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/commit/6745ced5dd3ed850c2a0f449fbb839f1dfc4eeb3). However, if you're running an old Thunk, I would imagine that rocminfo would report the wrong thing (since it goes through ROCr and the Thunk to get this information), while rocm_agent_enumerator would report the right thing (since it checks the HSA topology directly).

What kernel version, Thunk version, and ROCr version are you running?
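
To make the "checks the HSA topology directly" path concrete, here is a minimal Python sketch of reading agent names from the kernel driver's topology, bypassing ROCr and the Thunk. It is not the actual rocm_agent_enumerator code. It assumes the topology is exposed under /sys/class/kfd/kfd/topology/nodes/<N>/properties, that each node's properties file has a gfx_target_version field, and that the field encodes major*10000 + minor*100 + stepping; older kernels may not provide that field at all. The helper names parse_properties, gfx_name, and enumerate_agents are hypothetical.

```python
from pathlib import Path

# Assumed location of the amdkfd topology in sysfs.
KFD_NODES = Path("/sys/class/kfd/kfd/topology/nodes")


def parse_properties(text):
    """Parse 'key value' lines from a node's properties file into a dict."""
    props = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            props[parts[0]] = parts[1]
    return props


def gfx_name(target_version):
    """Decode an assumed major*10000 + minor*100 + stepping encoding."""
    major = target_version // 10000
    minor = (target_version // 100) % 100
    stepping = target_version % 100
    # Minor and stepping are rendered as hex digits, e.g. stepping 12 -> 'c'.
    return f"gfx{major}{minor:x}{stepping:x}"


def enumerate_agents(nodes_dir=KFD_NODES):
    """Yield (node, gfx name) for each node with a nonzero target version."""
    if not nodes_dir.is_dir():
        return
    for node in sorted(nodes_dir.iterdir()):
        try:
            props = parse_properties((node / "properties").read_text())
        except OSError:
            continue  # unreadable node, skip it
        version = int(props.get("gfx_target_version", "0"))
        if version:  # CPU-only nodes are assumed to report 0
            yield node.name, gfx_name(version)


if __name__ == "__main__":
    for node, name in enumerate_agents():
        print(node, name)
```

Under that assumed encoding, a value of 90012 decodes to gfx90c and 90002 to gfx902, which are the two names being compared in this issue.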

zjin-lcf commented 2 years ago

Since rocminfo reports the right thing, do you think that rocm_agent_enumerator needs to be updated?

Thanks.

jlgreathouse commented 2 years ago

I'm still trying to debug this issue, hopefully with your help. :)

Could you please show me the output of lspci -n | grep 1002? What Linux kernel version, Thunk version, and ROCr version are you running?

zjin-lcf commented 2 years ago

Sorry. I will close the issue. It is not reproducible. Thanks.