Open ehartford opened 8 months ago
It's not surprising this isn't working, given that the card is based on CDNA rather than GCN or RDNA. It's quite possible that kernel APIs are missing, and even if not, I doubt any nvtop dev has a test card available to them. I personally would be inclined to close this issue as wontfix, but @Syllo would know better than I would whether implementing support is a possibility.
If I had access to such a card, I could try to add support, provided there is a way to discover these GPUs. If it's not registering through the DRM driver, I'm not surprised it's not showing in nvtop.
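One rough way to check whether a card registers through DRM at all is to look at sysfs. The following is a minimal sketch, not nvtop's actual discovery code: it assumes the usual Linux layout under /sys/class/drm (cardN/device/vendor, cardN/device/device, and a cardN/device/driver symlink), and the function name `list_drm_cards` is made up for illustration.

```python
import os
import re
from pathlib import Path


def list_drm_cards(sysfs_root="/sys/class/drm"):
    """Return one dict per cardN node found under sysfs_root."""
    cards = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return cards
    for entry in sorted(root.iterdir()):
        # Only primary nodes such as card0; skip connectors (card0-DP-1)
        # and render nodes (renderD128).
        if not re.fullmatch(r"card\d+", entry.name):
            continue
        dev = entry / "device"
        info = {"node": entry.name}
        for key in ("vendor", "device", "subsystem_device"):
            attr = dev / key
            info[key] = attr.read_text().strip() if attr.is_file() else None
        drv = dev / "driver"
        # The driver entry is a symlink whose target directory is named
        # after the bound kernel driver (for example "amdgpu").
        info["driver"] = (
            os.path.basename(os.path.realpath(drv)) if drv.exists() else None
        )
        cards.append(info)
    return cards


if __name__ == "__main__":
    for card in list_drm_cards():
        print(card)
```

If the card shows up here with driver amdgpu but nvtop still reports "No GPU to monitor", the problem is more likely on the nvtop side than in the kernel driver.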
Same problem with 7900xtx
bymiller@byron-X570:~$ nvtop
No GPU to monitor.
rocm-smi --showproductname
================================== End of ROCm SMI Log ===================================
bymiller@byron-X570:~$ sudo dmesg | grep drm
[ 3.645163] ACPI: bus type drm_connector registered
[ 4.921387] [drm] amdgpu kernel modesetting enabled.
[ 4.921389] [drm] amdgpu version: 6.3.6
[ 4.921390] [drm] OS DRM version: 6.5.0
[ 4.935928] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x744C 0x148C:0x2422 0xC8).
[ 4.935939] [drm] register mmio base: 0xFCC00000
[ 4.935940] [drm] register mmio size: 1048576
[ 4.940610] [drm] add ip block number 0
Hey I'm happy to give you access to my server
I'm experiencing this issue as well
This also happens for me on a Radeon RX 7900 XTX (as well as on a Radeon RX 7900 XT)
====================================== ROCm System Management Interface ======================================
================================================ Concise Info ================================================
Device  [Model : Revision]  Temp    Power  Partitions      SCLK     MCLK   Fan  Perf  PwrCap  VRAM%  GPU%
        Name (20 chars)     (Edge)  (Avg)  (Mem, Compute)
==============================================================================================================
0       [0x471e : 0xc8]     30.0°C  69.0W  N/A, N/A        1564Mhz  96Mhz  0%   auto  303.0W  0%     56%
        0x744c
==============================================================================================================
============================================ End of ROCm SMI Log =============================================
I added the IDs for the RX 7900 XTX / XT to src/amdgpu_ids.h myself, and it works now: https://github.com/Syllo/nvtop/pull/293
Regarding the MI100 card, I would guess the line would be:
{0x0C34, 0x01, "AMD Instinct MI100"},
Really? What you are adding is the SubDeviceID, not the DeviceID, and nvtop doesn't use the SubDeviceID. And amdgpu_ids.h is only used to get the name.
Well, nvtop went from "No GPU to monitor" to this:
This is the information on 7900 XTX from https://gitlab.freedesktop.org/mesa/drm/-/blob/main/data/amdgpu.ids
744C, C8, AMD Radeon RX 7900 XTX
I took 0x471e from rocm-smi, but this is the SubDeviceID? As you can see, some info is missing (N/A), so that may be the cause. I'll try with 0x744c, as I somehow missed that.
It's just weird that the OP couldn't get nvtop to start with the MI100, as the DeviceID in amdgpu_ids.h should be correct.
UPDATE: I changed it to DeviceID 0x744C in amdgpu_ids.h, and the nvtop output is identical to the screenshot above. Is more code than adding the DeviceID needed to support the 7900 XTX / XT in nvtop?
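For reference, the amdgpu.ids entry quoted above has the form device ID, revision ID, product name. A minimal sketch of reading such entries into a lookup table might look like the following; it assumes comment lines start with `#` and that any line without three comma-separated fields (such as a version line) can be skipped. The function name `parse_amdgpu_ids` is hypothetical and this is not how libdrm or nvtop parse the file.

```python
def parse_amdgpu_ids(text):
    """Map (device_id, revision_id) to product name for amdgpu.ids-style text."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = [p.strip() for p in line.split(",", 2)]
        if len(parts) != 3:
            continue  # not an ID entry (assumed: e.g. a version line)
        try:
            key = (int(parts[0], 16), int(parts[1], 16))
        except ValueError:
            continue  # IDs are expected to be hexadecimal
        table[key] = parts[2]
    return table


if __name__ == "__main__":
    sample = "744C,\tC8,\tAMD Radeon RX 7900 XTX\n"
    print(parse_amdgpu_ids(sample))
```

With a table like this, 0x744C with revision 0xC8 resolves to a name, while 0x471E (the value rocm-smi prints in the Model column) does not, which is consistent with 0x471E not being the DeviceID.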
@numas
Hmm, have you tried the unpatched build?
nvtop gets the device name from libdrm_amdgpu, and falls back to the amdgpu_ids.h list when that fails.
The driver name, such as "AMD GPU", is used even if the list does not contain the device name.
So it seems strange that adding the device name to the list makes it recognize the device.
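The fallback order described here can be sketched as follows. This is only an illustration of the logic as described in this thread, not nvtop's code: the function name, its parameters, and the keying of the table by (DeviceID, revision) are assumptions.

```python
def resolve_gpu_name(libdrm_name, device_id, revision_id, ids_table,
                     driver_name="AMD GPU"):
    """Pick a display name: libdrm first, then the ID list, then the driver name."""
    if libdrm_name:
        return libdrm_name  # libdrm_amdgpu knew the marketing name
    listed = ids_table.get((device_id, revision_id))
    if listed:
        return listed  # fallback: built-in ID list
    return driver_name  # last resort: generic driver name


if __name__ == "__main__":
    ids = {(0x744C, 0xC8): "AMD Radeon RX 7900 XTX"}
    print(resolve_gpu_name(None, 0x744C, 0xC8, ids))
    print(resolve_gpu_name(None, 0x1234, 0x00, ids))
```

In every branch a name is returned, so under this logic a missing list entry only changes the label that is shown, not whether the device is detected.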
Thank you @Umio-Yasuno !
You are correct, the unpatched build works (though still with some N/A info). I was comparing against the distro-provided nvtop, which is old (1.2.2 in Ubuntu 22.04), and went straight to hacking instead of checking a clean build first...
Sorry for the noise, I will remove the pull request.
Hello, I get the error message "No GPU to monitor" even though my cards show up in rocm-smi and lspci.