Open Umio-Yasuno opened 1 year ago
Nvtop shows B/KiB/MiB depending on how much data is being transferred. The data is gathered from the pcie_bw interface and scaled accordingly https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/pm/amdgpu_pm.c#L1579
Umm, NVML returns the value in KiB/s, but the AMDGPU driver returns it in B/s (packet_count * max_payload_size [Byte]). Does nvtop convert KiB/s to B/s (for NVIDIA GPUs) or B/s to KiB/s (for AMD GPUs)?
P.S. nvtop currently does not detect devices correctly in APU+dGPU environments, so I cannot test this, sorry.
Sorry I did not get what you meant the first time. Indeed the code was missing a division by 1024 to get in the kilobyte range, thanks. I pushed 04721e38f9b87bc640f68332d49e6473ede45e9f to fix it.
Could you please elaborate on what is wrong with APU+dGPU? Are one, the other or both GPUs not found or missing info?
Thanks.
nvtop detects both GPUs but uses the wrong index. As a result, the processes on Device1 (RX 560) are displayed as processes on Device0 (APU).
https://github.com/Syllo/nvtop/issues/209
Fixed by https://github.com/Syllo/nvtop/commit/3e9ddef02d47a5aa0be1ab78d818284dd7c91cd1
But the pcie_bw problem still remains. PCIe RX/TX will always read 0 because maxPayloadSize (256) is divided by 1024 first, and integer division truncates 256 / 1024 to 0.
https://github.com/Syllo/nvtop/commit/04721e38f9b87bc640f68332d49e6473ede45e9f
```diff
- received *= maxPayloadSize;
- transmitted *= maxPayloadSize;
+ // Compute received/transmitter in KiB
+ received *= maxPayloadSize / 1024;
+ transmitted *= maxPayloadSize / 1024;
```
Also, reading the pcie_bw sysfs file causes a 1 s sleep on each read, during which the nvtop thread stops. Presumably, with multiple AMDGPUs that support pcie_bw, the nvtop thread will stall for that long per GPU.
Oh my, I did not think hard enough about operator precedence in that case, thanks!
So, is reading the pcie_bw file blocking when nvtop reads it faster than the driver refresh rate (1 s)?
I've been thinking about separating the data gathering and interface logic in two threads (and frankly should have done that from the start) but I have unfortunately little time to allocate to that right now.
I'm not sure about blocking, but pcie_bw sysfs reads are synchronous, so the thread waits, and both user input and interface updates stop for 1 s. This makes nvtop terribly difficult to use.
I am not confident in safely using multithreading in C. I think it would be reasonable to remove pcie_bw sysfs support, or to allow pcie_bw sysfs reading to be disabled from the configuration.
nvtop calculates PCIe bandwidth usage as if the values were in KiB/s, but the driver reports them in B/s. rocm_smi_lib uses number_of_received * max_packet_size (max_payload_size) / 1024.0 / 1024.0 or number_of_sent * max_packet_size (max_payload_size) / 1024.0 / 1024.0 to calculate PCIe bandwidth usage (MiB/s). https://github.com/RadeonOpenCompute/rocm_smi_lib/blob/master/python_smi_tools/rocm_smi.py#L1862-L1883
Also, reading the pcie_bw file needs at least 1 s because the AMDGPU driver uses msleep(1000) while counting packets. https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/amdgpu/vi.c#L1379 The -d, --delay option of nvtop will not work as expected if pcie_bw is supported. I think we should have a separate thread for pcie_bw if possible.