Syllo / nvtop

GPU & Accelerator process monitoring for AMD, Apple, Huawei, Intel, NVIDIA and Qualcomm

Lag when scrolling through applications scrolling through ncurses UI #139

Open Latrolage opened 2 years ago

Latrolage commented 2 years ago

When scrolling through the list of apps using the GPU with the arrow keys, it stutters after three or so entries: it stops scrolling even if you keep pressing the down/up arrow keys, then jumps to where it is supposed to be after around a second.

It happens in the setup menu too. Edit: it happens with mouse scrolling too.

Also, is there a way to separate/distinguish which application is running on which GPU?

Syllo commented 2 years ago

Hello, how many GPUs do you have in your system? Are they AMD ones?

Latrolage commented 2 years ago

Yes, one AMD and one NVIDIA.

Syllo commented 2 years ago

Could you please try something: Compile with

In both cases run nvtop and see if you can reproduce the slowdown with only one vendor active

Latrolage commented 2 years ago

It doesn't happen when only NVIDIA support is compiled in. It happens when amdgpu support is compiled in.

zhuyifei1999 commented 2 years ago

Possibly scanning all the fds in /proc caused the lag? htop does this too so I wasn't concerned.

Could you run $ time strace -c path/to/nvtop, wait a few seconds, and exit nvtop? It should show something like:

$ time strace -c nvtop/build/src/nvtop 
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 64.39    0.380565           6     58393       259 newfstatat
 14.45    0.085425           6     13474      4269 openat
 10.44    0.061728          19      3205           getdents64
  6.28    0.037105           4      9201           close
  2.07    0.012231           3      3328         2 fcntl
  1.16    0.006847          12       566           read
  0.53    0.003108          37        82         1 ioctl
  0.29    0.001725           4       400           kcmp
  0.15    0.000871           4       214           write
  0.07    0.000396           8        49           poll
  0.05    0.000268           3        76        60 readlink
  0.04    0.000246         246         1           execve
  0.03    0.000201           4        47           mmap
  0.02    0.000097           3        31           rt_sigaction
  0.01    0.000082           4        18           lseek
  0.01    0.000049           4        12           mprotect
  0.01    0.000036           4         8           munmap
  0.00    0.000026           2        11           pread64
  0.00    0.000017           8         2         1 access
  0.00    0.000015           0        19           brk
  0.00    0.000003           3         1           getrandom
  0.00    0.000002           2         1           arch_prctl
  0.00    0.000002           2         1           set_tid_address
  0.00    0.000002           2         1           set_robust_list
  0.00    0.000002           2         1           prlimit64
  0.00    0.000002           2         1           rseq
------ ----------- ----------- --------- --------- ----------------
100.00    0.591051           6     89143      4592 total

real    0m10.809s
user    0m0.353s
sys     0m1.749s

zhuyifei1999 commented 2 years ago

Also, approximately how many processes are running and how many fds are open? I.e., what is the output of $ ls -d /proc/{1..9}*/fd/* | wc -l and $ ls /proc/{1..9}*/fd/ | wc -l (if you run nvtop as root, the second command should also be run as root)?

Latrolage commented 2 years ago

$ time strace -c /usr/bin/nvtop
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 60.03    0.090864           3     24608         1 newfstatat
 14.18    0.021468           4      5342      1621 openat
 10.68    0.016164          13      1205           getdents64
  7.12    0.010783           2      3716           close
  3.04    0.004606           6       659           read
  2.35    0.003555           2      1342           fcntl
  0.86    0.001304           2       490           kcmp
  0.65    0.000989           3       276           write
  0.42    0.000640           3       164       132 readlink
  0.31    0.000464           7        61         1 ioctl
  0.10    0.000144           1        78           poll
  0.07    0.000111           2        40           lseek
  0.05    0.000074           2        28           mmap
  0.05    0.000072           1        47           rt_sigaction
  0.03    0.000047           2        22           brk
  0.02    0.000027           4         6           munmap
  0.02    0.000025           2        10           mprotect
  0.01    0.000014           7         2         2 connect
  0.01    0.000011           5         2           socket
  0.01    0.000008           4         2         1 access
  0.00    0.000000           0         4           pread64
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2         1 arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           prlimit64
  0.00    0.000000           0         1           getrandom
  0.00    0.000000           0         1           rseq
------ ----------- ----------- --------- --------- ----------------
100.00    0.151370           3     38112      1759 total

real    0m9.998s
user    0m0.047s
sys     0m0.519s
$ ls -d /proc/{1..9}*/fd/* | wc -l
[...redacted cannot open directory. permission denied stuff from ls]
4416
$ ls /proc/{1..9}*/fd/ | wc -l
[...redacted cannot open directory. permission denied stuff from ls]
4658

Video just so we are sure we are talking about the same issue:

https://user-images.githubusercontent.com/67372293/162766862-25c48c08-d099-408f-8c16-662ed06c709b.mp4

The lag/stutter happens where the highlight jumps. I keep pressing the up/down arrow keys, and it catches up once the screen updates.

zhuyifei1999 commented 2 years ago

real  0m9.998s
user  0m0.047s
sys   0m0.519s

To make sure: was the lag happening during this run? (0.519 + 0.047) / 9.998 is about 5.7% busy, and I don't think that is high enough to cause major lag just from being busy.
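The busy-fraction estimate above can be reproduced with a few lines of C. This is only a sketch of the arithmetic; `busy_fraction` and `print_busy_example` are hypothetical helper names, not part of nvtop.

```c
#include <assert.h>
#include <stdio.h>

/* Fraction of wall-clock time the process spent on the CPU.
 * All three arguments are in seconds, as printed by `time`. */
static double busy_fraction(double user_s, double sys_s, double real_s) {
  assert(real_s > 0.0);
  return (user_s + sys_s) / real_s;
}

/* Prints the figure for the run quoted above. */
static void print_busy_example(void) {
  double f = busy_fraction(0.047, 0.519, 9.998);
  printf("busy: %.1f%%\n", 100.0 * f); /* prints "busy: 5.7%" */
}
```

For comparison, the same formula applied to the first strace run in this thread (user 0.353 s, sys 1.749 s, real 10.809 s) gives roughly 19% busy. A low busy fraction says little about time spent blocked in a sleeping syscall, which does not count as CPU time.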

Latrolage commented 2 years ago

Yes, it had the same stutter as in the video

zhuyifei1999 commented 2 years ago

When it lags, is the entire screen laggy, or just nvtop? I'm wondering if it's nvtop itself being laggy, or nvtop doing something to the gpu causing the gpu to become laggy.

Latrolage commented 2 years ago

Just nvtop

zhuyifei1999 commented 2 years ago

I have no idea what's wrong then. I have 5443 fds opened by my user, 8248 fds total (sudo ls /proc/{1..9}*/fd/ | wc -l), and I'm experiencing no lag at all.

Let's see if @Syllo has a better idea. (I haven't read much of the UI code of nvtop)

Syllo commented 2 years ago

From what I see in the video, it freezes when gathering the information, every second or so (which is the default update rate). The interface freezes because everything runs in the same thread.

I do not see that behavior on my system either, even when I increase the load with more processes/fd than what @Latrolage reported. I can observe a very slight slowdown when strace is running.

This might be exacerbated on systems with many AMD GPUs, in which case we will go through /proc many times.

I will think about refactoring the /proc traversal at some point and maybe put the info gathering in its own thread, but I don't see how to avoid the fstat calls.
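As a rough illustration of the "info gathering in its own thread" idea, here is a minimal pthread sketch. It is not nvtop's code: `struct snapshot`, `struct gatherer`, the `gatherer_*` functions and `demo_slow_gather` are all hypothetical names, and the single `int value` stands in for the real per-GPU data. The worker calls a possibly blocking function without holding the lock, and the UI thread only copies the latest published snapshot.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical snapshot; `value` stands in for the real per-GPU data. */
struct snapshot {
  unsigned long generation; /* number of completed refreshes */
  int value;
};

struct gatherer {
  pthread_t thread;
  pthread_mutex_t lock;     /* protects latest and stop */
  struct snapshot latest;
  bool stop;
  int (*slow_gather)(void); /* may block, e.g. on a sysfs read */
};

static void *gatherer_main(void *arg) {
  struct gatherer *g = arg;
  for (;;) {
    pthread_mutex_lock(&g->lock);
    bool stop = g->stop;
    pthread_mutex_unlock(&g->lock);
    if (stop)
      break;
    int value = g->slow_gather(); /* blocking work, lock not held */
    pthread_mutex_lock(&g->lock);
    g->latest.value = value;
    g->latest.generation++;
    pthread_mutex_unlock(&g->lock);
  }
  return NULL;
}

/* Returns 0 on success, -1 if the mutex or thread could not be created. */
static int gatherer_start(struct gatherer *g, int (*slow_gather)(void)) {
  assert(g != NULL && slow_gather != NULL);
  g->latest.generation = 0;
  g->latest.value = 0;
  g->stop = false;
  g->slow_gather = slow_gather;
  if (pthread_mutex_init(&g->lock, NULL) != 0)
    return -1;
  if (pthread_create(&g->thread, NULL, gatherer_main, g) != 0) {
    pthread_mutex_destroy(&g->lock);
    return -1;
  }
  return 0;
}

/* UI side: copy the most recent snapshot; only waits for the short lock. */
static struct snapshot gatherer_peek(struct gatherer *g) {
  pthread_mutex_lock(&g->lock);
  struct snapshot copy = g->latest;
  pthread_mutex_unlock(&g->lock);
  return copy;
}

/* Asks the worker to exit after its current gather call, then joins it. */
static void gatherer_stop(struct gatherer *g) {
  pthread_mutex_lock(&g->lock);
  g->stop = true;
  pthread_mutex_unlock(&g->lock);
  pthread_join(g->thread, NULL);
  pthread_mutex_destroy(&g->lock);
}

/* Demo stand-in for a slow read: sleeps 20 ms, then returns a fixed value. */
static int demo_slow_gather(void) {
  struct timespec nap = {0, 20 * 1000 * 1000};
  nanosleep(&nap, NULL);
  return 42;
}
```

With this shape, key handling and redraws keep running while a gather call is blocked; the cost is that the UI may show data that is one refresh old. Build with `-pthread`.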

Disr0 commented 6 months ago

I also see this issue with an AMD GPU. Here is my GPU and its driver. The problem also shows up in btop.

26:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev ef)
    Subsystem: Micro-Star International Co., Ltd. [MSI] Radeon RX 580 ARMOR 8G OC
    Kernel driver in use: amdgpu
    Kernel modules: amdgpu

Umio-Yasuno commented 5 months ago

This issue is due to pcie_bw reads not being threaded.
pcie_bw causes a 1s sleep on each read, during which the nvtop thread stops.

https://github.com/Syllo/nvtop/issues/208
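One way to check the blocking-read explanation on a given machine is to time a single open/read/close of the file. The sketch below assumes nothing about nvtop; `timed_read_seconds` is a hypothetical helper, and the sysfs location of pcie_bw (typically somewhere under /sys/class/drm/card*/device/) is an assumption that should be verified locally.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Returns the elapsed wall-clock seconds for one open + read + close of
 * `path`, or -1.0 if the clock, open or read fails. */
static double timed_read_seconds(const char *path) {
  assert(path != NULL);
  char buf[256];
  struct timespec t0, t1;
  if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
    return -1.0;
  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return -1.0;
  ssize_t n = read(fd, buf, sizeof buf); /* the call suspected of sleeping */
  close(fd);
  if (n < 0)
    return -1.0;
  if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
    return -1.0;
  return (double)(t1.tv_sec - t0.tv_sec) +
         (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

If the result for the pcie_bw file is close to one second while other sysfs files return almost instantly, that matches the description above: a single-threaded UI would be unresponsive for that long on every refresh.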

Umio-Yasuno commented 5 months ago

@Syllo

I suggest disabling pcie_bw read for amdgpu.
pcie_bw is not supported from Vega20.

 src/extract_gpuinfo_amdgpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/extract_gpuinfo_amdgpu.c b/src/extract_gpuinfo_amdgpu.c
index 39b20b9..3de1093 100644
--- a/src/extract_gpuinfo_amdgpu.c
+++ b/src/extract_gpuinfo_amdgpu.c
@@ -366,10 +366,12 @@ static void initDeviceSysfsPaths(struct gpu_info_amdgpu *gpu_info) {

   // Open the PCIe bandwidth file for dynamic info gathering
   gpu_info->PCIeBW = NULL;
+  /*
   int pcieBWFD = openat(sysfsFD, "pcie_bw", O_RDONLY);
   if (pcieBWFD) {
     gpu_info->PCIeBW = fdopen(pcieBWFD, "r");
   }
+  */

   // Open the power cap file for dynamic info gathering
   gpu_info->powerCap = NULL;
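
Instead of commenting the block out, the open could be made optional. The following is only a sketch under assumptions: `open_optional_sysfs_file` and its `enabled` flag are hypothetical and do not exist in nvtop. It also tests the descriptor with `fd >= 0`, since `openat` returns -1 on failure and 0 is a valid descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Opens `name` relative to `dir_fd` for reading, but only when `enabled`
 * is true. Returns NULL when disabled or when the file cannot be opened. */
static FILE *open_optional_sysfs_file(int dir_fd, const char *name,
                                      bool enabled) {
  assert(name != NULL);
  if (!enabled)
    return NULL;
  int fd = openat(dir_fd, name, O_RDONLY);
  if (fd < 0) /* openat signals failure with -1, not 0 */
    return NULL;
  FILE *file = fdopen(fd, "r");
  if (file == NULL)
    close(fd); /* do not leak the descriptor if fdopen fails */
  return file;
}
```

A caller in the spirit of the diff would be `gpu_info->PCIeBW = open_optional_sysfs_file(sysfsFD, "pcie_bw", enabled);`, where `enabled` comes from whatever option the maintainers prefer. Code that reads the file already has to cope with a NULL handle, because the patch above sets it to NULL unconditionally.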