Syllo / nvtop

GPU & Accelerator process monitoring for AMD, Apple, Huawei, Intel, NVIDIA and Qualcomm

nvtop: hiding one GPU aborts with "We should not be processing a client id twice per update" #222

Closed nabijaczleweli closed 8 months ago

nabijaczleweli commented 12 months ago

Forwarding https://bugs.debian.org/1040892, nvtop/3.0.1-1.

I have a multi-GPU system:

$ lspci -s 0000:00:02.0
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
$ lspci -s 0000:03:00.0
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 24 [Radeon RX 6400/6500 XT/6500M] (rev c1)

and am presently using both, but Debian's nvtop isn't compiled with i915 support (or is it just not supported at all?), and I don't really care what happens there, so I've disabled the "Xeon" one in Setup, GPU Select. This caused the following on the next update (and on every subsequent restart with the same config):

nvtop: ./src/extract_gpuinfo_amdgpu.c:946: parse_drm_fdinfo_amd: Assertion `!cache_entry_check && "We should not be processing a client id twice per update"' failed.
Aborted

config at https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=1040892;filename=interface.ini;msg=5

Watching both GPUs does work.

klausman commented 10 months ago

I have the same problem, but with an RTX2070S (that I want to watch) and a CPU-builtin AMD Radeon (which I don't care about). nvtop v3.0.2 as shipped by Debian.

Lucas-Servi commented 8 months ago

Same issue here with an RTX3060 on Ubuntu 23.04 (nvtop used to work, but now I get this error).

towo2099 commented 8 months ago

Same problem here

towo@polaris:~$ inxi -Gxxx
Graphics:
  Device-1: NVIDIA TU106M [GeForce RTX 2060 Mobile] vendor: Tongfang Hongkong driver: nvidia
    v: 545.23.06 pcie: speed: 2.5 GT/s lanes: 8 ports: active: none empty: DP-1,DP-2,HDMI-A-1
    bus-ID: 01:00.0 chip-ID: 10de:1f15 class-ID: 0300
  Device-2: AMD Renoir vendor: Tongfang Hongkong driver: amdgpu v: kernel pcie: speed: 16 GT/s
    lanes: 16 ports: active: eDP-1 empty: none bus-ID: 04:00.0 chip-ID: 1002:1636 class-ID: 0300
  Device-3: Chicony HD Webcam type: USB driver: uvcvideo bus-ID: 3-4:4 chip-ID: 04f2:b642
    class-ID: 0e02
  Display: server: X.org v: 1.21.1.4 with: Xwayland v: 22.1.1 compositor: kwin_x11 driver: X:
    loaded: amdgpu,ati,nvidia unloaded: fbdev,modesetting,nouveau,vesa gpu: amdgpu tty: 256x47
  Monitor-1: eDP-1 model: BOE Display res: 1920x1080 dpi: 142 size: 344x194mm (13.5x7.6")
    diag: 395mm (15.5") modes: max: 1920x1080 min: 640x480
  Message: GL data unavailable in console. Try -G --display

Occurs only in nvidia mode, on-demand is working fine.

nvtop: ./src/extract_gpuinfo_amdgpu.c:964: parse_drm_fdinfo_amd: Assertion `!cache_entry_check && "We should not be processing a client id twice per update"' failed.

Syllo commented 8 months ago

Hello guys, I think I found the issue: the AMD GPUs are added to the fdinfo callback entries when they are initialized. However, they are not removed from this callback list when they are marked as hidden in the interface. The cache cleanup is only called when they are being watched, which leads to the assertion when they are hidden.

I'll come up with a patch to avoid needless fdinfo parsing/walk-through for hidden GPUs.
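
To make the reasoning above concrete, here is a minimal Python sketch of the suspected failure mode. It is a simplified model, not nvtop's actual C code: the names `Gpu`, `parse_fdinfo`, `update_buggy`, `update_skip_hidden`, and `survives_two_updates` are hypothetical, and the real per-client cache in `extract_gpuinfo_amdgpu.c` is more involved. The model only shows why a cache that is filled for every registered GPU, but cleared only for watched ones, would trip a "seen twice" assertion on the second update.

```python
class Gpu:
    """Hypothetical stand-in for one GPU entry and its per-update client cache."""

    def __init__(self, name, hidden=False):
        self.name = name
        self.hidden = hidden
        self.seen_clients = set()  # client ids processed during the current update


def parse_fdinfo(gpu, client_id):
    # Models the failing assertion: each client id may be seen once per update.
    assert client_id not in gpu.seen_clients, (
        "We should not be processing a client id twice per update"
    )
    gpu.seen_clients.add(client_id)


def update_buggy(gpus, client_ids):
    # The fdinfo callback runs for every registered GPU, hidden or not ...
    for gpu in gpus:
        for client_id in client_ids:
            parse_fdinfo(gpu, client_id)
    # ... but the cache cleanup only runs for GPUs that are being watched.
    for gpu in gpus:
        if not gpu.hidden:
            gpu.seen_clients.clear()


def update_skip_hidden(gpus, client_ids):
    # Proposed direction: do not parse fdinfo for hidden GPUs at all.
    for gpu in gpus:
        if gpu.hidden:
            continue
        for client_id in client_ids:
            parse_fdinfo(gpu, client_id)
        gpu.seen_clients.clear()


def survives_two_updates(update, hidden):
    """Run two consecutive updates; return False if the assertion fires."""
    gpus = [Gpu("amd", hidden=hidden), Gpu("other")]
    try:
        update(gpus, [42])
        update(gpus, [42])
    except AssertionError:
        return False
    return True
```

In this model, `survives_two_updates(update_buggy, hidden=True)` fails on the second update because the hidden GPU still holds the client id from the first one, while the same call with `hidden=False` passes. That matches the report that watching both GPUs works. Whether skipping hidden GPUs is sufficient in the real code is a separate question that the sketch cannot answer.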

Syllo commented 8 months ago

Can you guys please try with the patch in #247 to see if my reasoning was right?

towo2099 commented 8 months ago

Sadly, it doesn't work; the message remains and then it crashes.

towo2099 commented 8 months ago

And now it no longer works on my Intel device either, with a similar message.

jackyyf commented 8 months ago

I think this is a similar issue to #196, which hasn't been fully fixed. I've proposed #248 as a full fix, and it should fix this issue as well. Please test #248 to see whether it fixes this issue.

towo2099 commented 8 months ago

Does it need #247 too? ~Building with only #248 it does not help.~

towo2099 commented 8 months ago

~Hm, building with both, no luck.~

towo2099 commented 8 months ago

Grr, I was a little blind: I had installed the wrong package after building. It is working on my AMD+Nvidia system; the Intel-AMD system I can test tomorrow.

towo2099 commented 8 months ago

So, I applied both MRs and it is working on both of my systems.

AMD+Nvidia ==> Ok
Intel+Nvidia ==> Ok

Syllo commented 8 months ago

Thanks @jackyyf

Merging both automatically closed the issue. If anything persists feel free to re-open.

jackyyf commented 6 months ago

Hi @Syllo, sorry to trouble you, but could we have a new release for this (and maybe other patches)? Debian, at least, only picks up changes with a version-bump release :)