aristocratos / btop

A monitor of resources
Apache License 2.0
21.09k stars 647 forks

[BUG] WARNING: Intel GPU: Failed to initialize PMU #942

Closed dontknowhy closed 1 week ago

dontknowhy commented 1 month ago

Describe the bug: Intel updated their amazing/ugly intel_gpu_top and may have broken btop's GPU detection. Now I can't see my GPU usage again :( unless I always run btop with sudo.

To Reproduce: Compile it and do some setcap magic.

Expected behavior: btop can detect my GPU usage without sudo.

Additional context

2024/09/30 (22:10:13) | ===> btop++ v.1.4.0
2024/09/30 (22:10:13) | DEBUG: Running in DEBUG mode!
2024/09/30 (22:10:13) | INFO: Logger set to DEBUG
2024/09/30 (22:10:13) | DEBUG: Using locale zh_CN.UTF-8
2024/09/30 (22:10:13) | INFO: Running on /dev/pts/2
2024/09/30 (22:10:14) | INFO: Failed to load libnvidia-ml.so, NVIDIA GPUs will not be detected: libnvidia-ml.so.1: 无法打开共享目标文件: 没有那个文件或目录
2024/09/30 (22:10:14) | INFO: Failed to load librocm_smi64.so, AMD GPUs will not be detected: librocm_smi64.so.6: 无法打开共享目标文件: 没有那个文件或目录
2024/09/30 (22:10:14) | WARNING: Intel GPU: Failed to initialize PMU
2024/09/30 (22:10:14) | DEBUG: Shared::init() : Initialized.

The Chinese part means: "cannot open shared object file: No such file or directory".

dontknowhy commented 1 month ago

intel-gpu-tools version in Debian testing:

Package: intel-gpu-tools
Version: 1.29-1
Priority: optional
Section: x11
Maintainer: Debian X Strike Force <debian-x@lists.debian.org>
Installed-Size: 17.2 MB
Depends: python3, libc6 (>= 2.38), libcairo2 (>= 1.12.0), libdrm-amdgpu1 (>= 2.4.100), libdrm-nouveau2 (>= 2.4.75), libdrm2 (>= 2.4.82), libdw1t64 (>= 0.127), libglib2.0-0t64 (>= 2.36.0), libkmod2 (>= 5~), libpciaccess0 (>= 0.11.0), libpixman-1-0 (>= 0.15.14), libproc2-0 (>= 2:4.0.4), libudev1 (>= 183), libunwind8, libx11-6 (>= 2:1.4.99.1), libxext6, libxv1, zlib1g (>= 1:1.1.4)
Conflicts: xserver-xorg-video-intel (<< 2.9.1)
Homepage: https://01.org/linuxgraphics/
Tag: devel::debugger, hardware::video, implemented-in::c, role::program
Download-Size: 1,672 kB
APT-Manual-Installed: yes
APT-Sources: https://mirrors.ustc.edu.cn/debian testing/main amd64 Packages
Description: tools for debugging the Intel graphics driver
 intel-gpu-tools is a package of tools for debugging the Intel graphics driver,
 including a GPU hang dumping program, performance monitor, and performance
 microbenchmarks for regression testing the DRM.

It can read VRAM usage now. DISCLAIMER: I guess Intel has messed something up, but I'm NOT sure.

RudolphSedlin commented 1 month ago

Might I ask if this is a discrete Arc Alchemist card like an A770? Seems like a common thread with another issue I saw.

dontknowhy commented 1 month ago

> Might I ask if this is a discrete Arc Alchemist card like an A770? Seems like a common thread with another issue I saw.

Nope, this is an Iris Xe on Alder Lake-P (why not AlderLake-:P). I also have to say that GPU detection works only intermittently across different builds of the same commit.

dm17 commented 1 month ago

> > Might I ask if this is a discrete Arc Alchemist card like an A770? Seems like a common thread with another issue I saw.
>
> Nope, this is an Iris Xe on Alder Lake-P (why not AlderLake-:P). I also have to say that GPU detection works only intermittently across different builds of the same commit.

Same here with Alder Lake Iris Xe. However, it does automatically show /dev/fb0 in the custom gpu name0 preference.

chikobara commented 1 month ago

same here with Iris Xe

pranaovs commented 1 week ago

Have you tried setting the CAP_PERFMON capability on btop? Doing so worked for me:

sudo setcap cap_perfmon=+ep /usr/bin/btop

~To persist with reboots, you can modify this systemd unit: https://github.com/luisbocanegra/plasma-intel-gpu-monitor#requirements~ https://github.com/aristocratos/btop/issues/942#issuecomment-2461990861
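To confirm that the capability actually landed on the binary, something like the following sketch could be used. It relies on getcap (shipped with libcap alongside setcap). The helper name has_perfmon_cap is made up for illustration, and /usr/bin/btop is simply the path used in this thread. Since getcap's output format differs between libcap versions, the check only looks for the capability name rather than an exact string.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a line of getcap output mentions cap_perfmon.
# $1 is the text printed by `getcap <binary>`; empty text means no capabilities.
has_perfmon_cap() {
    case "$1" in
        *cap_perfmon*) return 0 ;;  # capability is listed
        *)             return 1 ;;  # not listed, or no output at all
    esac
}

btop_bin=/usr/bin/btop   # path used in this thread; adjust for your install

# Only query the real binary when getcap and the file are both available.
if command -v getcap >/dev/null 2>&1 && [ -e "$btop_bin" ]; then
    if has_perfmon_cap "$(getcap "$btop_bin")"; then
        echo "cap_perfmon is set on $btop_bin"
    else
        echo "cap_perfmon is NOT set on $btop_bin"
    fi
fi
```

If the second message appears, re-running the setcap command above should restore GPU monitoring without sudo.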

dontknowhy commented 1 week ago

> To persist with reboots, you can modify this systemd unit: https://github.com/luisbocanegra/plasma-intel-gpu-monitor#requirements

Wait... setcap is not persistent? I will test it on Friday :(

pranaovs commented 1 week ago

Turns out setcap does persist. I had not used it before, and I found the plasma-intel-gpu-monitor repo. I just assumed it wasn't persistent because the guide said so. My apologies.

Here's btop after a reboot. GPU monitoring works as intended.

image

dontknowhy commented 1 week ago

> Turns out setcap does persist. I had not used it before, and I found the plasma-intel-gpu-monitor repo. I just assumed it wasn't persistent because the guide said so. My apologies.
>
> Here's btop after a reboot. GPU monitoring works as intended.

But I use Plasma, by the way. Thank you for discovering this extension (?). It might actually do something related to setcap, or it wouldn't be there.

> In systems such as Ubuntu, performance event monitoring is disabled by default. For intel_gpu_top to work without root you need to set /proc/sys/kernel/perf_event_paranoid to 2. Otherwise you may get an error like this:

I guess the problem is related to perf_event_paranoid.
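To see which perf_event_paranoid level a system is currently running with, a small sketch like this may help. The level descriptions paraphrase the kernel's sysctl documentation and apply to users without CAP_PERFMON. The function name describe_paranoid is hypothetical. Levels above 2 are not defined by the mainline kernel and are treated here as distribution-specific, which is an assumption about how such systems behave.

```shell
#!/bin/sh
# Hypothetical helper: describe what a perf_event_paranoid level restricts
# for users without CAP_PERFMON (paraphrased from the kernel sysctl docs).
describe_paranoid() {
    case "$1" in
        -1) echo "almost all events allowed for all users" ;;
        0)  echo "ftrace function and raw tracepoints restricted" ;;
        1)  echo "CPU event access also restricted" ;;
        2)  echo "kernel profiling also restricted" ;;
        *)  echo "distribution-specific or unknown level" ;;
    esac
}

# Standard sysctl location; only read it if it exists and is readable.
paranoid_file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$paranoid_file" ]; then
    level=$(cat "$paranoid_file")
    echo "perf_event_paranoid=$level: $(describe_paranoid "$level")"
fi
```

Granting CAP_PERFMON to the binary, as discussed earlier in the thread, avoids having to lower this setting system-wide.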

dm17 commented 1 week ago

> Have you tried setting the CAP_PERFMON capability on btop? Doing so worked for me:
>
> sudo setcap cap_perfmon=+ep /usr/bin/btop
>
> To persist with reboots, you can modify this systemd unit: https://github.com/luisbocanegra/plasma-intel-gpu-monitor#requirements

Thanks! That worked. I wonder why it shows the GPU at between 3 and 50 MHz while nvtop shows it as always being at 300 MHz. I'm guessing it does not show the GPU memory frequency, which nvtop shows (though for this Intel GPU nvtop shows N/A, which I doubt).

dontknowhy commented 1 week ago

> Turns out setcap does persist. I had not used it before, and I found the plasma-intel-gpu-monitor repo. I just assumed it wasn't persistent because the guide said so. My apologies.
>
> Here's btop after a reboot. GPU monitoring works as intended.

bro it works :) Also, the widget might look useful to me.

aristocratos commented 1 week ago

@dontknowhy This is all mentioned in the README.md: https://github.com/aristocratos/btop?tab=readme-ov-file#prerequisites

> INTEL
>
> Requires a working C compiler if compiling from source - tested with GCC12 and Clang16.
>
> Also requires the user to have permission to read from SYSFS.
>
> Can be set with make setcap (preferred) or make setuid or by running btop with sudo or equivalent.

setcap is persistent, just like setuid is. It's an extended attribute set on the binary. As long as you don't replace the binary, the attribute remains.
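Because the capability is stored as an extended attribute on the file itself, it can be inspected directly. The sketch below assumes getfattr (from the attr package) is installed and looks for the security.capability attribute. The helper name has_cap_xattr is made up for illustration, and /usr/bin/btop is the path used earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file carries a security.capability xattr.
has_cap_xattr() {
    # getfattr prints "security.capability=..." on stdout only when the
    # attribute exists; a missing attribute or missing tool yields no match.
    getfattr -n security.capability --absolute-names "$1" 2>/dev/null |
        grep -q '^security\.capability='
}

btop_bin=/usr/bin/btop   # path used in this thread; adjust for your install

if [ -e "$btop_bin" ]; then
    if has_cap_xattr "$btop_bin"; then
        echo "$btop_bin still has its capability attribute"
    else
        echo "$btop_bin has no capability attribute; setcap is needed again"
    fi
fi
```

A check like this is mainly useful after an upgrade or rebuild, since a replaced binary is a new file and starts without the attribute.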

dontknowhy commented 1 week ago

> @dontknowhy This is all mentioned in the README.md: https://github.com/aristocratos/btop?tab=readme-ov-file#prerequisites
>
> > INTEL
> >
> > Requires a working C compiler if compiling from source - tested with GCC12 and Clang16.
> >
> > Also requires the user to have permission to read from SYSFS.
> >
> > Can be set with make setcap (preferred) or make setuid or by running btop with sudo or equivalent.
>
> setcap is persistent, just like setuid is. It's an extended attribute set on the binary. As long as you don't replace the binary, the attribute remains.

🤔 idk, but the systemd stuff works and I really did run the setcap command. It works for me, so I'd like to keep it.