Syllo / nvtop

GPU & Accelerator process monitoring for AMD, Apple, Huawei, Intel, NVIDIA and Qualcomm

Missing support for reporting Intel GPU memory, power, fan and temperature #197

Open ich777 opened 1 year ago

ich777 commented 1 year ago

Hi, I'm trying to build NVTOP for Slackware, but unfortunately, after installing the compiled version, it shows me this message:

This version of Nvtop is missing support for reporting Intel GPU memory, power,
fan and temperature

                            <Don't Show Again> <Ok>
         Press Enter to select, arrows ">" and "<" to switch options

This is the output from cmake:

cmake .. -DNVIDIA_SUPPORT=ON -DAMDGPU_SUPPORT=ON -DINTEL_SUPPORT=ON -DCMAKE_INSTALL_PREFIX=/usr
-- The C compiler identification is GNU 11.2.0
-- The CXX compiler identification is GNU 11.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'Release' as none was specified.
-- Looking for cbreak in /usr/lib64/libncursesw.so
-- Looking for cbreak in /usr/lib64/libncursesw.so - found
-- Found Curses: /usr/lib64/libncursesw.so  
-- Performing Test HAS_REALLOCARRAY
-- Performing Test HAS_REALLOCARRAY - Success
-- Found UDev: /usr/lib64/libudev.so (found version "243") 
-- Libudev stable: TRUE
-- Could NOT find Systemd (missing: SYSTEMD_LIBRARY SYSTEMD_INCLUDE_DIR) (found version "")
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.2") 
-- Found Libdrm: /usr/lib64/libdrm.so (found version "2.4.109") 
-- Found libdrm; Enabling AMDGPU support
-- Performing Test compiler_has-Wall
-- Performing Test compiler_has-Wall - Success
-- Performing Test compiler_has-Wextra
-- Performing Test compiler_has-Wextra - Success
-- Performing Test compiler_has-Waddress
-- Performing Test compiler_has-Waddress - Success
-- Performing Test compiler_has-Waggressive-loop-optimizations
-- Performing Test compiler_has-Waggressive-loop-optimizations - Success
-- Performing Test compiler_has-Wbad-function-cast
-- Performing Test compiler_has-Wbad-function-cast - Success
-- Performing Test compiler_has-Wmissing-declarations
-- Performing Test compiler_has-Wmissing-declarations - Success
-- Performing Test compiler_has-Wmissing-parameter-type
-- Performing Test compiler_has-Wmissing-parameter-type - Success
-- Performing Test compiler_has-Wmissing-prototypes
-- Performing Test compiler_has-Wmissing-prototypes - Success
-- Performing Test compiler_has-Wnested-externs
-- Performing Test compiler_has-Wnested-externs - Success
-- Performing Test compiler_has-Wold-style-declaration
-- Performing Test compiler_has-Wold-style-declaration - Success
-- Performing Test compiler_has-Wold-style-definition
-- Performing Test compiler_has-Wold-style-definition - Success
-- Performing Test compiler_has-Wstrict-prototypes
-- Performing Test compiler_has-Wstrict-prototypes - Success
-- Performing Test compiler_has-Wpointer-sign
-- Performing Test compiler_has-Wpointer-sign - Success
-- Performing Test compiler_has-Wdouble-promotion
-- Performing Test compiler_has-Wdouble-promotion - Success
-- Performing Test compiler_has-Wuninitialized
-- Performing Test compiler_has-Wuninitialized - Success
-- Performing Test compiler_has-Winit-self
-- Performing Test compiler_has-Winit-self - Success
-- Performing Test compiler_has-Wstrict-aliasing
-- Performing Test compiler_has-Wstrict-aliasing - Success
-- Performing Test compiler_has-Wsuggest-attribute-const
-- Performing Test compiler_has-Wsuggest-attribute-const - Success
-- Performing Test compiler_has-Wtrampolines
-- Performing Test compiler_has-Wtrampolines - Success
-- Performing Test compiler_has-Wfloat-equal
-- Performing Test compiler_has-Wfloat-equal - Success
-- Performing Test compiler_has-Wshadow
-- Performing Test compiler_has-Wshadow - Success
-- Performing Test compiler_has-Wunsafe-loop-optimizations
-- Performing Test compiler_has-Wunsafe-loop-optimizations - Success
-- Performing Test compiler_has-Wfloat-conversion
-- Performing Test compiler_has-Wfloat-conversion - Success
-- Performing Test compiler_has-Wlogical-op
-- Performing Test compiler_has-Wlogical-op - Success
-- Performing Test compiler_has-Wnormalized
-- Performing Test compiler_has-Wnormalized - Success
-- Performing Test compiler_has-Wdisabled-optimization
-- Performing Test compiler_has-Wdisabled-optimization - Success
-- Performing Test compiler_has-Whsa
-- Performing Test compiler_has-Whsa - Success
-- Performing Test compiler_has-Wunused-result
-- Performing Test compiler_has-Wunused-result - Success
-- Performing Test compiler_has-Werror-implicit-function-declaration
-- Performing Test compiler_has-Werror-implicit-function-declaration - Success
-- Performing Test compiler_has-Wformat
-- Performing Test compiler_has-Wformat - Success
-- Performing Test compiler_has-Wformat-security
-- Performing Test compiler_has-Wformat-security - Success
-- Performing Test linker_has-Wl_-z_relro
-- Performing Test linker_has-Wl_-z_relro - Success
-- Could NOT find GTest (missing: GTEST_LIBRARY GTEST_INCLUDE_DIR GTEST_MAIN_LIBRARY) 
-- Configuring done
-- Generating done

I'm on kernel version 6.1.12. Am I missing something obvious?

Cheers

Fijxu commented 1 year ago

Same error here, built from source. GPU: Intel TigerLake-H GT1 [UHD Graphics] with the modesetting driver.

pbanj commented 1 year ago

Try running it with sudo; that seems to work for me. When launched without sudo, it says it doesn't have support for it.

ich777 commented 1 year ago

I'm already root so sudo wouldn't do much.

Are you also sure that you haven't already set it to not display the message?

I already get output; the message is what made me ask.

pepijndevos commented 1 year ago

I'm seeing this on the Arch package as well as on the AppImage so I doubt it's a build issue. Are Intel cards just not supported in general, or just particular models? I have an Intel Arc A770 for the record.

Syllo commented 1 year ago

Hello. Yes, this information was not exposed by the driver when I implemented Intel support. I'll look at the state of the current Linux driver to see whether the patches got mainlined, and add support for it.

ich777 commented 1 year ago

@Syllo A little off-topic, but I completely forgot to mention that I've created an nvtop plugin for Unraid.

I hope that's okay with you. :) It has been downloaded about 7300 times so far, which means about 7300 people are using it on Unraid.

K4ktus123 commented 1 year ago

I can confirm this on Arch with the 6.1.24-1-lts kernel and the modesetting driver on a TigerLake-H GT1 iGPU. The only stat NVTOP can display is the clock rate, while intel_gpu_top displays other stats as well (see the attached screenshot). Running NVTOP with sudo yields the same results.

Syllo commented 1 year ago

All right. While browsing the kernel code, I uncovered two pieces of information:

  1. The newest Intel GPUs have a "Graphics micro (μ) Controller (GuC)". If this GuC is active, you might not see some media-related workload usage in nvtop, since the reporting is not implemented/activated in the kernel, even in 6.2.
  2. Hardware monitoring info is only available for dedicated graphics cards and only exposes power-related values (voltage, power, energy, current).

If someone with a discrete Intel GPU could dump what is under the hwmon folder at /sys/bus/pci/devices/<pci addr>/drm/card1/hwmon/ (the "pci addr" can be retrieved with lspci | grep VGA), I can at least implement power draw for these.
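
The dump requested above can be scripted. This is a minimal Python sketch with hypothetical function names, not nvtop code. It assumes a domain-qualified PCI address such as 0000:03:00.0 (`lspci -D` prints the domain; plain `lspci` omits the leading `0000:`). The dump in the next comment found the directory under drm/card0/device/hwmon/; because drm/cardN/device links back to the PCI device, the sketch looks directly under /sys/bus/pci/devices/<pci addr>/hwmon/.

```python
import glob
import os


def find_hwmon_dirs(pci_addr, sysfs_root="/sys"):
    """Return the hwmon directories exposed by the PCI device at pci_addr.

    pci_addr must include the PCI domain, e.g. "0000:03:00.0".
    sysfs_root is a parameter only so the function can be tested.
    """
    pattern = os.path.join(sysfs_root, "bus/pci/devices", pci_addr, "hwmon", "hwmon*")
    return sorted(glob.glob(pattern))


def dump_hwmon(hwmon_dir):
    """Read every top-level attribute file in a hwmon directory."""
    values = {}
    for entry in sorted(os.listdir(hwmon_dir)):
        path = os.path.join(hwmon_dir, entry)
        if not os.path.isfile(path):
            continue  # skip subdirectories such as power/ and links like device
        try:
            with open(path) as f:
                values[entry] = f.read().strip()
        except OSError:
            values[entry] = None  # write-only or permission-restricted attribute
    return values
```

Usage: `for d in find_hwmon_dirs("0000:03:00.0"): print(d, dump_hwmon(d))`. Unreadable attributes are recorded as `None` instead of aborting the whole dump.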

pepijndevos commented 1 year ago
$ find /sys/bus/pci/devices/0000\:03\:00.0/drm/card0/device/hwmon/ -type f
/sys/bus/pci/devices/0000:03:00.0/drm/card0/device/hwmon/hwmon2/uevent
/sys/bus/pci/devices/0000:03:00.0/drm/card0/device/hwmon/hwmon2/power1_max_interval
/sys/bus/pci/devices/0000:03:00.0/drm/card0/device/hwmon/hwmon2/power1_max
/sys/bus/pci/devices/0000:03:00.0/drm/card0/device/hwmon/hwmon2/energy1_input
/sys/bus/pci/devices/0000:03:00.0/drm/card0/device/hwmon/hwmon2/in0_input
/sys/bus/pci/devices/0000:03:00.0/drm/card0/device/hwmon/hwmon2/power/runtime_active_time
/sys/bus/pci/devices/0000:03:00.0/drm/card0/device/hwmon/hwmon2/power/runtime_status
/sys/bus/pci/devices/0000:03:00.0/drm/card0/device/hwmon/hwmon2/power/autosuspend_delay_ms
/sys/bus/pci/devices/0000:03:00.0/drm/card0/device/hwmon/hwmon2/power/runtime_suspended_time
/sys/bus/pci/devices/0000:03:00.0/drm/card0/device/hwmon/hwmon2/power/control
/sys/bus/pci/devices/0000:03:00.0/drm/card0/device/hwmon/hwmon2/power1_rated_max
/sys/bus/pci/devices/0000:03:00.0/drm/card0/device/hwmon/hwmon2/name
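
The dump above has no instantaneous power attribute, only the cumulative energy1_input counter plus power1_max, power1_rated_max and in0_input, so power draw would have to be derived from two energy readings. This is a minimal Python sketch assuming the units of the kernel's hwmon sysfs ABI (energy in microjoules, power in microwatts, voltage in millivolts). The function names are hypothetical, and counter wraparound is ignored.

```python
import os
import time


def read_int(path):
    """Read a single integer hwmon attribute."""
    with open(path) as f:
        return int(f.read().strip())


def average_power_watts(energy_uj_start, energy_uj_end, elapsed_s):
    """Average power over an interval from two energy1_input readings.

    energy1_input is a cumulative counter in microjoules, so
    power = (energy delta in joules) / (elapsed seconds).
    Counter wraparound is not handled here.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (energy_uj_end - energy_uj_start) / 1e6 / elapsed_s


def sample_power_watts(hwmon_dir, interval_s=1.0):
    """Read energy1_input twice, interval_s apart, and return the average watts."""
    path = os.path.join(hwmon_dir, "energy1_input")
    t0 = time.monotonic()
    e0 = read_int(path)
    time.sleep(interval_s)
    e1 = read_int(path)
    t1 = time.monotonic()
    return average_power_watts(e0, e1, t1 - t0)


def power_limit_watts(hwmon_dir):
    """power1_max (the power limit) is reported in microwatts."""
    return read_int(os.path.join(hwmon_dir, "power1_max")) / 1e6


def voltage_volts(hwmon_dir):
    """in0_input is reported in millivolts."""
    return read_int(os.path.join(hwmon_dir, "in0_input")) / 1000
```

For example, `sample_power_watts("/sys/bus/pci/devices/0000:03:00.0/drm/card0/device/hwmon/hwmon2")` would sample the directory from the dump above. Timestamps come from a monotonic clock so wall-clock adjustments cannot distort the interval.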

K4zoku commented 1 year ago

I don't know if this helps: I used lsof -p <intel_gpu_top pid> to see which files intel_gpu_top accesses [screenshot]. That directory contains some files like these [screenshot].

Edit: contents of some of the files [screenshot]

pepijndevos commented 1 year ago

Hmmm for me intel_gpu_top also doesn't really show much interesting for my Arc A770 GPU though

intel-gpu-top: Intel Dg2 (Gen12) @ /dev/dri/card0
      0/   0 MHz; 100% RC6;        0 irqs/s

         ENGINES     BUSY                     MI_SEMA MI_WAIT
       Render/3D    0.00% |                 |      0%      0%
         Blitter    0.00% |                 |      0%      0%
           Video    0.00% |                 |      0%      0%
    VideoEnhance    0.00% |                 |      0%      0%
       [unknown]    0.00% |                 |      0%      0%

ich777 commented 1 year ago

> Hmmm for me intel_gpu_top also doesn't really show much interesting for my Arc A770 GPU though

That's because it seems nothing is using your GPU.

chealy commented 3 months ago

Of note, starting with the 6.8 kernel, Intel GPUs now expose memory usage via fdinfo. I've validated this with my embedded Intel GPU. With this in place, nvtop should be able to expose per-process GPU memory usage now.
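
Per-process memory can be read from /proc/<pid>/fdinfo/<fd> for file descriptors that point at a DRM device; `cat` on such a file shows the raw keys. This is a minimal Python sketch, assuming the key format in the kernel's DRM client usage stats documentation: `drm-driver`, the optional `drm-pdev` and `drm-client-id`, and `drm-total-<region>` / `drm-resident-<region>` values with an optional KiB or MiB suffix. The function names are hypothetical and this is not nvtop's implementation. Region names such as system0 or local0 are driver-specific. Reading another process's fdinfo usually requires being the same user or root.

```python
import os

# Unit suffixes documented for DRM fdinfo memory stats; no suffix means bytes.
_UNITS = {"": 1, "KiB": 1024, "MiB": 1024 * 1024}


def parse_fdinfo(text):
    """Parse 'key:<whitespace>value' lines from a /proc/<pid>/fdinfo/<fd> file."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def to_bytes(value):
    """Convert a stat such as '1536 KiB' (or a bare byte count) to bytes."""
    parts = value.split()
    unit = parts[1] if len(parts) > 1 else ""
    return int(parts[0]) * _UNITS[unit]  # KeyError for an undocumented unit


def drm_memory_by_region(fields, kind="resident"):
    """Return {region: bytes} from the drm-<kind>-<region> keys of one client."""
    prefix = f"drm-{kind}-"
    return {
        key[len(prefix):]: to_bytes(value)
        for key, value in fields.items()
        if key.startswith(prefix)
    }


def process_gpu_memory(pid, kind="resident"):
    """Sum drm-<kind>-* memory over the distinct DRM clients a process holds.

    Listing another user's fdinfo directory may raise PermissionError.
    """
    fdinfo_dir = f"/proc/{pid}/fdinfo"
    seen_clients = set()
    totals = {}
    for fd in os.listdir(fdinfo_dir):
        try:
            with open(os.path.join(fdinfo_dir, fd)) as f:
                fields = parse_fdinfo(f.read())
        except OSError:
            continue  # fd closed in the meantime, or permission denied
        if "drm-driver" not in fields:
            continue  # not a DRM file descriptor
        client_id = fields.get("drm-client-id")
        if client_id is None:
            key = ("fd", fd)  # no client id: cannot detect duplicates
        else:
            key = (fields["drm-driver"], fields.get("drm-pdev"), client_id)
        if key in seen_clients:
            continue  # duplicated fd referring to an already-counted client
        seen_clients.add(key)
        for region, size in drm_memory_by_region(fields, kind).items():
            totals[region] = totals.get(region, 0) + size
    return totals
```

Deduplicating on `drm-client-id` (scoped by `drm-pdev`) avoids double-counting when a process holds several file descriptors for the same DRM client, for example after `dup()`.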

cyear commented 13 hours ago

> Of note, starting with the 6.8 kernel, Intel GPUs now expose memory usage via fdinfo. I've validated this with my embedded Intel GPU. With this in place, nvtop should be able to expose per-process GPU memory usage now.

May I ask how to view the memory information?