Umio-Yasuno / amdgpu_top

Tool to display AMDGPU usage
MIT License

sunshine process not shown in fdinfo even though a stream is running #7

Closed gschintgen closed 1 year ago

gschintgen commented 1 year ago

Sunshine is an open source game-streaming solution. As such, it encodes the screen contents in real time using hardware or software encoders. In my case I'm using sunshine's VA-API support (which is based on its bundled ffmpeg) to stream either h264 or h265. I've double-checked that (a) a stream is running and (b) VA-API is used (as opposed to x264 software encoding). Yet amdgpu_top fails to detect this encoder usage.

Notably, sunshine doesn't show up either when I run lsof /dev/dri/renderD128. My sunshine host is currently explicitly configured to use VA-API, which the current logs confirm:

[2023:04:12:14:34:56]: Info: // Testing for available encoders, this may generate errors. You can safely ignore those errors. //
[2023:04:12:14:34:56]: Info: 
[2023:04:12:14:34:56]: Info: // Ignore any errors mentioned above, they are not relevant. //
[2023:04:12:14:34:56]: Info: 
[2023:04:12:14:34:56]: Info: Found encoder vaapi: [h264_vaapi, hevc_vaapi]

Otherwise x264 would also be listed, and sunshine's CPU usage is in line with hardware encoding. It just seems "hidden" somehow.

gschintgen commented 1 year ago

Forgot to specify some basics on my setup: Ubuntu 22.04, kernel oem-6.1 (ubuntu), mesa 23.0 (via kisak PPA), sunshine 19.1, RX 6650XT.

I've also now checked the open file descriptors of sunshine's process (sudo ls -l /proc/$sunshinePID/fd) and noticed that there are quite a few references to /dev/dri/card0 but none to /dev/dri/renderD128. Maybe that explains it and could be used for a fix?

gschintgen commented 1 year ago

Using lsof /dev/dri/card0 I can find sunshine among the processes using it, but only if I run the command as root. (sunshine itself is running with elevated privileges.) There are three other processes showing up in the list: systemd, systemd-logind, gnome-shell. (The first two may be worth filtering out.)
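The manual check above (listing /proc/$sunshinePID/fd and looking for /dev/dri entries) can be sketched in Rust. This is only an illustration of the idea, not the code amdgpu_top uses; the function names `is_drm_node` and `drm_nodes_of` are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns true if `target` looks like a DRM primary node (/dev/dri/cardN)
/// or a render node (/dev/dri/renderDN).
fn is_drm_node(target: &Path) -> bool {
    let name = match target.file_name().and_then(|n| n.to_str()) {
        Some(n) => n,
        None => return false,
    };
    target.parent() == Some(Path::new("/dev/dri"))
        && (name.starts_with("card") || name.starts_with("renderD"))
}

/// Lists the DRM device nodes a process holds open by resolving the
/// symlinks in /proc/<pid>/fd. Inspecting a process owned by another
/// user (or one running with elevated privileges) requires root.
fn drm_nodes_of(pid: u32) -> io::Result<Vec<PathBuf>> {
    let mut nodes = Vec::new();
    for entry in fs::read_dir(format!("/proc/{}/fd", pid))? {
        // A descriptor can be closed between read_dir and read_link,
        // so a failed read_link is skipped rather than treated as fatal.
        if let Ok(target) = fs::read_link(entry?.path()) {
            if is_drm_node(&target) {
                nodes.push(target);
            }
        }
    }
    Ok(nodes)
}
```

Matching both `card` and `renderD` prefixes is what would catch a process like sunshine that only opens /dev/dri/card0.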

Umio-Yasuno commented 1 year ago

Could you try this branch? https://github.com/Umio-Yasuno/amdgpu_top/tree/cardN

gschintgen commented 1 year ago

Thanks for your quick reaction! I'm not familiar with rust, cargo, etc. I tried anyway and got some errors while compiling.

root@tmp2204:~/amdgpu_top# cargo install --locked --path .
  Installing amdgpu_top v0.1.4 (/root/amdgpu_top)
    Updating crates.io index
warning: package `crossbeam-channel v0.5.7` in Cargo.lock is yanked in registry `crates-io`, consider running without --locked
   Compiling amdgpu_top v0.1.4 (/root/amdgpu_top)
error[E0432]: unresolved import `std::os::fd::IntoRawFd`
  --> src/main.rs:63:13
   |
63 |         use std::os::fd::IntoRawFd;
   |             ^^^^^^^^^^^^^^^^^^^^^^ no `IntoRawFd` in `os::fd`

error[E0603]: module `fd` is private
  --> src/main.rs:63:22
   |
63 |         use std::os::fd::IntoRawFd;
   |                      ^^ private module
   |
note: the module `fd` is defined here

warning: unused doc comment
   --> src/stat/gpu_metrics.rs:140:9
    |
140 |           /// Only Aldebaran (MI200) supports it.
    |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
141 | /         if let Some(hbm_temp) = self.metrics.get_temperature_hbm().and_then(|hbm_temp|
142 | |             (!hbm_temp.contains(&u16::MAX)).then_some(hbm_temp)
143 | |         ) {
144 | |             write!(self.text.buf, "HBM Temp (C) [")?;
...   |
149 | |             writeln!(self.text.buf, "]")?;
150 | |         }
    | |_________- rustdoc does not generate documentation for expressions
    |
    = note: `#[warn(unused_doc_comments)]` on by default
    = help: use `//` for a plain comment

error[E0599]: no method named `into_raw_fd` found for struct `File` in the current scope
  --> src/main.rs:67:30
   |
67 |         DeviceHandle::init(f.into_raw_fd()).unwrap()
   |                              ^^^^^^^^^^^ method not found in `File`
   |
   = help: items from traits can only be used if the trait is in scope
help: the following trait is implemented but not in scope; perhaps add a `use` for it:
   |
1  | use std::os::unix::io::IntoRawFd;

I'm compiling in a fairly minimal Ubuntu 22.04 container. It seems as if some dependency is missing. It's strange though that it complains about std::os::fd. That reads as if it were some very basic library? (But before that there were some screens full of successful compilations so I don't know.)

I tried adding a use directive as proposed by the compiler but that didn't help. I should definitely read up a bit on rust. Its compiler output seems rather helpful when compared to e.g. C++.
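A likely explanation for the E0432/E0603 errors, offered as an assumption since the thread never states the rustc version: the `std::os::fd` module was stabilized in Rust 1.66, and older toolchains only expose `IntoRawFd` under `std::os::unix::io`. Adding the suggested `use` next to the original one would not help, because the `use std::os::fd::IntoRawFd;` line itself still fails to resolve; it has to be replaced. A minimal sketch of the older import path (the helper names `open_raw` and `close_raw` are hypothetical, not taken from amdgpu_top):

```rust
use std::fs::File;
use std::io;
// Long-standing Unix location of these traits; `std::os::fd` offers the
// same items but only on newer toolchains.
use std::os::unix::io::{FromRawFd, IntoRawFd, RawFd};

/// Opens `path` and hands ownership of the raw descriptor to the caller.
fn open_raw(path: &str) -> io::Result<RawFd> {
    let file = File::open(path)?;
    // into_raw_fd() consumes the File without closing the descriptor.
    Ok(file.into_raw_fd())
}

/// Closes a descriptor obtained from `open_raw` by re-wrapping it in a File.
fn close_raw(fd: RawFd) {
    // SAFETY: the caller guarantees `fd` is open and not owned elsewhere.
    drop(unsafe { File::from_raw_fd(fd) });
}
```

Updating the toolchain, as was eventually done here by moving to a newer container, avoids the need for any source change.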

gschintgen commented 1 year ago

Never mind. I just followed your general build instructions but forgot to switch to the proper branch! I'll try again.

gschintgen commented 1 year ago

Same errors though when compiling the cardN branch.

gschintgen commented 1 year ago

image

Fastest fix ever ;-)

(Somehow rust was broken in ubuntu-22.04. When I retried with a 23.04 container it worked just fine.)

Umio-Yasuno commented 1 year ago

OK, thanks. Also, you can sort fdinfo by MediaEngine usage with the 'M' key.
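As a rough idea of what such a sort does, here is a minimal sketch. The `ProcUsage` struct and its `media_pct` field are hypothetical stand-ins for whatever amdgpu_top stores per process; the tie-break on PID is likewise an assumption, added only to keep the order stable.

```rust
/// Hypothetical per-process row of an fdinfo view.
#[derive(Debug, Clone, PartialEq)]
struct ProcUsage {
    pid: u32,
    name: String,
    /// Media engine usage in percent (assumed unit).
    media_pct: u32,
}

/// Sorts rows by media engine usage, highest first; ties fall back to PID.
fn sort_by_media(procs: &mut [ProcUsage]) {
    procs.sort_by(|a, b| b.media_pct.cmp(&a.media_pct).then(a.pid.cmp(&b.pid)));
}
```

With this ordering, a process that is actively encoding (such as sunshine during a stream) would move to the top of the list.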

gschintgen commented 1 year ago

I'm just not sure about VRAM. Before adding those additional processes, the sum of the fdinfo VRAM columns was less than the total used VRAM, but now it's more. I suppose some memory is counted twice. That could be normal though. Sunshine's VRAM is probably correct. (I expected a bit less, but the order of magnitude seems right; there are some patches in development to reduce it.)

Sorting by 'M' is working fine too.

Umio-Yasuno commented 1 year ago

They may be sharing the context of the AMDGPU driver across processes. I added a commit to the cardN branch, please try it.

https://github.com/Umio-Yasuno/amdgpu_top/commit/78bb494cc272143387f52618970582548a4c0bf2

gschintgen commented 1 year ago

image

Nice. There's only a minor discrepancy left. It seems strange though to have systemd use the VRAM and gnome-shell not, but I don't know anything about their internals.

Anyway this is a definite improvement. Great work.

Umio-Yasuno commented 1 year ago

> Nice. There's only a minor discrepancy left. It seems strange though to have systemd use the VRAM and gnome-shell not, but I don't know anything about their internals.

If multiple processes share the same AMDGPU driver context, the first process (probably in PID order) is treated as using the VRAM in that context. In the future, amdgpu_top may exclude systemd processes from the fdinfo list.
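The attribution rule described here can be sketched as follows. This is an illustration under stated assumptions, not amdgpu_top's actual code: the `FdUsage` struct and `dedup_clients` function are hypothetical, and the shared context is modelled by a `client_id` field, loosely following the `drm-client-id` key in the kernel's DRM fdinfo format. The sketch also drops systemd processes before deduplicating, so shared VRAM falls to the next process instead of vanishing; whether amdgpu_top orders the two steps this way is not stated in the thread.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-descriptor usage record.
#[derive(Debug, Clone, PartialEq)]
struct FdUsage {
    pid: u32,
    name: String,
    /// Identifier of the shared driver context (assumed).
    client_id: u64,
    vram_kib: u64,
}

/// Keeps one record per client id, attributing it to the lowest PID,
/// after excluding processes whose name starts with "systemd".
fn dedup_clients(entries: &[FdUsage]) -> Vec<FdUsage> {
    let mut by_client: BTreeMap<u64, FdUsage> = BTreeMap::new();
    for e in entries.iter().filter(|e| !e.name.starts_with("systemd")) {
        // Insert if the context is new or this process has a lower PID.
        let keep = by_client
            .get(&e.client_id)
            .map_or(true, |seen| e.pid < seen.pid);
        if keep {
            by_client.insert(e.client_id, e.clone());
        }
    }
    // BTreeMap iteration yields records ordered by client id.
    by_client.into_values().collect()
}
```

Counting each context once is what keeps the sum of the per-process VRAM column from exceeding the total used VRAM.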

Umio-Yasuno commented 1 year ago

Done. https://github.com/Umio-Yasuno/amdgpu_top/commit/47bd00efdfaf58e9db2d7eaaf2d83be6a104701d