Umio-Yasuno / amdgpu_top

Tool to display AMDGPU usage
MIT License

PRIME Reporting #90

Closed FireBurn closed 2 weeks ago

FireBurn commented 2 months ago

Is it possible to report powered down PRIME cards as off until something else starts using them?

`cat /sys/bus/pci/devices/0000\:03\:00.0/power/runtime_status` will show either `active` or `suspended`.
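For reference, the same check can be done from a program. This is only a minimal sketch in Python, not how amdgpu_top does it; the function names are made up for illustration, and the PCI address is the one from the command above. It assumes that reading `power/runtime_status` does not itself wake the card.

```python
from pathlib import Path


def runtime_status(pci_device_dir):
    """Return the runtime PM status string, or None if it cannot be read."""
    try:
        text = (Path(pci_device_dir) / "power" / "runtime_status").read_text()
    except OSError:
        return None
    return text.strip()


def is_suspended(pci_device_dir):
    """True only when the kernel reports the device as runtime-suspended."""
    return runtime_status(pci_device_dir) == "suspended"


if __name__ == "__main__":
    # Address taken from the command above; adjust it for your system.
    dev = "/sys/bus/pci/devices/0000:03:00.0"
    print(dev, runtime_status(dev))
```

Besides `active` and `suspended`, the kernel can also report transitional values such as `suspending` and `resuming`, so a monitor probably wants to treat anything other than `suspended` as "do not assume it is off".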

amdgpu_top currently keeps the card active

DianaNites commented 2 months ago

As a casual observer who's looked into this before: as far as I know, the kernel API here is atrocious. You'll have to use trial and error to find out which sysfs files block and wake the card, which error if it's asleep, which error but still wake it, which have legitimate values when asleep, and which return dummy values. And does that happen when opening, reading, or just writing?

Reading `power_state` won't wake it; reading any of the stats usually does, including how busy it is. There is no way to probe stats without waking the card. `rom` has less checking and errors out. Use `power/autosuspend_delay_ms` for a quicker feedback loop (e.g. `/sys/class/drm/card1/device/power/autosuspend_delay_ms`).

`power_state`, which reports the ACPI/PCI device power state (see https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/device-sleeping-states), can be safely checked, so it is easy to avoid waking a sleeping card.
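A rough sketch of that "check `power_state` first" idea in Python. The helper names are hypothetical, and `gpu_busy_percent` in the usage note is just an example of a stat file. There is an unavoidable race here: the card can change state between the check and the read, so this only reduces accidental wake-ups, it does not rule them out.

```python
from pathlib import Path

AWAKE_STATE = "D0"  # PCI power state of a fully powered device


def read_power_state(device_dir):
    """Read device_dir/power_state; report 'unknown' if it is unreadable."""
    try:
        return (Path(device_dir) / "power_state").read_text().strip()
    except OSError:
        return "unknown"


def read_stat_if_awake(device_dir, stat_name):
    """Read a stat file only when power_state says D0, otherwise return None."""
    if read_power_state(device_dir) != AWAKE_STATE:
        return None  # asleep or unknown: leave the stat file untouched
    try:
        return (Path(device_dir) / stat_name).read_text().strip()
    except OSError:
        return None
```

For example, `read_stat_if_awake("/sys/class/drm/card1/device", "gpu_busy_percent")` would return `None` for a card in D3cold instead of touching the stat file.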

However, once it is on, there is no way to avoid keeping it awake except by not touching any files that would wake it up for at least `power/autosuspend_delay_ms` (plus or minus; account for scheduler and timer accuracy, etc.). This value is usually on the order of seconds.

In theory it would be possible to wait for the device to appear idle for some time based on activity, halt all reading for roughly `autosuspend_delay_ms`, then poll `power_state` to see if it's D3cold. The obvious problem is the seconds-long gaps in activity; depending on the system it could be minutes, and any activity will reset the timer without any way of knowing.
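The procedure described above could look roughly like this. It is a sketch under assumptions, not a tested recipe: the function names, the margin, the fallback delay, and the timeout are all made up, and it inherits the problem already mentioned, namely that other activity can reset the autosuspend timer without the caller knowing.

```python
import time
from pathlib import Path


def autosuspend_delay_s(device_dir, default_s=5.0):
    """Return power/autosuspend_delay_ms in seconds, or default_s if unreadable."""
    try:
        ms = int((Path(device_dir) / "power" / "autosuspend_delay_ms").read_text())
    except (OSError, ValueError):
        return default_s
    # A negative value means autosuspend is disabled; do not sleep a negative time.
    return max(ms, 0) / 1000.0


def wait_for_suspend(device_dir, margin_s=1.0, poll_s=0.5, timeout_s=30.0,
                     target_states=("D3cold",),
                     sleep=time.sleep, clock=time.monotonic):
    """Stay quiet for the autosuspend delay, then poll power_state.

    The caller must already have stopped reading stats. Returns True once
    power_state is one of target_states, False if timeout_s passes first.
    """
    sleep(autosuspend_delay_s(device_dir) + margin_s)
    deadline = clock() + timeout_s
    while True:
        try:
            state = (Path(device_dir) / "power_state").read_text().strip()
        except OSError:
            state = "unknown"
        if state in target_states:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

`sleep` and `clock` are parameters only so the loop can be exercised without real waiting. On systems where the card stops at D3hot, `target_states=("D3hot", "D3cold")` may be the more useful setting.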

I don't think it's possible to do this reliably and automatically, through sysfs at least. The user either kills the application and opens it again when the card's asleep, or the app has a manual way to truly halt activity (not just freeze the screen/output). That assumes the app knows not to wake a sleeping card in the first place.

I never got around to using the Mesa or DRM APIs, so I don't know if they do better here, and I look forward to finding out. I hope they have a way to ask "hey, give me info if you're awake, otherwise nothing" and I was just doing it wrong and/or missing something obvious.

Umio-Yasuno commented 2 months ago

> Is it possible to report powered down PRIME cards as off until something else starts using them?

amdgpu_top does not currently support it.

> amdgpu_top currently keeps the card active

If amdgpu_top is launched while the dGPU is active, amdgpu_top creates the device handle, so the dGPU remains active.

I have an idea for an optimization for a PRIME system (APU+dGPU laptop), but I do not have such a laptop.

FireBurn commented 2 months ago

If you need anything tested, just @ me.

Umio-Yasuno commented 2 months ago

amdgpu_top now drops the DeviceHandle if "amdgpu_top" is the only GPU process running.
However, I don't know whether this will turn off the dGPU.

https://github.com/Umio-Yasuno/amdgpu_top/commit/ea2ade381baddaf6041303743f75802fac9d5ee7

I think it is necessary to check whether a display connector is connected to the dGPU.
Also, amdgpu_top has not yet implemented the function to detect dGPUs that are turned off at startup.

Umio-Yasuno commented 1 month ago

I found that, on my desktop system, hwmon and gpu_metrics must not be accessed if the dGPU is to transition to D3hot.

https://github.com/Umio-Yasuno/amdgpu_top/commit/f9362416da2b1a8ead8735b458940bf0c9a7b83a

I need to know what the sysfs files output for a dGPU in D3cold state...

DianaNites commented 1 month ago

@Umio-Yasuno

> I need to know what the sysfs files output for a dGPU in D3cold state...

`/sys/class/drm/<cardN>/device/power_state`, which is documented in the Linux kernel documentation.

Umio-Yasuno commented 1 month ago

@DianaNites Will the device be visible in /sys/bus/pci/drivers/amdgpu/ even if it is in D3cold state?
Or do we need to scan /sys/bus/pci/devices/?
Also, does it work after pci_disable_device has been called for the device?

DianaNites commented 1 month ago

@Umio-Yasuno

> Will the device be visible in /sys/bus/pci/drivers/amdgpu/ even if it is in D3cold state?

Yes, the device is always visible on the PCI bus. D3cold is defined between ACPI and PCIe, and requires support from the device and all buses between it and the CPU, plus support and cooperation from the OS.
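If that holds, scanning the driver directory should be enough to find sleeping cards. A small Python sketch of such a scan follows; `amdgpu_devices` and the address pattern are illustrative, and it assumes each bound device appears as an entry named by its PCI address with a `power_state` file inside.

```python
import re
from pathlib import Path

# PCI addresses look like 0000:03:00.0 (domain:bus:device.function).
PCI_ADDR = re.compile(r"^[0-9a-fA-F]{4,}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def amdgpu_devices(driver_dir="/sys/bus/pci/drivers/amdgpu"):
    """Map each PCI address bound to the driver to its power_state string."""
    result = {}
    try:
        entries = sorted(Path(driver_dir).iterdir())
    except OSError:
        return result  # driver not loaded, or not a Linux sysfs tree
    for entry in entries:
        # Skip control files such as bind/unbind and the module link.
        if not PCI_ADDR.match(entry.name):
            continue
        try:
            state = (entry / "power_state").read_text().strip()
        except OSError:
            state = "unknown"
        result[entry.name] = state
    return result


if __name__ == "__main__":
    for addr, state in amdgpu_devices().items():
        print(addr, state)
```

This only reads `power_state`, which according to the earlier comments does not wake the card.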

> Also, does it work after pci_disable_device has been called for the device?

If it disappears from the Linux sysfs view and needs a rescan to reappear, I assume not.

Umio-Yasuno commented 1 month ago

amdgpu_top now drops the DeviceHandle and stops reading hwmon and gpu_metrics if "amdgpu_top" is the only GPU process running.
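The rule described here can be summarised as a small decision function. This is an illustrative Python sketch only, not the actual amdgpu_top code (which is written in Rust); `PollPlan` and `plan_polling` are hypothetical names, and the list of GPU process names is assumed to come from elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PollPlan:
    keep_device_handle: bool
    read_hwmon: bool
    read_gpu_metrics: bool


def plan_polling(gpu_process_names, own_name="amdgpu_top"):
    """Back off completely when the monitor is the only GPU process."""
    others = [name for name in gpu_process_names if name != own_name]
    active = bool(others)
    # With no other GPU users, drop the handle and skip the reads that
    # would otherwise keep the dGPU awake.
    return PollPlan(keep_device_handle=active,
                    read_hwmon=active,
                    read_gpu_metrics=active)
```

With only `["amdgpu_top"]` in the list, every field is `False`; as soon as another process shows up, all three are `True` again.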

I think it would be difficult to further optimize it for the hybrid system (APU+dGPU) without an actual device.

Umio-Yasuno commented 1 month ago

Okay, amdgpu_top can now start up in SMI mode without waking up a suspended device.

https://github.com/Umio-Yasuno/amdgpu_top/commit/7d1a8302624efb8c9380712b3e79a53ac352e8af

More work and time are needed to implement this in TUI, GUI, and JSON modes.

FireBurn commented 4 weeks ago

It looks good, thanks

Umio-Yasuno commented 2 weeks ago

I added support for suspended devices to TUI and GUI modes.

Thank you for your suggestion.