Open flukejones opened 1 year ago
This seems more like a bug - could you fill in the bug report form?
That said, I could also look into adding GPU filtering, yes. Curious what that might look like, though - would filtering by PCI info seem too confusing?
Alternatively, I could filter by name + add options to disable any GPU activities for certain GPU names, in addition to more granular filtering for other widgets. Does the current dGPU show up by name in the temperatures tab? If you have a screenshot, that would be helpful.
filter by name + add options to disable any GPU activities for certain GPU names
I like the idea. It should probably be done by index, to avoid the device initialization that nvml's device_by_index does while getting the name. Alternatively, a whitelist-based approach could support UUID/PCIe addresses pretty easily via device_by_pci_bus_id and device_by_uuid.
Edit: Short term, build without the gpu feature flag. PR 1276 should allow disabling the GPU via config until filtering is done. This was probably introduced around 0.7.0.
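For illustration, a UUID whitelist could look roughly like this with the nvml-wrapper crate (a hypothetical sketch - the config plumbing and UUID value are made up, and the crate/method names are assumed from nvml-wrapper's NVML bindings):

```rust
// Hypothetical sketch of a UUID whitelist: only GPUs the user explicitly
// listed are ever initialized, so unlisted ones stay asleep.
use nvml_wrapper::{enum_wrappers::device::TemperatureSensor, Nvml};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let nvml = Nvml::init()?;

    // In practice this list would come from the user's config; hard-coded here.
    let allowed_uuids = ["GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"];

    for uuid in allowed_uuids {
        // device_by_uuid only touches the device we asked for; GPUs that are
        // not whitelisted are never initialized by this loop.
        let device = nvml.device_by_uuid(uuid)?;
        let name = device.name()?;
        let temp = device.temperature(TemperatureSensor::Gpu)?;
        println!("{name}: {temp} °C");
    }
    Ok(())
}
```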
Some/all AMD GPUs are also affected. I have an RX580 that doesn't drive any monitors, and reading the hwmons wakes it up and keeps it awake. Unfortunately, it seems the device/power_state file is the only thing I can read without waking the GPU, so in my fan control script I had to work around this by modeling the GPU's idle power-off logic.
The model is an ON/WARM/OFF state machine. ON reads sensors and utilization, and transitions to WARM if utilization is 0 for some time. WARM reads no sensors or utilization, and transitions to OFF if the power_state file changes to D3hot, or back to ON if it's still D0 after more time has elapsed than the GPU's idle power-off timeout. OFF transitions to ON once power_state shows D0 again.
Theoretically you could also see D3cold, which saves even more power, but the motherboard has to support it somehow and mine seemingly doesn't.
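Roughly, the state machine looks like this (a simplified Rust sketch of the model above, not my actual script - the sysfs path, timeouts, and utilization source are illustrative):

```rust
// ON/WARM/OFF model: only power_state is read while WARM/OFF, so the GPU
// is free to power itself down.
use std::{fs, time::{Duration, Instant}};

const POWER_STATE: &str = "/sys/class/drm/card1/device/power_state"; // assumed path
const IDLE_TIMEOUT: Duration = Duration::from_secs(10); // 0% util for this long -> WARM
const WARM_TIMEOUT: Duration = Duration::from_secs(70); // longer than the GPU's idle power-off delay

enum State {
    On { idle_since: Option<Instant> },
    Warm { since: Instant },
    Off,
}

fn step(state: State, read_util: impl Fn() -> u32) -> State {
    // power_state is the one file that can be read without waking the GPU.
    let power = fs::read_to_string(POWER_STATE).unwrap_or_default();
    match state {
        // ON: sensors and utilization are read; after IDLE_TIMEOUT at 0% go WARM.
        State::On { idle_since } => {
            if read_util() == 0 {
                let since = idle_since.unwrap_or_else(Instant::now);
                if since.elapsed() > IDLE_TIMEOUT {
                    State::Warm { since: Instant::now() }
                } else {
                    State::On { idle_since: Some(since) }
                }
            } else {
                State::On { idle_since: None }
            }
        }
        // WARM: read nothing except power_state, so the GPU can power down.
        State::Warm { since } => {
            if power.trim() == "D3hot" {
                State::Off
            } else if since.elapsed() > WARM_TIMEOUT {
                // Still in D0 well past the idle power-off timeout: treat it as active again.
                State::On { idle_since: None }
            } else {
                State::Warm { since }
            }
        }
        // OFF: resume reading sensors only once the card reports D0 again.
        State::Off => {
            if power.trim() == "D0" {
                State::On { idle_since: None }
            } else {
                State::Off
            }
        }
    }
}
```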
Hmm... It seems that this should perhaps be fixed in the kernel. I have written a note to myself to report this to the hwmon mailing list/bug tracker.
bottom actually already does a fairly simple check with device/power_state, and only grabs further sensor data if that file either did not exist or reported D0/unknown, so yeah, I might need to make the checks a bit more sophisticated... that or my implementation is bugged. It's a bit frustrating too since I don't think I have any way to debug this at the moment.
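Conceptually the guard is along these lines (a simplified sketch, not the actual code in bottom - the hwmon path is just an example):

```rust
// Only read temperature inputs when the device reports D0 or unknown, or when
// the power_state file doesn't exist at all.
use std::{fs, path::Path};

fn should_read_sensors(hwmon_dir: &Path) -> bool {
    // hwmon entries link back to the PCI device, which exposes power_state.
    let power_state = hwmon_dir.join("device/power_state");
    match fs::read_to_string(&power_state) {
        Err(_) => true, // no power_state file: assume it's safe to read
        Ok(s) => matches!(s.trim(), "D0" | "unknown"),
    }
}

fn main() {
    let dir = Path::new("/sys/class/hwmon/hwmon4"); // example path
    if should_read_sensors(dir) {
        if let Ok(temp) = fs::read_to_string(dir.join("temp1_input")) {
            println!("temp1: {} m°C", temp.trim());
        }
    } else {
        println!("device asleep; skipping sensor reads");
    }
}
```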
If anyone can check, I'd be interested to see whether the simple logic change in https://github.com/ClementTsang/bottom/pull/1355 helps with it.
@ClementTsang I've tried that branch. Is it supposed to show Nvidia/GPU temps if the GPU is already active? Currently it does not.
The change would hide any entry for any device that's asleep; if it turns back on though in theory it should show up again...
Mostly also just curious whether it stops the GPU from waking, or if there's more that I need to do in that part first.
Mostly also just curious whether it stops the GPU from waking, or if there's more that I need to do in that part first.
Seems like I don't.
Hm, so the GPU is still waking up?
Sorry mate, it looks like I had a brainfart. The dGPU appears not to be waking.
Just merged #1355. Could you check on main whether the output looks reasonable for you and doesn't wake up the dGPU? Thanks!
It doesn't wake it, but it also does not show details when the GPU is awake? It may also be worth reading through this: https://gitlab.com/mission-center-devs/mission-center/-/issues/30#note_1697130114
Hmm... that's weird, thanks for the link. Also just curious, could you provide screenshots of what the temp table looks like on stable and on main now? Thanks!
:facepalm: just realized that I never changed the sleep checks for nvidia GPUs... let me try looking at that too.
Describe the feature request
I noticed that in a recent update the sensors tab (on Linux) gained the dGPU temperature. On hybrid systems this is an issue, as it causes the dGPU to stay awake and drain the battery.
I can't see any easy option to disable this one sensor.