NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

Suboptimal power management at idle #314

Open zaltysz opened 2 years ago

zaltysz commented 2 years ago

NVIDIA Open GPU Kernel Modules Version

515.57

Does this happen with the proprietary driver (of the same version) as well?

No

Operating System and Version

Gentoo

Kernel Release

5.18.8 (custom)

Hardware: GPU

NVIDIA GeForce RTX 3080

Describe the bug

The open driver has higher idle power usage than the proprietary one: 44 W vs 22 W.

nvidia-smi with the proprietary driver reports:

    Clocks
        Graphics : 0 MHz
        SM       : 0 MHz
        Memory   : 405 MHz
        Video    : 555 MHz
    Voltage
        Graphics : 0.000 mV

nvidia-smi with the open driver reports:

    Clocks
        Graphics : 210 MHz
        SM       : 210 MHz
        Memory   : 405 MHz
        Video    : 555 MHz
    Voltage
        Graphics : 750.000 mV

To Reproduce

I just have to switch from proprietary to open driver and run nvidia-smi to see the difference.
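The comparison described above can be scripted so that the readings under each driver are captured the same way. The following is a minimal sketch, not part of the original report: it assumes `nvidia-smi` is on the PATH and that the `--query-gpu` fields listed in `FIELDS` are supported on the card (check with `nvidia-smi --help-query-gpu`). The function names are hypothetical.

```python
import csv
import io
import subprocess

# Field names passed to `nvidia-smi --query-gpu`; assumed to be supported
# by the installed driver (verify with `nvidia-smi --help-query-gpu`).
FIELDS = ["power.draw", "clocks.gr", "clocks.sm", "clocks.mem", "clocks.video"]


def parse_snapshot(csv_text):
    """Parse one line of `--format=csv,noheader,nounits` output into a dict."""
    row = next(csv.reader(io.StringIO(csv_text), skipinitialspace=True))
    values = {}
    for name, cell in zip(FIELDS, row):
        try:
            values[name] = float(cell)
        except ValueError:
            values[name] = None  # non-numeric cell such as "[N/A]"
    return values


def take_snapshot(gpu_index=0):
    """Run nvidia-smi for one GPU and return the parsed readings."""
    result = subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index),
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    )
    return parse_snapshot(result.stdout)


def diff_snapshots(before, after):
    """Return after - before for every field present in both snapshots."""
    return {
        name: after[name] - before[name]
        for name in FIELDS
        if before.get(name) is not None and after.get(name) is not None
    }
```

One way to use it: call `take_snapshot()` once under each driver, save the dictionaries, and pass both to `diff_snapshots`. With the figures quoted in this report, the difference would be 22 W in `power.draw` and 210 MHz in `clocks.gr` and `clocks.sm`.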

Bug Incidence

Always

nvidia-bug-report.log.gz

Can't share without sanitization and there is a lot to review.

More Info

I am using this card only for compute. It has a display connected, but the display isn't used by this card (the card is excluded from Xorg, and nvidia-smi reports Display Mode/Active as disabled); the display is driven by another card. Driver persistence is enabled.

vans163 commented 2 years ago

In my case no displays are connected to any cards.

I have a thread on the NVIDIA forums about a similar issue: https://forums.developer.nvidia.com/t/idle-power-usage-stuck-at-10-20watts-after-running-an-app/217520

The cards should be using 4 W when properly idle (one of my cards randomly achieves that).

mtijanic commented 2 years ago

Hi @zaltysz,

> Does this happen with the proprietary driver (of the same version) as well?
>
> No

Per the open driver readme, power management is still not implemented in this codebase / the GSP driver model. The initial release is only targeting data center use-cases for production, and this is not considered a critical feature there.

We really appreciate you testing the new open drivers, but for the sake of your electricity bill, it may be best to switch over to the proprietary one when the GPU needs to sit idle for longer periods of time. Or just unload the driver and let the GPU shut down fully.
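To put the electricity-bill remark in numbers, here is a rough sketch of the extra energy implied by the idle figures reported above (44 W vs 22 W). It assumes the GPU sits idle around the clock for a full year, which is an illustrative worst case rather than a measured duty cycle. The function name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # assumes continuous idle, no leap day


def extra_energy_kwh(idle_open_w, idle_proprietary_w, hours=HOURS_PER_YEAR):
    """Extra energy in kWh from the gap between two idle power draws in watts."""
    return (idle_open_w - idle_proprietary_w) * hours / 1000.0


# Figures from the report: 44 W (open) vs 22 W (proprietary).
print(extra_energy_kwh(44, 22))  # 192.72
```

Multiplying the result by a local price per kWh gives the yearly cost difference. Passing a smaller `hours` value models a machine that is only idle part of the time.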

Power management (and everything else for full parity with the proprietary driver) will come in one of the future major releases. For now, tagging this issue as Feature Pending. Thanks for the report!

birdie-github commented 2 years ago

Could be a duplicate of #295

BlueGoliath commented 2 years ago

How is nvidia-smi reporting the voltage? NVML only supports voltage readings for S-class units.

Sorry for the irrelevant question, but this is the first time I've seen a consumer GPU report that.