I am seeing constant throttling, even at idle. Right now, with the system just idling, I see:
nvidia-smi -q -d PERFORMANCE
==============NVSMI LOG==============
Timestamp : Sat May 8 13:19:52 2021
Driver Version : 465.27
CUDA Version : 11.3
Attached GPUs : 1
GPU 00000000:2D:00.0
Performance State : P0
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Active
Display Clock Setting : Not Active
Here, SW Thermal Slowdown indicates that the GPU is being throttled, even though it is only at 59 degrees Celsius. Running glxgears and checking the clocks, I get:
nvidia-smi -q -d CLOCK
==============NVSMI LOG==============
Timestamp : Sat May 8 13:23:43 2021
Driver Version : 465.27
CUDA Version : 11.3
Attached GPUs : 1
GPU 00000000:2D:00.0
Clocks
Graphics : 139 MHz
SM : 139 MHz
Memory : 405 MHz
Video : 544 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 1911 MHz
SM : 1911 MHz
Memory : 3504 MHz
Video : 1708 MHz
Max Customer Boost Clocks
Graphics : N/A
SM Clock Samples
Duration : 18446744073709.55 sec
Number of Samples : 100
Max : 1531 MHz
Min : 139 MHz
Avg : 0 MHz
Memory Clock Samples
Duration : 18446744073709.55 sec
Number of Samples : 100
Max : 3504 MHz
Min : 405 MHz
Avg : 0 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
So the GPU is clearly being heavily throttled: with glxgears running, the graphics clock sits at 139 MHz against a maximum of 1911 MHz.
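As a rough cross-check of the two outputs above, the following Python sketch lists the throttle reasons marked Active and expresses the current graphics clock as a fraction of the maximum. The function name `active_throttle_reasons` and the `SAMPLE` string are illustrative assumptions. The sample is copied from the `nvidia-smi -q -d PERFORMANCE` output in this post, and the parser assumes the `Name : State` layout shown there.

```python
def active_throttle_reasons(smi_text):
    """Return the throttle reasons reported as 'Active'.

    Assumes the 'Name : State' layout of `nvidia-smi -q -d PERFORMANCE`
    as pasted in this post; other driver versions may format it differently.
    """
    reasons = []
    in_section = False
    for raw in smi_text.splitlines():
        line = raw.strip()
        if line.startswith("Clocks Throttle Reasons"):
            in_section = True  # everything after this header is a reason line
            continue
        if not in_section or ":" not in line:
            continue
        name, _, state = line.partition(":")
        if state.strip() == "Active":  # "Not Active" does not match
            reasons.append(name.strip())
    return reasons


# Sample copied from the PERFORMANCE output above.
SAMPLE = """\
Performance State : P0
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Active
Display Clock Setting : Not Active
"""

print(active_throttle_reasons(SAMPLE))  # ['SW Thermal Slowdown']

# Graphics clock while glxgears runs versus the reported maximum (MHz).
current_mhz, max_mhz = 139, 1911
print(f"{current_mhz / max_mhz:.1%} of max graphics clock")  # 7.3% of max graphics clock
```

On a live system the same function could be fed the captured output of the command instead of the pasted sample.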
My guess is that this is related to the following settings:
nvidia-smi -q -d TEMPERATURE
==============NVSMI LOG==============
Timestamp : Sat May 8 13:25:04 2021
Driver Version : 465.27
CUDA Version : 11.3
Attached GPUs : 1
GPU 00000000:2D:00.0
Temperature
GPU Current Temp : 56 C
GPU Shutdown Temp : 102 C
GPU Slowdown Temp : 97 C
GPU Max Operating Temp : 57 C
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Interestingly, if I enable thermald with the --adaptive flag, I get this:
==============NVSMI LOG==============
Timestamp : Sat May 8 13:29:56 2021
Driver Version : 465.27
CUDA Version : 11.3
Attached GPUs : 1
GPU 00000000:2D:00.0
Temperature
GPU Current Temp : 56 C
GPU Shutdown Temp : 102 C
GPU Slowdown Temp : 97 C
GPU Max Operating Temp : 75 C
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
And the throttling goes away and performance is suddenly much improved.
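To make the comparison between the two temperature reports concrete, here is a small Python sketch that parses both and computes how many degrees remain before each threshold. It rests on the assumption (my guess above, not something the driver documentation confirms here) that SW Thermal Slowdown is tied to GPU Max Operating Temp. The names `parse_temps`, `headroom`, `WITHOUT_THERMALD`, and `WITH_THERMALD` are illustrative, and the sample values are copied from the two outputs in this post.

```python
import re


def parse_temps(smi_text):
    """Map each 'Name : NN C' line to an int; 'N/A' lines are skipped.

    Assumes the layout of `nvidia-smi -q -d TEMPERATURE` as pasted in this post.
    """
    temps = {}
    for line in smi_text.splitlines():
        m = re.match(r"\s*(.+?)\s*:\s*(\d+)\s*C\s*$", line)
        if m:
            temps[m.group(1)] = int(m.group(2))
    return temps


def headroom(temps, current=None):
    """Degrees left before each threshold; zero or negative means it is reached."""
    if current is None:
        current = temps["GPU Current Temp"]
    return {
        name: limit - current
        for name, limit in temps.items()
        if name != "GPU Current Temp"
    }


# Values copied from the two TEMPERATURE outputs above.
WITHOUT_THERMALD = """\
GPU Current Temp : 56 C
GPU Shutdown Temp : 102 C
GPU Slowdown Temp : 97 C
GPU Max Operating Temp : 57 C
GPU Target Temperature : N/A
"""
WITH_THERMALD = WITHOUT_THERMALD.replace("Temp : 57 C", "Temp : 75 C")

before = parse_temps(WITHOUT_THERMALD)
after = parse_temps(WITH_THERMALD)

print(headroom(before)["GPU Max Operating Temp"])              # 1
print(headroom(before, current=59)["GPU Max Operating Temp"])  # -2
print(headroom(after, current=59)["GPU Max Operating Temp"])   # 16
```

At the 59 C reported earlier, the 57 C threshold is already exceeded while the 97 C slowdown threshold is far away, which would be consistent with throttling at idle if the assumption holds.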
So apparently thermald can change this setting, but I cannot seem to do so manually, since "GPUMaxOperatingTempThreshold" is a read-only attribute:
nvidia-settings -a GPUMaxOperatingTempThreshold=80
ERROR: The attribute 'GPUMaxOperatingTempThreshold' specified in assignment 'GPUMaxOperatingTempThreshold=80' cannot be assigned (it is a read-only attribute).
I am now on Fedora 34 but I saw the exact same problem on Ubuntu 20.10.
I don't really know what's going on here, but it seems strange that I should have to run thermald just to escape this throttling (and even then, I think 75 C is too low a threshold to start throttling at). To be honest, I don't really understand the interplay between GPU Slowdown Temp and GPU Max Operating Temp; they seem synonymous to me.
Here's the full output from nvidia-smi:
Sat May 8 15:23:05 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.27       Driver Version: 465.27       CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:2D:00.0 Off |                  N/A |
| N/A   67C    P0    N/A /  N/A |    578MiB /  2002MiB |      7%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2762      G   /usr/libexec/Xorg                 293MiB |
|    0   N/A  N/A      2953      G   /usr/bin/gnome-shell               88MiB |
|    0   N/A  N/A      4524      G   ...AAAAAAAAA= --shared-files      134MiB |
|    0   N/A  N/A      5395      G   ...e/Steam/ubuntu12_32/steam       18MiB |
|    0   N/A  N/A      5604      G   ./steamwebhelper                    1MiB |
|    0   N/A  N/A      6303      G   ...AAAAAAAAA= --shared-files        6MiB |
|    0   N/A  N/A      7422      G   anki                               27MiB |
|    0   N/A  N/A     21305      G   /usr/bin/gjs                        2MiB |
+-----------------------------------------------------------------------------+
I am experiencing severe throttling on my NVIDIA GPU. I have a ThinkPad T14 Gen 1 with a GeForce MX330. I followed the guides to install the drivers (https://rpmfusion.org/Howto/NVIDIA) and to make the NVIDIA GPU primary (https://docs.fedoraproject.org/en-US/quick-docs/how-to-set-nvidia-as-primary-gpu-on-optimus-based-laptops/). I am on version 465.27 of the driver, on a Fedora 34 Workstation setup.
I wasn't really sure whether to post this bug here or on the NVIDIA forums, so I've cross-posted it (https://forums.developer.nvidia.com/t/severe-throttling-on-thinkpad-t14-gen-1-with-geforce-mx330/177366).