NVIDIA / nvidia-settings

NVIDIA driver control panel
http://www.nvidia.com/object/unix.html
GNU General Public License v2.0

Severe throttling on Thinkpad T14 Gen 1 with GeForce MX330 #67

Open jolars opened 3 years ago

jolars commented 3 years ago

I am experiencing severe throttling on my NVIDIA GPU. I have a ThinkPad T14 Gen 1 with a GeForce MX330. I followed the guides to install the drivers (https://rpmfusion.org/Howto/NVIDIA) and to make the NVIDIA GPU primary (https://docs.fedoraproject.org/en-US/quick-docs/how-to-set-nvidia-as-primary-gpu-on-optimus-based-laptops/). I am on driver version 465.27 on a Fedora 34 Workstation setup.

The GPU throttles constantly, even at idle. Right now, while idling, I see:

nvidia-smi -q -d PERFORMANCE

==============NVSMI LOG==============

Timestamp                                 : Sat May  8 13:19:52 2021
Driver Version                            : 465.27
CUDA Version                              : 11.3

Attached GPUs                             : 1
GPU 00000000:2D:00.0
    Performance State                     : P0
    Clocks Throttle Reasons
        Idle                              : Not Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Active
        Display Clock Setting             : Not Active

Here, SW Thermal Slowdown indicates that the GPU is being throttled, even though it is only at 59 degrees Celsius. Running glxgears and checking the clocks, I get:

nvidia-smi -q -d CLOCK

==============NVSMI LOG==============

Timestamp                                 : Sat May  8 13:23:43 2021
Driver Version                            : 465.27
CUDA Version                              : 11.3

Attached GPUs                             : 1
GPU 00000000:2D:00.0
    Clocks
        Graphics                          : 139 MHz
        SM                                : 139 MHz
        Memory                            : 405 MHz
        Video                             : 544 MHz
    Applications Clocks
        Graphics                          : N/A
        Memory                            : N/A
    Default Applications Clocks
        Graphics                          : N/A
        Memory                            : N/A
    Max Clocks
        Graphics                          : 1911 MHz
        SM                                : 1911 MHz
        Memory                            : 3504 MHz
        Video                             : 1708 MHz
    Max Customer Boost Clocks
        Graphics                          : N/A
    SM Clock Samples
        Duration                          : 18446744073709.55 sec
        Number of Samples                 : 100
        Max                               : 1531 MHz
        Min                               : 139 MHz
        Avg                               : 0 MHz
    Memory Clock Samples
        Duration                          : 18446744073709.55 sec
        Number of Samples                 : 100
        Max                               : 3504 MHz
        Min                               : 405 MHz
        Avg                               : 0 MHz
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A

So the GPU is clearly being heavily throttled: under load, the graphics clock sits at 139 MHz against a maximum of 1911 MHz.
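To put a number on the throttling, here is a minimal Python sketch that parses the "Clocks" and "Max Clocks" sections of the log above and reports each current clock as a fraction of its maximum. The helper names (`parse_sections`, `mhz`, `clock_ratios`) are made up for illustration, and the parser assumes the indented "key : value" layout that driver 465.27 prints here; other driver versions may format the report differently.

```python
import re

# Excerpt of the `nvidia-smi -q -d CLOCK` log quoted above.
SAMPLE = """\
    Clocks
        Graphics                          : 139 MHz
        SM                                : 139 MHz
        Memory                            : 405 MHz
        Video                             : 544 MHz
    Max Clocks
        Graphics                          : 1911 MHz
        SM                                : 1911 MHz
        Memory                            : 3504 MHz
        Video                             : 1708 MHz
"""


def parse_sections(text):
    """Group 'key : value' lines under the most recent header line.

    A header is any non-empty line that does not contain ' : '.
    """
    sections = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if " : " in line:
            key, value = line.split(" : ", 1)
            if current is not None:
                sections[current][key.strip()] = value.strip()
        else:
            current = line.strip()
            sections[current] = {}
    return sections


def mhz(value):
    """Return the integer MHz in a string like '139 MHz', or None for 'N/A'."""
    match = re.fullmatch(r"(\d+) MHz", value)
    return int(match.group(1)) if match else None


def clock_ratios(text):
    """Map each clock domain to current / max, skipping missing values."""
    sections = parse_sections(text)
    current = sections.get("Clocks", {})
    maximum = sections.get("Max Clocks", {})
    ratios = {}
    for name, value in current.items():
        cur, top = mhz(value), mhz(maximum.get(name, ""))
        if cur is not None and top:
            ratios[name] = cur / top
    return ratios


if __name__ == "__main__":
    for name, ratio in clock_ratios(SAMPLE).items():
        print(f"{name}: {ratio:.1%} of max")
```

On the log above, the graphics clock comes out at roughly 7% of its maximum while glxgears is running.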

My guess is that this is related to the following settings, in particular GPU Max Operating Temp, which is only 57 C:

nvidia-smi -q -d TEMPERATURE

==============NVSMI LOG==============

Timestamp                                 : Sat May  8 13:25:04 2021
Driver Version                            : 465.27
CUDA Version                              : 11.3

Attached GPUs                             : 1
GPU 00000000:2D:00.0
    Temperature
        GPU Current Temp                  : 56 C
        GPU Shutdown Temp                 : 102 C
        GPU Slowdown Temp                 : 97 C
        GPU Max Operating Temp            : 57 C
        GPU Target Temperature            : N/A
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A

Interestingly, if I enable thermald with the --adaptive flag, I get this:

==============NVSMI LOG==============

Timestamp                                 : Sat May  8 13:29:56 2021
Driver Version                            : 465.27
CUDA Version                              : 11.3

Attached GPUs                             : 1
GPU 00000000:2D:00.0
    Temperature
        GPU Current Temp                  : 56 C
        GPU Shutdown Temp                 : 102 C
        GPU Slowdown Temp                 : 97 C
        GPU Max Operating Temp            : 75 C
        GPU Target Temperature            : N/A
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A

With that change, the throttling goes away and performance improves markedly.

So apparently thermald can change this setting, but I cannot seem to do so manually, since "GPUMaxOperatingTempThreshold" is a read-only attribute:
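To make the difference between the two temperature reports explicit, here is a small Python sketch that diffs them field by field and computes the gap between GPU Current Temp and GPU Max Operating Temp. The function names are hypothetical, and reading that gap as the throttling trigger is only my guess, not documented driver behavior. The samples are trimmed to the Temperature section so that the timestamps do not show up as changes.

```python
# Temperature sections from the two logs above (without and with thermald --adaptive).
BEFORE = """\
    Temperature
        GPU Current Temp                  : 56 C
        GPU Shutdown Temp                 : 102 C
        GPU Slowdown Temp                 : 97 C
        GPU Max Operating Temp            : 57 C
        GPU Target Temperature            : N/A
"""

AFTER = """\
    Temperature
        GPU Current Temp                  : 56 C
        GPU Shutdown Temp                 : 102 C
        GPU Slowdown Temp                 : 97 C
        GPU Max Operating Temp            : 75 C
        GPU Target Temperature            : N/A
"""


def parse_fields(text):
    """Collect every 'key : value' line into a flat dict."""
    fields = {}
    for line in text.splitlines():
        if " : " in line:
            key, value = line.split(" : ", 1)
            fields[key.strip()] = value.strip()
    return fields


def changed_fields(before, after):
    """Return {field: (old, new)} for every field whose value differs."""
    old, new = parse_fields(before), parse_fields(after)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }


def celsius(value):
    """Return the integer in a string like '56 C', or None for 'N/A'."""
    number, _, unit = value.partition(" ")
    return int(number) if unit == "C" and number.isdigit() else None


def headroom(text):
    """Degrees between GPU Current Temp and GPU Max Operating Temp, or None."""
    fields = parse_fields(text)
    current = celsius(fields.get("GPU Current Temp", ""))
    limit = celsius(fields.get("GPU Max Operating Temp", ""))
    if current is None or limit is None:
        return None
    return limit - current


if __name__ == "__main__":
    print(changed_fields(BEFORE, AFTER))
    print("headroom before:", headroom(BEFORE), "C")
    print("headroom after: ", headroom(AFTER), "C")
```

The only field that differs between the two reports is GPU Max Operating Temp (57 C versus 75 C), which takes the headroom at 56 C from 1 degree to 19 degrees.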

nvidia-settings -a GPUMaxOperatingTempThreshold=80

ERROR: The attribute 'GPUMaxOperatingTempThreshold' specified in assignment 'GPUMaxOperatingTempThreshold=80' cannot be assigned (it is a read-only
       attribute).

I am now on Fedora 34 but I saw the exact same problem on Ubuntu 20.10.

I don't really know what's going on here, but it seems strange that I should have to run thermald just to escape this throttling problem (and even then, I still think 75 C is too low a threshold for throttling). To be honest, I don't really understand the interplay between GPU Slowdown Temp and GPU Max Operating Temp. It seems to me that they are synonymous.

Here's the full output from nvidia-smi:

Sat May  8 15:23:05 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.27       Driver Version: 465.27       CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:2D:00.0 Off |                  N/A |
| N/A   67C    P0    N/A /  N/A |    578MiB /  2002MiB |      7%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2762      G   /usr/libexec/Xorg                 293MiB |
|    0   N/A  N/A      2953      G   /usr/bin/gnome-shell               88MiB |
|    0   N/A  N/A      4524      G   ...AAAAAAAAA= --shared-files      134MiB |
|    0   N/A  N/A      5395      G   ...e/Steam/ubuntu12_32/steam       18MiB |
|    0   N/A  N/A      5604      G   ./steamwebhelper                    1MiB |
|    0   N/A  N/A      6303      G   ...AAAAAAAAA= --shared-files        6MiB |
|    0   N/A  N/A      7422      G   anki                               27MiB |
|    0   N/A  N/A     21305      G   /usr/bin/gjs                        2MiB |
+-----------------------------------------------------------------------------+

I wasn't really sure whether to post this bug here or on the NVIDIA forums, so I've cross-posted it (https://forums.developer.nvidia.com/t/severe-throttling-on-thinkpad-t14-gen-1-with-geforce-mx330/177366).