CFSworks / nvml_fix

A workaround for an annoying bug in nVidia's NVML library. Allows nvidia-smi to work once more!

NVIDIA-SMI still shows mostly N/A #35

Closed by juliovicenzi 3 years ago

juliovicenzi commented 3 years ago

I successfully applied the fix on my GeForce 920M, but nvidia-smi -a still shows N/A for most fields. The main value I wanted to query, power.draw, is still not supported, although some other statistics, namely utilization, do show up with nvidia-smi --query-gpu. Should power.draw be supported after the fix, or is it simply unavailable for my GPU? I have attached the outputs of both commands from before and after the fix.

smi-a-new.txt smi-a-old.txt output_new.txt output_old.txt

Thanks in advance.
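For reference, a minimal sketch of how the per-field query could be scripted to see which fields the driver actually reports. The helper names `parse_query_line` and `query_gpu` are made up for illustration, and the rule that unsupported fields are printed in square brackets (for example `[Not Supported]` or `[N/A]`) is an assumption about the CSV output that may differ between driver versions.

```python
import subprocess


def parse_query_line(line, fields):
    """Map one CSV line of `nvidia-smi --query-gpu` output to {field: value or None}."""
    values = [v.strip() for v in line.split(",")]
    result = {}
    for name, value in zip(fields, values):
        # Assumption: unavailable fields are printed in square brackets,
        # e.g. "[Not Supported]" or "[N/A]"; treat those as missing.
        result[name] = None if value.startswith("[") else value
    return result


def query_gpu(fields):
    """Run nvidia-smi for the given fields and return one dict per GPU."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(fields),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return [parse_query_line(line, fields) for line in out.splitlines() if line.strip()]
```

Calling `query_gpu(["name", "power.draw", "utilization.gpu"])` on a machine with the driver installed would then show at a glance which of the requested fields come back as `None`.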

tofurky commented 3 years ago

i haven't been able to find a clear answer on which gpus are supposed to support power monitoring. it might be implementation-specific. there is an area of non-volatile memory on the gpu called the "inforom" that holds the gpu firmware configuration. so even if the gpu chip itself supports power monitoring, the vendor could disable it through the contents of the inforom, and/or power monitoring may require additional external circuitry that some vendors opt not to implement.

anyhow, nvml_fix is doing what it's supposed to (tricking libnvidia-ml into thinking you've got a quadro). sorry, i don't think i can assist further.

it'd be inexact, but many laptops let you see battery power draw via the linux kernel (i.e. somewhere under /sys). you could check the consumption with something like 'powertop' both when idle and when pushing the gpu, and get a rough idea of how power draw correlates with gpu utilization %.
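a rough sketch of the /sys approach, assuming the battery is exposed under /sys/class/power_supply (the directory name, e.g. BAT0, varies per machine, and `battery_watts` is just an illustrative helper). the kernel's power_supply class reports power_now in microwatts; where that file is missing, current_now (microamps) times voltage_now (microvolts) gives the same figure. readings are only meaningful while running on battery, and they cover the whole laptop, not just the gpu.

```python
from pathlib import Path


def _read_int(path):
    """Return the integer stored in a sysfs file, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def battery_watts(supply_dir):
    """Estimate the current battery power draw in watts, or None if unavailable.

    supply_dir is a power_supply directory such as
    /sys/class/power_supply/BAT0 (the exact name is machine-specific).
    """
    supply = Path(supply_dir)

    # Preferred: power_now, reported in microwatts.
    power = _read_int(supply / "power_now")
    if power is not None:
        return abs(power) / 1e6

    # Fallback: current_now (microamps) * voltage_now (microvolts).
    current = _read_int(supply / "current_now")
    voltage = _read_int(supply / "voltage_now")
    if current is not None and voltage is not None:
        # Some drivers report a negative current while discharging.
        return abs(current) * voltage / 1e12

    return None
```

sampling `battery_watts("/sys/class/power_supply/BAT0")` a few times at idle and again under gpu load, then subtracting the averages, gives a ballpark for the extra draw attributable to the gpu.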

btw, here is a clearer picture of the effect nvml_fix is having:

--- smi-a-old.txt   2020-11-26 16:10:46.594557431 -0500
+++ smi-a-new.txt   2020-11-26 16:10:44.097914265 -0500
@@ -1,18 +1,18 @@

 ==============NVSMI LOG==============

-Timestamp                           : Thu Nov 26 17:54:23 2020
+Timestamp                           : Thu Nov 26 17:57:31 2020
 Driver Version                      : 390.138

 Attached GPUs                       : 1
 GPU 00000000:03:00.0
     Product Name                    : GeForce 920M
-    Product Brand                   : GeForce
-    Display Mode                    : N/A
-    Display Active                  : N/A
+    Product Brand                   : Quadro
+    Display Mode                    : Disabled
+    Display Active                  : Disabled
     Persistence Mode                : Disabled
-    Accounting Mode                 : N/A
-    Accounting Mode Buffer Size     : N/A
+    Accounting Mode                 : Disabled
+    Accounting Mode Buffer Size     : 4000
     Driver Model
         Current                     : N/A
         Pending                     : N/A
@@ -20,8 +20,8 @@
     GPU UUID                        : GPU-66886553-a62a-fe2c-8a1a-b74232e0e80c
     Minor Number                    : 0
     VBIOS Version                   : 80.28.92.00.4D
-    MultiGPU Board                  : N/A
-    Board ID                        : N/A
+    MultiGPU Board                  : No
+    Board ID                        : 0x300
     GPU Part Number                 : N/A
     Inforom Version
         Image Version               : N/A
@@ -32,7 +32,7 @@
         Current                     : N/A
         Pending                     : N/A
     GPU Virtualization Mode
-        Virtualization mode         : N/A
+        Virtualization mode         : None
     PCI
         Bus                         : 0x03
         Device                      : 0x00
@@ -42,11 +42,11 @@
         Sub System Id               : 0xC770144D
         GPU Link Info
             PCIe Generation
-                Max                 : N/A
-                Current             : N/A
+                Max                 : 2
+                Current             : 2
             Link Width
-                Max                 : N/A
-                Current             : N/A
+                Max                 : 8x
+                Current             : 4x
         Bridge Chip
             Type                    : N/A
             Firmware                : N/A
@@ -55,25 +55,34 @@
         Rx Throughput               : N/A
     Fan Speed                       : N/A
     Performance State               : P0
-    Clocks Throttle Reasons         : N/A
+    Clocks Throttle Reasons
+        Idle                        : Not Active
+        Applications Clocks Setting : Not Active
+        SW Power Cap                : Not Active
+        HW Slowdown                 : Not Active
+            HW Thermal Slowdown     : N/A
+            HW Power Brake Slowdown : N/A
+        Sync Boost                  : Not Active
+        SW Thermal Slowdown         : Not Active
+        Display Clock Setting       : Not Active
     FB Memory Usage
         Total                       : 2004 MiB
-        Used                        : 264 MiB
-        Free                        : 1740 MiB
+        Used                        : 287 MiB
+        Free                        : 1717 MiB
     BAR1 Memory Usage
-        Total                       : N/A
-        Used                        : N/A
-        Free                        : N/A
+        Total                       : 256 MiB
+        Used                        : 4 MiB
+        Free                        : 252 MiB
     Compute Mode                    : Default
     Utilization
-        Gpu                         : N/A
-        Memory                      : N/A
-        Encoder                     : N/A
-        Decoder                     : N/A
+        Gpu                         : 5 %
+        Memory                      : 2 %
+        Encoder                     : 0 %
+        Decoder                     : 0 %
     Encoder Stats
-        Active Sessions             : N/A
-        Average FPS                 : N/A
-        Average Latency             : N/A
+        Active Sessions             : 0
+        Average FPS                 : 0
+        Average Latency             : 0
     Ecc Mode
         Current                     : N/A
         Pending                     : N/A
@@ -121,10 +130,10 @@
         Double Bit ECC              : N/A
         Pending                     : N/A
     Temperature
-        GPU Current Temp            : 51 C
-        GPU Shutdown Temp           : N/A
-        GPU Slowdown Temp           : N/A
-        GPU Max Operating Temp      : N/A
+        GPU Current Temp            : 50 C
+        GPU Shutdown Temp           : 102 C
+        GPU Slowdown Temp           : 97 C
+        GPU Max Operating Temp      : 93 C
         Memory Current Temp         : N/A
         Memory Max Operating Temp   : N/A
     Power Readings
@@ -136,21 +145,21 @@
         Min Power Limit             : N/A
         Max Power Limit             : N/A
     Clocks
-        Graphics                    : N/A
-        SM                          : N/A
-        Memory                      : N/A
-        Video                       : N/A
-    Applications Clocks
         Graphics                    : 954 MHz
+        SM                          : 954 MHz
         Memory                      : 1001 MHz
+        Video                       : 540 MHz
+    Applications Clocks
+        Graphics                    : N/A
+        Memory                      : N/A
     Default Applications Clocks
-        Graphics                    : 954 MHz
-        Memory                      : 1001 MHz
-    Max Clocks
         Graphics                    : N/A
-        SM                          : N/A
         Memory                      : N/A
-        Video                       : N/A
+    Max Clocks
+        Graphics                    : 954 MHz
+        SM                          : 954 MHz
+        Memory                      : 1001 MHz
+        Video                       : 540 MHz
     Max Customer Boost Clocks
         Graphics                    : N/A
     Clock Policy