Closed by juliovicenzi 3 years ago
I haven't been able to find a clear answer on which GPUs are supposed to support power monitoring. It might be implementation-specific. There is an area of non-volatile memory on the GPU called the "inforom" that holds the GPU firmware configuration. Even if the GPU chip itself supports power monitoring, the vendor may have disabled it through the contents of the inforom, and/or it may require additional external circuitry that some vendors opt not to implement.
Anyhow, nvml_fix is doing what it's supposed to (tricking libnvidia-ml into thinking you've got a Quadro). Sorry, I don't think I can assist further.
It'd be inexact, but many laptops let you see battery power draw through the Linux kernel (i.e. somewhere in /sys). You could check the consumption with something like 'powertop' when idle and when pushing the GPU, and use the difference to get an idea of how power draw correlates with GPU utilization %.
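As a rough sketch of the /sys approach: on many laptops the kernel's power_supply class exposes either power_now (microwatts) or current_now and voltage_now (microamps and microvolts) under /sys/class/power_supply/BAT*. Which of these files exist depends on the battery driver, the readings are only meaningful while running on battery, and the function names below are made up for illustration.

```python
from pathlib import Path
from typing import Optional


def read_battery_power_watts(supply_dir: Path) -> Optional[float]:
    """Return the battery's instantaneous power draw in watts, or None.

    Assumes the sysfs power_supply layout: power_now in microwatts, or
    current_now (microamps) together with voltage_now (microvolts).
    """
    power_now = supply_dir / "power_now"
    if power_now.exists():
        return abs(int(power_now.read_text())) / 1e6

    current_now = supply_dir / "current_now"
    voltage_now = supply_dir / "voltage_now"
    if current_now.exists() and voltage_now.exists():
        microamps = abs(int(current_now.read_text()))
        microvolts = abs(int(voltage_now.read_text()))
        return microamps * microvolts / 1e12

    return None  # this driver exposes neither form


def estimate_gpu_delta_watts(idle_watts: float, load_watts: float) -> float:
    """Crude estimate: whole-system draw under GPU load minus draw at idle."""
    return load_watts - idle_watts


if __name__ == "__main__":
    # Battery names (BAT0, BAT1, ...) vary between machines.
    for supply in sorted(Path("/sys/class/power_supply").glob("BAT*")):
        print(supply.name, read_battery_power_watts(supply))
```

Sampling this once at idle and once while the GPU is busy, then subtracting, gives a whole-system delta rather than a true GPU reading, so CPU and display activity need to be kept as constant as possible between the two samples.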
Btw, here is a clearer picture of the effect nvml_fix is having:
--- smi-a-old.txt 2020-11-26 16:10:46.594557431 -0500
+++ smi-a-new.txt 2020-11-26 16:10:44.097914265 -0500
@@ -1,18 +1,18 @@
==============NVSMI LOG==============
-Timestamp : Thu Nov 26 17:54:23 2020
+Timestamp : Thu Nov 26 17:57:31 2020
Driver Version : 390.138
Attached GPUs : 1
GPU 00000000:03:00.0
Product Name : GeForce 920M
- Product Brand : GeForce
- Display Mode : N/A
- Display Active : N/A
+ Product Brand : Quadro
+ Display Mode : Disabled
+ Display Active : Disabled
Persistence Mode : Disabled
- Accounting Mode : N/A
- Accounting Mode Buffer Size : N/A
+ Accounting Mode : Disabled
+ Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
@@ -20,8 +20,8 @@
GPU UUID : GPU-66886553-a62a-fe2c-8a1a-b74232e0e80c
Minor Number : 0
VBIOS Version : 80.28.92.00.4D
- MultiGPU Board : N/A
- Board ID : N/A
+ MultiGPU Board : No
+ Board ID : 0x300
GPU Part Number : N/A
Inforom Version
Image Version : N/A
@@ -32,7 +32,7 @@
Current : N/A
Pending : N/A
GPU Virtualization Mode
- Virtualization mode : N/A
+ Virtualization mode : None
PCI
Bus : 0x03
Device : 0x00
@@ -42,11 +42,11 @@
Sub System Id : 0xC770144D
GPU Link Info
PCIe Generation
- Max : N/A
- Current : N/A
+ Max : 2
+ Current : 2
Link Width
- Max : N/A
- Current : N/A
+ Max : 8x
+ Current : 4x
Bridge Chip
Type : N/A
Firmware : N/A
@@ -55,25 +55,34 @@
Rx Throughput : N/A
Fan Speed : N/A
Performance State : P0
- Clocks Throttle Reasons : N/A
+ Clocks Throttle Reasons
+ Idle : Not Active
+ Applications Clocks Setting : Not Active
+ SW Power Cap : Not Active
+ HW Slowdown : Not Active
+ HW Thermal Slowdown : N/A
+ HW Power Brake Slowdown : N/A
+ Sync Boost : Not Active
+ SW Thermal Slowdown : Not Active
+ Display Clock Setting : Not Active
FB Memory Usage
Total : 2004 MiB
- Used : 264 MiB
- Free : 1740 MiB
+ Used : 287 MiB
+ Free : 1717 MiB
BAR1 Memory Usage
- Total : N/A
- Used : N/A
- Free : N/A
+ Total : 256 MiB
+ Used : 4 MiB
+ Free : 252 MiB
Compute Mode : Default
Utilization
- Gpu : N/A
- Memory : N/A
- Encoder : N/A
- Decoder : N/A
+ Gpu : 5 %
+ Memory : 2 %
+ Encoder : 0 %
+ Decoder : 0 %
Encoder Stats
- Active Sessions : N/A
- Average FPS : N/A
- Average Latency : N/A
+ Active Sessions : 0
+ Average FPS : 0
+ Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
@@ -121,10 +130,10 @@
Double Bit ECC : N/A
Pending : N/A
Temperature
- GPU Current Temp : 51 C
- GPU Shutdown Temp : N/A
- GPU Slowdown Temp : N/A
- GPU Max Operating Temp : N/A
+ GPU Current Temp : 50 C
+ GPU Shutdown Temp : 102 C
+ GPU Slowdown Temp : 97 C
+ GPU Max Operating Temp : 93 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
@@ -136,21 +145,21 @@
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
- Graphics : N/A
- SM : N/A
- Memory : N/A
- Video : N/A
- Applications Clocks
Graphics : 954 MHz
+ SM : 954 MHz
Memory : 1001 MHz
+ Video : 540 MHz
+ Applications Clocks
+ Graphics : N/A
+ Memory : N/A
Default Applications Clocks
- Graphics : 954 MHz
- Memory : 1001 MHz
- Max Clocks
Graphics : N/A
- SM : N/A
Memory : N/A
- Video : N/A
+ Max Clocks
+ Graphics : 954 MHz
+ SM : 954 MHz
+ Memory : 1001 MHz
+ Video : 540 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
So, I successfully applied the fix on my GeForce 920M, but nvidia-smi -a still shows N/A for most fields. The main value I wanted to query, power.draw, is still not supported, although some other statistics, namely utilization, do show up with nvidia-smi --query-gpu. Should power.draw be supported after the fix, or is it just unavailable for my GPU? I attached the outputs from nvidia-smi for both commands, before and after the fix.
smi-a-new.txt smi-a-old.txt output_new.txt output_old.txt
Thanks in advance.
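For anyone wanting to check the same two fields programmatically, here is a minimal sketch that calls nvidia-smi --query-gpu with the power.draw and utilization.gpu fields in CSV form and turns any non-numeric answer into None. The helper names are hypothetical, and the exact placeholder text for an unsupported field (for example "[Not Supported]" or "[N/A]") is assumed to vary by driver version, so the parser treats anything that is not a number as unsupported.

```python
import subprocess
from typing import Dict, List, Optional

# Fields discussed in this thread; both are standard --query-gpu names.
FIELDS = ["power.draw", "utilization.gpu"]


def parse_query_line(line: str, fields: List[str]) -> Dict[str, Optional[float]]:
    """Parse one CSV row produced with --format=csv,noheader,nounits."""
    values = [value.strip() for value in line.split(",")]
    result: Dict[str, Optional[float]] = {}
    for name, raw in zip(fields, values):
        try:
            result[name] = float(raw)
        except ValueError:
            # Non-numeric placeholder: the driver does not report this field.
            result[name] = None
    return result


def query_gpus(fields: List[str] = FIELDS) -> List[Dict[str, Optional[float]]]:
    """Run nvidia-smi and return one dict per GPU (requires the NVIDIA driver)."""
    output = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(fields)}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_query_line(line, fields) for line in output.splitlines() if line.strip()]
```

Calling query_gpus() before and after loading nvml_fix makes it easy to see which fields change from None to a number; on the GPU in this thread, power.draw would be expected to stay None.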