OpenAIOS vGPU device plugin for Kubernetes is originated from the OpenAIOS project to virtualize GPU device memory, in order to allow applications to access larger memory space than its physical capacity. It is designed for ease of use of extended device memory for AI workloads.
The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.
[root@gpu-pod /]# nvidia-smi
[4pdvGPU Warn(38:140500277225280:hook.c:396)]: can't find function nvmlDeviceGetMemoryInfo_v2 in libnvidia-ml.so.1
[4pdvGPU Warn(38:140500277225280:hook.c:396)]: can't find function nvmlDeviceSetTemperatureThreshold in libnvidia-ml.so.1
[4pdvGPU Warn(38:140500277225280:hook.c:396)]: can't find function nvmlVgpuInstanceGetGpuInstanceId in libnvidia-ml.so.1
[4pdvGPU Warn(38:140500277225280:hook.c:339)]: NVML error at line 339: 1
Failed to initialize NVML: Unknown Error
2. Steps to reproduce the issue
3. Information to attach (optional if deemed irrelevant)
Common error checking:
[ ] The output of nvidia-smi -a on your host
[root@xxx ~]# nvidia-smi -a
==============NVSMI LOG==============
Timestamp : Sat Aug 13 16:10:39 2022
Driver Version : 455.38
CUDA Version : 11.1
Attached GPUs : 4
GPU 00000000:02:00.0
Product Name : TITAN V
Product Brand : Titan
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0320618057831
GPU UUID : GPU-f5a3f95f-2685-cf01-2063-7bc624963433
Minor Number : 0
VBIOS Version : 88.00.41.00.18
MultiGPU Board : No
Board ID : 0x200
GPU Part Number : 900-1G500-2500-000
Inforom Version
Image Version : G001.0000.01.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x02
Device : 0x00
Domain : 0x0000
Device Id : 0x1D8110DE
Bus Id : 00000000:02:00.0
Sub System Id : 0x121810DE
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 28 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 12066 MiB
Used : 0 MiB
Free : 12066 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : 39 C
GPU Shutdown Temp : 100 C
GPU Slowdown Temp : 97 C
GPU Max Operating Temp : 91 C
Memory Current Temp : 36 C
Memory Max Operating Temp : 95 C
Power Readings
Power Management : Supported
Power Draw : 27.28 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 100.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : 135 MHz
SM : 135 MHz
Memory : 850 MHz
Video : 555 MHz
Applications Clocks
Graphics : 1200 MHz
Memory : 850 MHz
Default Applications Clocks
Graphics : 1200 MHz
Memory : 850 MHz
Max Clocks
Graphics : 1912 MHz
SM : 1912 MHz
Memory : 850 MHz
Video : 1717 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None
GPU 00000000:03:00.0
Product Name : TITAN V
Product Brand : Titan
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0320618058217
GPU UUID : GPU-3d5859fc-73e1-23d1-2e59-78c5e7049d61
Minor Number : 1
VBIOS Version : 88.00.41.00.18
MultiGPU Board : No
Board ID : 0x300
GPU Part Number : 900-1G500-2500-000
Inforom Version
Image Version : G001.0000.01.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x03
Device : 0x00
Domain : 0x0000
Device Id : 0x1D8110DE
Bus Id : 00000000:03:00.0
Sub System Id : 0x121810DE
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 31 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 12066 MiB
Used : 0 MiB
Free : 12066 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : 45 C
GPU Shutdown Temp : 100 C
GPU Slowdown Temp : 97 C
GPU Max Operating Temp : 91 C
Memory Current Temp : 43 C
Memory Max Operating Temp : 95 C
Power Readings
Power Management : Supported
Power Draw : 30.67 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 100.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : 135 MHz
SM : 135 MHz
Memory : 850 MHz
Video : 555 MHz
Applications Clocks
Graphics : 1200 MHz
Memory : 850 MHz
Default Applications Clocks
Graphics : 1200 MHz
Memory : 850 MHz
Max Clocks
Graphics : 1912 MHz
SM : 1912 MHz
Memory : 850 MHz
Video : 1717 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None
GPU 00000000:82:00.0
Product Name : TITAN V
Product Brand : Titan
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0324917182924
GPU UUID : GPU-e2336a65-b527-8ba6-c005-209ebc071c78
Minor Number : 2
VBIOS Version : 88.00.36.00.01
MultiGPU Board : No
Board ID : 0x8200
GPU Part Number : 900-1G500-2500-000
Inforom Version
Image Version : G001.0000.01.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x82
Device : 0x00
Domain : 0x0000
Device Id : 0x1D8110DE
Bus Id : 00000000:82:00.0
Sub System Id : 0x121810DE
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 28 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 12066 MiB
Used : 0 MiB
Free : 12066 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : 42 C
GPU Shutdown Temp : 100 C
GPU Slowdown Temp : 97 C
GPU Max Operating Temp : 91 C
Memory Current Temp : 38 C
Memory Max Operating Temp : 95 C
Power Readings
Power Management : Supported
Power Draw : 27.65 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 100.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : 135 MHz
SM : 135 MHz
Memory : 850 MHz
Video : 555 MHz
Applications Clocks
Graphics : 1200 MHz
Memory : 850 MHz
Default Applications Clocks
Graphics : 1200 MHz
Memory : 850 MHz
Max Clocks
Graphics : 1912 MHz
SM : 1912 MHz
Memory : 850 MHz
Video : 1717 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None
GPU 00000000:83:00.0
Product Name : TITAN V
Product Brand : Titan
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0320618057916
GPU UUID : GPU-5908bbc9-ddab-ebe7-5624-446b3fc15348
Minor Number : 3
VBIOS Version : 88.00.41.00.18
MultiGPU Board : No
Board ID : 0x8300
GPU Part Number : 900-1G500-2500-000
Inforom Version
Image Version : G001.0000.01.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x83
Device : 0x00
Domain : 0x0000
Device Id : 0x1D8110DE
Bus Id : 00000000:83:00.0
Sub System Id : 0x121810DE
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 28 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 12066 MiB
Used : 0 MiB
Free : 12066 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : 41 C
GPU Shutdown Temp : 100 C
GPU Slowdown Temp : 97 C
GPU Max Operating Temp : 91 C
Memory Current Temp : 40 C
Memory Max Operating Temp : 95 C
Power Readings
Power Management : Supported
Power Draw : 25.91 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 100.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : 135 MHz
SM : 135 MHz
Memory : 850 MHz
Video : 555 MHz
Applications Clocks
Graphics : 1200 MHz
Memory : 850 MHz
Default Applications Clocks
Graphics : 1200 MHz
Memory : 850 MHz
Max Clocks
Graphics : 1912 MHz
SM : 1912 MHz
Memory : 850 MHz
Video : 1717 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None
- [ ] Your docker configuration file (e.g: `/etc/docker/daemon.json`)
2022/08/13 07:41:28 Starting FS watcher.
2022/08/13 07:41:28 Starting OS watcher.
2022/08/13 07:41:28 Retreiving plugins.
2022/08/13 07:41:28 migstrategy= none
2022/08/13 07:41:28 uuid= GPU-f5a3f95f-2685-cf01-2063-7bc624963433
2022/08/13 07:41:28 uuid= GPU-3d5859fc-73e1-23d1-2e59-78c5e7049d61
2022/08/13 07:41:28 uuid= GPU-e2336a65-b527-8ba6-c005-209ebc071c78
2022/08/13 07:41:28 uuid= GPU-5908bbc9-ddab-ebe7-5624-446b3fc15348
2022/08/13 07:41:28 Starting GRPC server for 'nvidia.com/gpu'
2022/08/13 07:41:28 Starting to serve 'nvidia.com/gpu' on /var/lib/kubelet/device-plugins/nvidia-gpu.sock
2022/08/13 07:41:28 Registered device plugin for 'nvidia.com/gpu' with Kubelet
- [ ] The kubelet logs on the node (e.g: `sudo journalctl -r -u kubelet`)
Additional information that might help better understand your environment and reproduce the bug:
- [ ] Docker version from `docker version`
Client: Docker Engine - Community
Version: 19.03.3
API version: 1.40
Go version: go1.12.10
Git commit: a872fc2f86
Built: Tue Oct 8 00:58:10 2019
OS/Arch: linux/amd64
Experimental: false
The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.
1. Issue or feature description
create pod
run nvidia-smi err:
2. Steps to reproduce the issue
3. Information to attach (optional if deemed irrelevant)
Common error checking:
nvidia-smi -a
on your host==============NVSMI LOG==============
Timestamp : Sat Aug 13 16:10:39 2022 Driver Version : 455.38 CUDA Version : 11.1
Attached GPUs : 4 GPU 00000000:02:00.0 Product Name : TITAN V Product Brand : Titan Display Mode : Disabled Display Active : Disabled Persistence Mode : Disabled MIG Mode Current : N/A Pending : N/A Accounting Mode : Disabled Accounting Mode Buffer Size : 4000 Driver Model Current : N/A Pending : N/A Serial Number : 0320618057831 GPU UUID : GPU-f5a3f95f-2685-cf01-2063-7bc624963433 Minor Number : 0 VBIOS Version : 88.00.41.00.18 MultiGPU Board : No Board ID : 0x200 GPU Part Number : 900-1G500-2500-000 Inforom Version Image Version : G001.0000.01.04 OEM Object : 1.1 ECC Object : N/A Power Management Object : N/A GPU Operation Mode Current : N/A Pending : N/A GPU Virtualization Mode Virtualization Mode : None Host VGPU Mode : N/A IBMNPU Relaxed Ordering Mode : N/A PCI Bus : 0x02 Device : 0x00 Domain : 0x0000 Device Id : 0x1D8110DE Bus Id : 00000000:02:00.0 Sub System Id : 0x121810DE GPU Link Info PCIe Generation Max : 3 Current : 1 Link Width Max : 16x Current : 16x Bridge Chip Type : N/A Firmware : N/A Replays Since Reset : 0 Replay Number Rollovers : 0 Tx Throughput : 0 KB/s Rx Throughput : 0 KB/s Fan Speed : 28 % Performance State : P8 Clocks Throttle Reasons Idle : Active Applications Clocks Setting : Not Active SW Power Cap : Not Active HW Slowdown : Not Active HW Thermal Slowdown : Not Active HW Power Brake Slowdown : Not Active Sync Boost : Not Active SW Thermal Slowdown : Not Active Display Clock Setting : Not Active FB Memory Usage Total : 12066 MiB Used : 0 MiB Free : 12066 MiB BAR1 Memory Usage Total : 256 MiB Used : 2 MiB Free : 254 MiB Compute Mode : Default Utilization Gpu : 0 % Memory : 0 % Encoder : 0 % Decoder : 0 % Encoder Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 FBC Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 Ecc Mode Current : N/A Pending : N/A ECC Errors Volatile Single Bit Device Memory : N/A Register File : N/A L1 Cache : N/A L2 Cache : N/A Texture Memory : N/A Texture Shared : N/A CBU : N/A Total : N/A Double Bit Device Memory : N/A Register File : N/A L1 Cache : N/A L2 Cache : N/A Texture Memory : N/A Texture Shared : N/A CBU : N/A Total : N/A Aggregate Single Bit Device Memory : N/A Register File : N/A L1 Cache : N/A L2 Cache : N/A Texture Memory : N/A Texture Shared : N/A CBU : N/A Total : N/A Double Bit Device Memory : N/A Register File : N/A L1 Cache : N/A L2 Cache : N/A Texture Memory : N/A Texture Shared : N/A CBU : N/A Total : N/A Retired Pages Single Bit ECC : N/A Double Bit ECC : N/A Pending Page Blacklist : N/A Remapped Rows : N/A Temperature GPU Current Temp : 39 C GPU Shutdown Temp : 100 C GPU Slowdown Temp : 97 C GPU Max Operating Temp : 91 C Memory Current Temp : 36 C Memory Max Operating Temp : 95 C Power Readings Power Management : Supported Power Draw : 27.28 W Power Limit : 250.00 W Default Power Limit : 250.00 W Enforced Power Limit : 250.00 W Min Power Limit : 100.00 W Max Power Limit : 300.00 W Clocks Graphics : 135 MHz SM : 135 MHz Memory : 850 MHz Video : 555 MHz Applications Clocks Graphics : 1200 MHz Memory : 850 MHz Default Applications Clocks Graphics : 1200 MHz Memory : 850 MHz Max Clocks Graphics : 1912 MHz SM : 1912 MHz Memory : 850 MHz Video : 1717 MHz Max Customer Boost Clocks Graphics : N/A Clock Policy Auto Boost : N/A Auto Boost Default : N/A Processes : None
GPU 00000000:03:00.0 Product Name : TITAN V Product Brand : Titan Display Mode : Disabled Display Active : Disabled Persistence Mode : Disabled MIG Mode Current : N/A Pending : N/A Accounting Mode : Disabled Accounting Mode Buffer Size : 4000 Driver Model Current : N/A Pending : N/A Serial Number : 0320618058217 GPU UUID : GPU-3d5859fc-73e1-23d1-2e59-78c5e7049d61 Minor Number : 1 VBIOS Version : 88.00.41.00.18 MultiGPU Board : No Board ID : 0x300 GPU Part Number : 900-1G500-2500-000 Inforom Version Image Version : G001.0000.01.04 OEM Object : 1.1 ECC Object : N/A Power Management Object : N/A GPU Operation Mode Current : N/A Pending : N/A GPU Virtualization Mode Virtualization Mode : None Host VGPU Mode : N/A IBMNPU Relaxed Ordering Mode : N/A PCI Bus : 0x03 Device : 0x00 Domain : 0x0000 Device Id : 0x1D8110DE Bus Id : 00000000:03:00.0 Sub System Id : 0x121810DE GPU Link Info PCIe Generation Max : 3 Current : 1 Link Width Max : 16x Current : 16x Bridge Chip Type : N/A Firmware : N/A Replays Since Reset : 0 Replay Number Rollovers : 0 Tx Throughput : 0 KB/s Rx Throughput : 0 KB/s Fan Speed : 31 % Performance State : P8 Clocks Throttle Reasons Idle : Active Applications Clocks Setting : Not Active SW Power Cap : Not Active HW Slowdown : Not Active HW Thermal Slowdown : Not Active HW Power Brake Slowdown : Not Active Sync Boost : Not Active SW Thermal Slowdown : Not Active Display Clock Setting : Not Active FB Memory Usage Total : 12066 MiB Used : 0 MiB Free : 12066 MiB BAR1 Memory Usage Total : 256 MiB Used : 2 MiB Free : 254 MiB Compute Mode : Default Utilization Gpu : 0 % Memory : 0 % Encoder : 0 % Decoder : 0 % Encoder Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 FBC Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 Ecc Mode Current : N/A Pending : N/A ECC Errors Volatile Single Bit Device Memory : N/A Register File : N/A L1 Cache : N/A L2 Cache : N/A Texture Memory : N/A Texture Shared : N/A CBU : N/A Total : N/A Double Bit Device Memory : N/A Register File : N/A L1 Cache : N/A L2 Cache : N/A Texture Memory : N/A Texture Shared : N/A CBU : N/A Total : N/A Aggregate Single Bit Device Memory : N/A Register File : N/A L1 Cache : N/A L2 Cache : N/A Texture Memory : N/A Texture Shared : N/A CBU : N/A Total : N/A Double Bit Device Memory : N/A Register File : N/A L1 Cache : N/A L2 Cache : N/A Texture Memory : N/A Texture Shared : N/A CBU : N/A Total : N/A Retired Pages Single Bit ECC : N/A Double Bit ECC : N/A Pending Page Blacklist : N/A Remapped Rows : N/A Temperature GPU Current Temp : 45 C GPU Shutdown Temp : 100 C GPU Slowdown Temp : 97 C GPU Max Operating Temp : 91 C Memory Current Temp : 43 C Memory Max Operating Temp : 95 C Power Readings Power Management : Supported Power Draw : 30.67 W Power Limit : 250.00 W Default Power Limit : 250.00 W Enforced Power Limit : 250.00 W Min Power Limit : 100.00 W Max Power Limit : 300.00 W Clocks Graphics : 135 MHz SM : 135 MHz Memory : 850 MHz Video : 555 MHz Applications Clocks Graphics : 1200 MHz Memory : 850 MHz Default Applications Clocks Graphics : 1200 MHz Memory : 850 MHz Max Clocks Graphics : 1912 MHz SM : 1912 MHz Memory : 850 MHz Video : 1717 MHz Max Customer Boost Clocks Graphics : N/A Clock Policy Auto Boost : N/A Auto Boost Default : N/A Processes : None
GPU 00000000:82:00.0 Product Name : TITAN V Product Brand : Titan Display Mode : Disabled Display Active : Disabled Persistence Mode : Disabled MIG Mode Current : N/A Pending : N/A Accounting Mode : Disabled Accounting Mode Buffer Size : 4000 Driver Model Current : N/A Pending : N/A Serial Number : 0324917182924 GPU UUID : GPU-e2336a65-b527-8ba6-c005-209ebc071c78 Minor Number : 2 VBIOS Version : 88.00.36.00.01 MultiGPU Board : No Board ID : 0x8200 GPU Part Number : 900-1G500-2500-000 Inforom Version Image Version : G001.0000.01.04 OEM Object : 1.1 ECC Object : N/A Power Management Object : N/A GPU Operation Mode Current : N/A Pending : N/A GPU Virtualization Mode Virtualization Mode : None Host VGPU Mode : N/A IBMNPU Relaxed Ordering Mode : N/A PCI Bus : 0x82 Device : 0x00 Domain : 0x0000 Device Id : 0x1D8110DE Bus Id : 00000000:82:00.0 Sub System Id : 0x121810DE GPU Link Info PCIe Generation Max : 3 Current : 1 Link Width Max : 16x Current : 16x Bridge Chip Type : N/A Firmware : N/A Replays Since Reset : 0 Replay Number Rollovers : 0 Tx Throughput : 0 KB/s Rx Throughput : 0 KB/s Fan Speed : 28 % Performance State : P8 Clocks Throttle Reasons Idle : Active Applications Clocks Setting : Not Active SW Power Cap : Not Active HW Slowdown : Not Active HW Thermal Slowdown : Not Active HW Power Brake Slowdown : Not Active Sync Boost : Not Active SW Thermal Slowdown : Not Active Display Clock Setting : Not Active FB Memory Usage Total : 12066 MiB Used : 0 MiB Free : 12066 MiB BAR1 Memory Usage Total : 256 MiB Used : 2 MiB Free : 254 MiB Compute Mode : Default Utilization Gpu : 0 % Memory : 0 % Encoder : 0 % Decoder : 0 % Encoder Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 FBC Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 Ecc Mode Current : N/A Pending : N/A ECC Errors Volatile Single Bit Device Memory : N/A Register File : N/A L1 Cache : N/A L2 Cache : N/A Texture Memory : N/A Texture Shared : N/A CBU : N/A Total : N/A Double Bit Device Memory : N/A Register File : N/A L1 Cache : N/A L2 Cache : N/A Texture Memory : N/A Texture Shared : N/A CBU : N/A Total : N/A Aggregate Single Bit Device Memory : N/A Register File : N/A L1 Cache : N/A L2 Cache : N/A Texture Memory : N/A Texture Shared : N/A CBU : N/A Total : N/A Double Bit Device Memory : N/A Register File : N/A L1 Cache : N/A L2 Cache : N/A Texture Memory : N/A Texture Shared : N/A CBU : N/A Total : N/A Retired Pages Single Bit ECC : N/A Double Bit ECC : N/A Pending Page Blacklist : N/A Remapped Rows : N/A Temperature GPU Current Temp : 42 C GPU Shutdown Temp : 100 C GPU Slowdown Temp : 97 C GPU Max Operating Temp : 91 C Memory Current Temp : 38 C Memory Max Operating Temp : 95 C Power Readings Power Management : Supported Power Draw : 27.65 W Power Limit : 250.00 W Default Power Limit : 250.00 W Enforced Power Limit : 250.00 W Min Power Limit : 100.00 W Max Power Limit : 300.00 W Clocks Graphics : 135 MHz SM : 135 MHz Memory : 850 MHz Video : 555 MHz Applications Clocks Graphics : 1200 MHz Memory : 850 MHz Default Applications Clocks Graphics : 1200 MHz Memory : 850 MHz Max Clocks Graphics : 1912 MHz SM : 1912 MHz Memory : 850 MHz Video : 1717 MHz Max Customer Boost Clocks Graphics : N/A Clock Policy Auto Boost : N/A Auto Boost Default : N/A Processes : None
GPU 00000000:83:00.0 Product Name : TITAN V Product Brand : Titan Display Mode : Disabled Display Active : Disabled Persistence Mode : Disabled MIG Mode Current : N/A Pending : N/A Accounting Mode : Disabled Accounting Mode Buffer Size : 4000 Driver Model Current : N/A Pending : N/A Serial Number : 0320618057916 GPU UUID : GPU-5908bbc9-ddab-ebe7-5624-446b3fc15348 Minor Number : 3 VBIOS Version : 88.00.41.00.18 MultiGPU Board : No Board ID : 0x8300 GPU Part Number : 900-1G500-2500-000 Inforom Version Image Version : G001.0000.01.04 OEM Object : 1.1 ECC Object : N/A Power Management Object : N/A GPU Operation Mode Current : N/A Pending : N/A GPU Virtualization Mode Virtualization Mode : None Host VGPU Mode : N/A IBMNPU Relaxed Ordering Mode : N/A PCI Bus : 0x83 Device : 0x00 Domain : 0x0000 Device Id : 0x1D8110DE Bus Id : 00000000:83:00.0 Sub System Id : 0x121810DE GPU Link Info PCIe Generation Max : 3 Current : 1 Link Width Max : 16x Current : 16x Bridge Chip Type : N/A Firmware : N/A Replays Since Reset : 0 Replay Number Rollovers : 0 Tx Throughput : 0 KB/s Rx Throughput : 0 KB/s Fan Speed : 28 % Performance State : P8 Clocks Throttle Reasons Idle : Active Applications Clocks Setting : Not Active SW Power Cap : Not Active HW Slowdown : Not Active HW Thermal Slowdown : Not Active HW Power Brake Slowdown : Not Active Sync Boost : Not Active SW Thermal Slowdown : Not Active Display Clock Setting : Not Active FB Memory Usage Total : 12066 MiB Used : 0 MiB Free : 12066 MiB BAR1 Memory Usage Total : 256 MiB Used : 2 MiB Free : 254 MiB Compute Mode : Default Utilization Gpu : 0 % Memory : 0 % Encoder : 0 % Decoder : 0 % Encoder Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 FBC Stats Active Sessions : 0 Average FPS : 0 Average Latency : 0 Ecc Mode Current : N/A Pending : N/A ECC Errors Volatile Single Bit Device Memory : N/A Register File : N/A L1 Cache : N/A L2 Cache : N/A Texture Memory : N/A Texture Shared : N/A CBU : N/A Total : N/A Double Bit Device Memory : N/A Register File : N/A L1 Cache : N/A L2 Cache : N/A Texture Memory : N/A Texture Shared : N/A CBU : N/A Total : N/A Aggregate Single Bit Device Memory : N/A Register File : N/A L1 Cache : N/A L2 Cache : N/A Texture Memory : N/A Texture Shared : N/A CBU : N/A Total : N/A Double Bit Device Memory : N/A Register File : N/A L1 Cache : N/A L2 Cache : N/A Texture Memory : N/A Texture Shared : N/A CBU : N/A Total : N/A Retired Pages Single Bit ECC : N/A Double Bit ECC : N/A Pending Page Blacklist : N/A Remapped Rows : N/A Temperature GPU Current Temp : 41 C GPU Shutdown Temp : 100 C GPU Slowdown Temp : 97 C GPU Max Operating Temp : 91 C Memory Current Temp : 40 C Memory Max Operating Temp : 95 C Power Readings Power Management : Supported Power Draw : 25.91 W Power Limit : 250.00 W Default Power Limit : 250.00 W Enforced Power Limit : 250.00 W Min Power Limit : 100.00 W Max Power Limit : 300.00 W Clocks Graphics : 135 MHz SM : 135 MHz Memory : 850 MHz Video : 555 MHz Applications Clocks Graphics : 1200 MHz Memory : 850 MHz Default Applications Clocks Graphics : 1200 MHz Memory : 850 MHz Max Clocks Graphics : 1912 MHz SM : 1912 MHz Memory : 850 MHz Video : 1717 MHz Max Customer Boost Clocks Graphics : N/A Clock Policy Auto Boost : N/A Auto Boost Default : N/A Processes : None
{ "init": true, "exec-opts": ["native.cgroupdriver=systemd"], "default-runtime": "nvidia", "runtimes": { "nvidia": { "path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": [] } } }
2022/08/13 07:41:28 Starting FS watcher. 2022/08/13 07:41:28 Starting OS watcher. 2022/08/13 07:41:28 Retreiving plugins. 2022/08/13 07:41:28 migstrategy= none 2022/08/13 07:41:28 uuid= GPU-f5a3f95f-2685-cf01-2063-7bc624963433 2022/08/13 07:41:28 uuid= GPU-3d5859fc-73e1-23d1-2e59-78c5e7049d61 2022/08/13 07:41:28 uuid= GPU-e2336a65-b527-8ba6-c005-209ebc071c78 2022/08/13 07:41:28 uuid= GPU-5908bbc9-ddab-ebe7-5624-446b3fc15348 2022/08/13 07:41:28 Starting GRPC server for 'nvidia.com/gpu' 2022/08/13 07:41:28 Starting to serve 'nvidia.com/gpu' on /var/lib/kubelet/device-plugins/nvidia-gpu.sock 2022/08/13 07:41:28 Registered device plugin for 'nvidia.com/gpu' with Kubelet
Client: Docker Engine - Community Version: 19.03.3 API version: 1.40 Go version: go1.12.10 Git commit: a872fc2f86 Built: Tue Oct 8 00:58:10 2019 OS/Arch: linux/amd64 Experimental: false
Server: Docker Engine - Community Engine: Version: 19.03.13 API version: 1.40 (minimum version 1.12) Go version: go1.13.15 Git commit: 4484c46d9d Built: Wed Sep 16 17:02:21 2020 OS/Arch: linux/amd64 Experimental: false containerd: Version: 1.2.10 GitCommit: b34a5c8af56e510852c35414db4c1f4fa6172339 nvidia: Version: 1.0.0-rc8+dev GitCommit: 3e425f80a8c931f88e6d94a8c831b9d5aa481657 docker-init: Version: 0.18.0 GitCommit: fec3683
version: 1.0.0 build date: 2018-03-06T02:05+0000 build revision: be797da00b156493e80f1ae6f38d69f23c932554 build compiler: gcc 4.8.5 20150623 (Red Hat 4.8.5-16) build platform: x86_64 build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections