Closed: haijohn closed this issue 2 years ago
detailed logs:
into dlsym nvmlInitWithFlags
nvmlInitWithFlags
can't find function nvmlDeviceGetComputeRunningProcesses_v2 in libnvidia-ml.so.1
loaded nvml libraries
NVML DeviceGetHandleByUUIDNot supportedGPU-caba9b00-6386-2c33-7834-646ef2692cb7
v=0 p=GPU-caba9b00-6386-2c33-7834-646ef2692cb7 idx=0
virtual devices=1
sm_limit 0:100
sm_limit 1:100
sm_limit 2:100
sm_limit 3:100
sm_limit 4:100
sm_limit 5:100
sm_limit 6:100
sm_limit 7:100
sm_limit 8:100
sm_limit 9:100
sm_limit 10:100
sm_limit 11:100
sm_limit 12:100
sm_limit 13:100
sm_limit 14:100
sm_limit 15:100
into dlsym nvmlInternalGetExportTable
into dlsym nvmlDeviceGetCount_v2
NVML DeviceGetCount virtual=1
into dlsym nvmlDeviceGetHandleByIndex_v2
nvmlDeviceGetHandleByIndex_v2 index=0
into dlsym nvmlEventSetCreate
into dlsym nvmlSystemGetDriverVersion
into dlsym nvmlSystemGetCudaDriverVersion_v2
into dlsym cuDriverGetVersion
Wed Aug 4 01:57:25 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
into dlsym nvmlDeviceGetIndex
into dlsym nvmlDeviceGetName
into dlsym nvmlDeviceGetPciInfo_v3
into dlsym nvmlDeviceGetPersistenceMode
into dlsym nvmlDeviceGetDisplayActive
into dlsym nvmlDeviceGetEccMode
into dlsym nvmlDeviceGetFanSpeed
into dlsym nvmlDeviceGetTemperature
into dlsym nvmlDeviceGetPerformanceState
into dlsym nvmlDeviceGetPowerUsage
into dlsym nvmlDeviceGetEnforcedPowerLimit
into dlsym nvmlDeviceGetMemoryInfo
origin_free=12808486912 total=12808486912
dev=0 i=0
get_current_device_memory_usage:tick=5 result=117440512
usage=117440512 limit=12808355840
into dlsym nvmlDeviceGetUtilizationRates
into dlsym nvmlDeviceGetComputeMode
| 0 Tesla M40 Off | 00000000:00:08.0 Off | Off |
| N/A 25C P8 16W / 250W | 112MiB / 12215MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
into dlsym nvmlDeviceGetComputeRunningProcesses
Get RunningProcesses_v2
into dlsym nvmlDeviceGetGraphicsRunningProcesses
into dlsym nvmlDeviceGetMPSComputeRunningProcesses
| No running processes found |
+-----------------------------------------------------------------------------+
into dlsym nvmlDeviceValidateInforom
into dlsym nvmlEventSetFree
into dlsym nvmlShutdown
Calling exit handler
This is just a warning. It means your driver version is not the latest, so some CUDA 11 interfaces cannot be found; it does not affect the results. Also, please upgrade the image to 4pdosc/k8s-device-plugin:latest.
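The warning above ("can't find function nvmlDeviceGetComputeRunningProcesses_v2 in libnvidia-ml.so.1") is harmless because the shim can fall back to the older unsuffixed entry point when the `_v2` symbol is absent from an older driver. The following is a minimal sketch of that version-fallback pattern; it is not the plugin's actual code, and it uses libm and a made-up `cos_v2` symbol as stand-ins, since libnvidia-ml.so.1 may not be present on a given machine.

```python
import ctypes
import ctypes.util

def resolve_with_fallback(lib, names):
    """Try each symbol name in order; return (name, function) for the
    first one that resolves. Mirrors how an NVML shim might look up
    nvmlDeviceGetComputeRunningProcesses_v2 and fall back to the
    plain v1 symbol on older drivers."""
    for name in names:
        try:
            return name, getattr(lib, name)
        except AttributeError:
            # Symbol missing from this driver version; try the next one.
            continue
    raise RuntimeError("none of {!r} found".format(names))

# libm stands in for libnvidia-ml.so.1; "cos_2" is a hypothetical
# newer symbol that this library does not export.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
name, fn = resolve_with_fallback(libm, ["cos_2", "cos"])
fn.restype = ctypes.c_double
fn.argtypes = [ctypes.c_double]
print(name, fn(0.0))  # falls back to the plain "cos" entry point
```

With a driver that does export the newer symbol, the same lookup would return it first, which is why upgrading the driver (or the plugin image) silences the warning.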
Thanks, the latest image solves the problem.
nvidia-smi runs successfully on the host, and inside the container if I use the original k8s-device-plugin, but I get the following error when using this vgpu device plugin. Output of nvidia-smi: