AliyunContainerService / gpushare-device-plugin

GPU Sharing Device Plugin for Kubernetes Cluster
Apache License 2.0

GPU device not detected with nvidia driver > 430.XX #36

Closed: ptonelli closed this issue 3 years ago

ptonelli commented 3 years ago

When running with 450.XX or 460.XX drivers, the device plugin pod logs show:

gpumanager.go:28] Loading NVML
gpumanager.go:31] Failed to initialize NVML: could not load NVML library.
gpumanager.go:32] If this is a GPU node, did you set the docker default runtime to `nvidia`?
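The last log line asks whether Docker's default runtime is set to `nvidia`. A minimal sketch for checking that, assuming the node runs Docker with the NVIDIA container runtime installed and its configuration in the standard /etc/docker/daemon.json location. The helper name `default_runtime_is_nvidia` is hypothetical and not part of gpushare-device-plugin:

```python
import json


def default_runtime_is_nvidia(daemon_json_text: str) -> bool:
    """Return True if a Docker daemon.json text sets "default-runtime" to
    "nvidia" AND declares an "nvidia" entry under "runtimes".

    Hypothetical diagnostic helper; it only inspects the config text and
    does not talk to the Docker daemon.
    """
    config = json.loads(daemon_json_text)
    runtimes = config.get("runtimes", {})
    return config.get("default-runtime") == "nvidia" and "nvidia" in runtimes


if __name__ == "__main__":
    # Assumed default path for the Docker daemon config on Linux.
    with open("/etc/docker/daemon.json") as f:
        print(default_runtime_is_nvidia(f.read()))
```

Both keys are checked because setting `default-runtime` without a matching `runtimes` entry leaves Docker unable to find the runtime.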

The NVIDIA driver is running correctly on the machine: nvidia-smi shows the GPU.
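nvidia-smi succeeding on the host does not by itself show that a container on that node can load `libnvidia-ml.so.1`, which is what the "could not load NVML library" message is about. A rough diagnostic sketch, assuming Python is available wherever the check runs (for example, in a test container scheduled on the affected node with the same runtime settings). `probe_nvml` is a hypothetical helper; `nvmlInit_v2`, `nvmlSystemGetDriverVersion` and `nvmlShutdown` are NVML C API functions called through `ctypes`:

```python
import ctypes

NVML_SUCCESS = 0  # nvmlReturn_t value meaning success
DRIVER_VERSION_BUF = 80  # matches NVML_SYSTEM_DRIVER_VERSION_BUFFER_SIZE


def probe_nvml(libname: str = "libnvidia-ml.so.1") -> str:
    """Try to load and initialise NVML; return a short status string."""
    try:
        # Fails with OSError when the shared library is not visible,
        # i.e. the same condition the device plugin reports.
        nvml = ctypes.CDLL(libname)
    except OSError as exc:
        return f"could not load {libname}: {exc}"

    ret = nvml.nvmlInit_v2()
    if ret != NVML_SUCCESS:
        return f"nvmlInit_v2 failed with code {ret}"
    try:
        buf = ctypes.create_string_buffer(DRIVER_VERSION_BUF)
        ret = nvml.nvmlSystemGetDriverVersion(buf, ctypes.c_uint(len(buf)))
        if ret != NVML_SUCCESS:
            return f"nvmlSystemGetDriverVersion failed with code {ret}"
        return "NVML OK, driver " + buf.value.decode()
    finally:
        # Always release NVML once it was initialised successfully.
        nvml.nvmlShutdown()


if __name__ == "__main__":
    print(probe_nvml())
```

If this prints a driver version on the host but a load error inside the container, the problem is in how the library is exposed to the container rather than in the driver itself.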

We are currently trying to update the project's dependencies and rebuild the device plugin, but this has not solved the issue.

ptonelli commented 3 years ago

Downgrading the Linux kernel image from 5.10 to 4.18 solved the issue.