What happened: All existing GPUs in the cluster were running driver version 515 with CUDA 11.7. Recently I added a machine with L20 GPUs (which require at least driver 535 and CUDA 12), and then ran into a problem where the GPUs were not recognized correctly:
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
The output of nvidia-smi -a on your host
Your docker or containerd configuration file (e.g: /etc/docker/daemon.json)
The hami-device-plugin container logs
The hami-scheduler container logs
The kubelet logs on the node (e.g: sudo journalctl -r -u kubelet)
Any relevant kernel output lines from dmesg
Environment:
Docker version from docker version
Kernel version from uname -a