[Open] blackjack2015 opened this issue 2 years ago
Hi @blackjack2015, were you able to solve this? I am stuck in the same spot.
Double-check that pods can be scheduled to your node. I forgot to remove the node-role.kubernetes.io/control-plane taint and was having this problem.
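A minimal sketch of that check, assuming `kubectl` is configured for the affected cluster. The function names are hypothetical helpers, not part of the device plugin. The trailing `-` in `kubectl taint` removes every taint with that key, whatever its effect.

```shell
# Hypothetical helpers: inspect and remove the control-plane taint that keeps
# pods (including DaemonSet pods without a matching toleration) off a node.

# List each node with the keys of its taints ("<none>" if it has none).
show_taints() {
  kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Remove the node-role.kubernetes.io/control-plane taint from one node.
# The trailing "-" tells kubectl to delete the taint regardless of its effect.
untaint_control_plane() {
  local node="$1"
  kubectl taint nodes "$node" node-role.kubernetes.io/control-plane-
}
```

For example, run `show_taints` first, then `untaint_control_plane <node-name>` for the GPU node. On a single-node cluster removing the taint is usually what you want; on a larger cluster, adding a matching toleration to the DaemonSet is the less invasive option.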
I have confirmed this. My current solution is simply to use Kubernetes 1.22 instead of 1.24. The point is that versions above 1.22 use containerd as the default container runtime, while 1.22 uses dockerd.
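The runtime each node actually uses is reported in its status, which makes this easy to confirm. Kubernetes 1.24 removed dockershim, so unless cri-dockerd is installed, kubelet talks to a CRI runtime such as containerd directly, and edits to /etc/docker/daemon.json do not affect pods. A minimal sketch, assuming `kubectl` access and, for the second helper, that the NVIDIA Container Toolkit (which provides `nvidia-ctk`) is installed on the GPU node and the commands run as root. Both function names are hypothetical, and the `--set-as-default` flag is an assumption about recent toolkit releases; check `nvidia-ctk runtime configure --help` on your version.

```shell
# Hypothetical helper: print each node's container runtime as reported by the
# kubelet, e.g. containerd://1.6.6 or docker://20.10.17.
show_container_runtimes() {
  kubectl get nodes -o custom-columns='NAME:.metadata.name,RUNTIME:.status.nodeInfo.containerRuntimeVersion'
}

# Hypothetical helper, run as root on the GPU node: register the NVIDIA runtime
# in containerd's config, make it the default, then restart containerd.
# Assumption: this nvidia-ctk version supports --set-as-default.
configure_containerd_nvidia() {
  nvidia-ctk runtime configure --runtime=containerd --set-as-default &&
    systemctl restart containerd
}
```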
Hey! Any update on this issue? @blackjack2015 @anibali
This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.
I ran into this issue using Talos Kubernetes and ended up manually adding the "nvidia.com/gpu.present=true" label to my node. That concerns me, because a lot of other nvidia labels should have been added automatically by... something, which appears to have failed to do so. On the other hand, everything works now, so 🤷🏻
This seems to be an issue starting with device plugin v0.15.0. Adding the label @v1nsai mentioned makes the daemonset select the right nodes.
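The manual workaround amounts to one `kubectl label` call per GPU node. A minimal sketch, assuming `kubectl` access; the helper names are hypothetical. It sets only the single label discussed above, not the other `nvidia.com/*` labels that would normally appear automatically.

```shell
# Hypothetical helpers around the label workaround.

# Add the label to one node; --overwrite makes the command safe to re-run.
label_gpu_node() {
  local node="$1"
  kubectl label nodes "$node" nvidia.com/gpu.present=true --overwrite
}

# List the nodes that currently carry the label (one node/<name> per line).
list_gpu_nodes() {
  kubectl get nodes -l nvidia.com/gpu.present=true -o name
}
```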
Thanks for the brilliant tool for deploying GPU-enabled pods with k8s. I have successfully installed all the prerequisites (including docker, nvidia-docker2, and kubernetes). Some system and software information is as follows:
GPU device: NVIDIA GeForce RTX 2070 SUPER
Driver version: 515.48.07
Docker version: 20.10.17
Kubernetes version: 1.24.2
I edited /etc/docker/daemon.json as follows:
I have also checked that nvidia docker runs successfully with "docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi".
I then executed the following command to deploy "nvidia-device-plugin-daemonset":
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.12.2/nvidia-device-plugin.yml
Then I checked the daemonset status with "kubectl get daemonset -A" and got:
The pod information is:
It seems that no "nvidia-device-plugin" pod has been launched.
Would you mind giving some suggestions to solve this? Thank you!
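When the daemonset shows no pods, its desired count tells whether any node was considered eligible at all. A minimal diagnostic sketch, assuming the daemonset is named `nvidia-device-plugin-daemonset` in namespace `kube-system` (as in the static manifest applied above; verify against the YAML you used). The helper names are hypothetical.

```shell
# Names assumed from the static manifest; adjust if yours differ.
NS="kube-system"
DS="nvidia-device-plugin-daemonset"

# Print "<desired> <ready>" for the daemonset. A desired count of 0 means no
# node is eligible, e.g. because of a taint or a node selector/affinity.
ds_counts() {
  kubectl -n "$NS" get daemonset "$DS" \
    -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}{"\n"}'
}

# Show the daemonset's details and events, then the GPU count each node
# advertises; nvidia.com/gpu only appears once the plugin has registered.
diagnose_device_plugin() {
  kubectl -n "$NS" describe daemonset "$DS"
  kubectl get nodes -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
}
```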