NVIDIA / gpu-operator

NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
Apache License 2.0

"nvidia-smi": executable file not found in $PATH: unknown #707

Open zlianzhuang opened 2 months ago

zlianzhuang commented 2 months ago

1. Quick Debug Information

2. Issue or feature description

After a node reboot, when the pod starts, nvidia-smi cannot be used. The error is: "nvidia-smi": executable file not found in $PATH: unknown

3. Steps to reproduce the issue

a) Create an NVIDIA pod with a nodeSelector targeting node x.
b) Reboot node x.
c) The pod starts, but the nvidia-smi executable is not found.

4. Information to [attach]

After replacing the pod with kubectl replace, nvidia-smi is executable again.

shivamerla commented 1 month ago

@zlianzhuang After a node reboot, it takes around 3-5 minutes for the GPU stack to be ready (driver installation, container-toolkit setup, etc.). These errors are expected until the stack is ready. Using pre-compiled drivers will minimize this delay, but that feature is not yet GA. Please make sure that driver images are available for the kernel you are using.
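
One possible workaround while the stack comes up is to poll for readiness before relying on nvidia-smi. The following is only a rough sketch, not something the operator provides: the helper names (allocatable_gpus, wait_until_ready), the 10-minute timeout, and the choice of the node's allocatable nvidia.com/gpu count as the readiness signal are all assumptions made for illustration. It also assumes kubectl is on the PATH and configured for the cluster.

```python
import subprocess
import time
from typing import Callable


def allocatable_gpus(node: str) -> int:
    """Return the nvidia.com/gpu count the node advertises as allocatable (0 if none).

    Assumption: a non-zero count is used here as a proxy for "GPU stack ready".
    """
    result = subprocess.run(
        ["kubectl", "get", "node", node,
         "-o", r"jsonpath={.status.allocatable.nvidia\.com/gpu}"],
        capture_output=True, text=True, check=False,
    )
    out = result.stdout.strip()
    return int(out) if result.returncode == 0 and out.isdigit() else 0


def wait_until_ready(check: Callable[[], bool],
                     timeout_s: float = 600.0,
                     interval_s: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() repeatedly until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

For example, wait_until_ready(lambda: allocatable_gpus("x") > 0) would block until node x reports at least one allocatable GPU or the timeout expires. A pod that started before that point would likely still need to be recreated, as the reporter observed.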