NVIDIA / gpu-operator


Issue with the nvidia-device-plugin-daemonset error mounting /run/nvidia/driver/usr/lib/x86_64-linux-gnu/libnvidia-egl-gbm.so.1.1.1 #778

Open adwiza opened 1 week ago

adwiza commented 1 week ago

The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.

Important Note: NVIDIA AI Enterprise customers can get support from NVIDIA Enterprise support. Please open a case here.

1. Quick Debug Information

2. Issue or feature description

Briefly explain the issue in terms of expected behavior and current behavior.

I have an issue with the nvidia-device-plugin-daemonset: the pod fails to start with an error mounting /run/nvidia/driver/usr/lib/x86_64-linux-gnu/libnvidia-egl-gbm.so.1.1.1.
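
As a minimal sketch of how the failed mount could be inspected (assuming the operator runs in the gpu-operator namespace and the daemonset carries the usual app=nvidia-device-plugin-daemonset label; adjust both to your deployment, and NODE_NAME is a placeholder):

# Show the device-plugin pod's events, including the failing mount
kubectl describe pod -n gpu-operator -l app=nvidia-device-plugin-daemonset

# From a node debug shell (node root is mounted at /host), check whether the
# library actually exists under the driver root on that node
kubectl debug node/NODE_NAME -it --image=busybox -- ls -l /host/run/nvidia/driver/usr/lib/x86_64-linux-gnu/

If the file is missing there, or present only with a different version suffix, the mount error from the device plugin would be expected.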

3. Steps to reproduce the issue

Detailed steps to reproduce the issue.

4. Information to attach (optional if deemed irrelevant)

Output from running nvidia-smi from the driver container (kubectl exec DRIVER_POD_NAME -n OPERATOR_NAMESPACE -c nvidia-driver-ctr -- nvidia-smi):

+-----------------------------------------------------------------------------------------+
| MIG devices:                                                                             |
+------------------+----------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                     Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |
|                  |                                  |        ECC|                       |
|==================+==================================+===========+=======================|
|  0    0   0   0  |              1MiB / 81221MiB     | 98      0 |  7   0    5    1    1 |
|                  |              1MiB / 131072MiB    |           |                       |
+------------------+----------------------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                               |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                              |
+-----------------------------------------------------------------------------------------+
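
As a hedged example of filling in the placeholders in the kubectl exec command above (assuming the default gpu-operator namespace and the standard app=nvidia-driver-daemonset label; adjust to your install), the driver pod name can be looked up with:

kubectl get pods -n gpu-operator -l app=nvidia-driver-daemonset -o wide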

Collecting full debug bundle (optional):

curl -o must-gather.sh -L https://raw.githubusercontent.com/NVIDIA/gpu-operator/master/hack/must-gather.sh 
chmod +x must-gather.sh
./must-gather.sh

NOTE: please refer to the must-gather script for debug data collected.

This bundle can be submitted to us via email: operator_feedback@nvidia.com