I have a 3-node cluster running MicroK8s 1.29.4, with an NVIDIA RTX 3060 in the gpu01 node.
$ microk8s.kubectl get nodes
NAME    STATUS   ROLES    AGE   VERSION
gpu01   Ready    <none>   47m   v1.29.4
mm321   Ready    <none>   57m   v1.29.4
mm322   Ready    <none>   48m   v1.29.4
After running microk8s enable nvidia on the master node (mm321), some of the GPU operator pods are stuck in the Init state.
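To pin down exactly which pods are stuck and why, a small helper like the one below could collect the stuck pods and their events. This is only a sketch: the function names stuck_init_pods and describe_stuck_pods are hypothetical, and it assumes the default column layout of kubectl get pods -A (NAMESPACE NAME READY STATUS ...), where a pod blocked in an init container reports a STATUS such as Init:0/1 or Init:CrashLoopBackOff.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pods -A` output on stdin and print
# namespace/name for every pod whose STATUS (4th column) starts with "Init".
# Assumes the default column order NAMESPACE NAME READY STATUS RESTARTS AGE.
stuck_init_pods() {
  awk 'NR > 1 && $4 ~ /^Init/ { print $1 "/" $2 }'
}

# Hypothetical helper: run `kubectl describe` (which includes the Events
# section) for each pod reported by stuck_init_pods on the live cluster.
describe_stuck_pods() {
  microk8s kubectl get pods -A | stuck_init_pods | while IFS=/ read -r ns name; do
    echo "=== $ns/$name ==="
    microk8s kubectl describe pod -n "$ns" "$name"
  done
}

# Usage (on a node with access to the cluster):
#   describe_stuck_pods
```

The Events at the bottom of each describe output usually show which init container the pod is waiting on, which is more useful in a report than the pod list alone.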
$ microk8s inspect
Inspecting system
Inspecting Certificates
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-kubelite is running
Service snap.microk8s.daemon-k8s-dqlite is running
Service snap.microk8s.daemon-apiserver-kicker is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy openSSL information to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy current linux distribution to the final report tarball
Copy asnycio usage and limits to the final report tarball
Copy inotify max_user_instances and max_user_watches to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
Inspecting dqlite
Inspect dqlite
cp: cannot stat '/var/snap/microk8s/6809/var/kubernetes/backend/localnode.yaml': No such file or directory
WARNING: Maximum number of inotify user watches is less than the recommended value of 1048576.
Increase the limit with:
echo fs.inotify.max_user_watches=1048576 | sudo tee -a /etc/sysctl.conf
sudo sysctl --system
Building the report tarball
Report tarball is at /var/snap/microk8s/6809/inspection-report-20240630_000040.tar.gz
What Should Happen Instead?
The GPU operator pods should finish initializing instead of remaining stuck in the Init state.
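Reproduction Steps
On a 3-node MicroK8s 1.29.4 cluster with an NVIDIA GPU in one node, run microk8s enable nvidia on the master node (mm321).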
Introspection Report
inspection-report-20240630_000040.tar.gz
Can you suggest a fix?
No
Are you interested in contributing with a fix?
No