canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

Worker nodes fail to start after reboot as nf_conntrack kernel module not loaded #4462

Open rp42 opened 6 months ago

rp42 commented 6 months ago

Summary

I added a node as a worker to a small cluster that had GPU enabled. It worked fine initially, but after rebooting the node, microk8s.daemon-kubelite.service fails to start because it cannot open /proc/sys/net/netfilter/nf_conntrack_max:

microk8s.daemon-kubelite[2119]: E0314 18:28:26.546586    2119 server.go:537] "Error running ProxyServer" err="open /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory"

Adding nf_conntrack to the end of /etc/modules-load.d/modules.conf in the worker node VM works around the issue.
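The workaround above can be sketched as a small idempotent helper, so that running it more than once does not add duplicate entries. This is only a sketch: the function name ensure_module_listed is hypothetical, and the file path in the usage comment is the one mentioned in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of MicroK8s): list a kernel module in a
# modules-load file only if it is not already listed there.
ensure_module_listed() {
    module="$1"
    conf="$2"
    # -q: quiet, -x: match the whole line, -F: treat the name as a fixed string
    if ! grep -qxF "$module" "$conf" 2>/dev/null; then
        # If the file is non-empty and does not end in a newline, add one first
        # so the module name lands on its own line.
        if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
            printf '\n' >> "$conf"
        fi
        printf '%s\n' "$module" >> "$conf"
    fi
}

# Usage (as root on the affected node):
#   ensure_module_listed nf_conntrack /etc/modules-load.d/modules.conf
#   modprobe nf_conntrack   # load it now, without waiting for a reboot
```

The entry in the modules-load file only takes effect at the next boot, which is why the usage note also loads the module immediately with modprobe.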

Nodes are all running Ubuntu Server 22.04.4 and microk8s v1.28.7 from snap. They run as VMs in a Proxmox cluster.

What Should Happen Instead?

The node should return to Ready status after it is rebooted.

Reproduction Steps

  1. Single node with GPU enabled, but no GPU hardware
  2. Add a GPU node as a worker to the non-GPU node
  3. Verify all nodes are ready and cluster is functional
  4. Reboot the GPU node and wait for it to return to Ready status

Introspection Report

Please contact me directly if this is required.

Can you suggest a fix?

Ensure the nf_conntrack module is loaded on worker nodes as it is on full nodes.
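To confirm on a given node whether this is the failure being hit, a check along the following lines may help. It is a sketch that assumes the nf_conntrack_max file only appears once the nf_conntrack module is loaded, which matches the error reported above. The function name conntrack_available and its optional proc-root argument are hypothetical and exist only to make the check easy to try out.

```shell
#!/bin/sh
# Hypothetical check (not part of MicroK8s): succeeds if the conntrack sysctl
# file that the proxy tries to open is present and readable.
# The optional first argument overrides the proc root; it defaults to /proc.
conntrack_available() {
    proc_root="${1:-/proc}"
    [ -r "$proc_root/sys/net/netfilter/nf_conntrack_max" ]
}

# Usage on an affected node:
#   if conntrack_available; then
#       echo "nf_conntrack_max is present"
#   else
#       echo "nf_conntrack_max is missing; try: sudo modprobe nf_conntrack"
#   fi
```

Running the check after a reboot, before restarting MicroK8s, shows whether the module was loaded at boot.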

Are you interested in contributing with a fix?

Not sure where to fix this issue properly.

andrew-landsverk-win commented 4 months ago

We're running microk8s on Red Hat 9 and saw the same problem during our patching for this cycle. The suggested fix of adding nf_conntrack to modules.conf has also corrected the issue on our end. Is there a long term fix coming for this issue?

Thanks!

geocomm-jmeunier commented 3 months ago

We're running microk8s on Ubuntu 22.04 and saw this problem in different environments. The suggested fix of adding nf_conntrack to modules.conf has fixed our issue. We would appreciate a long-term fix.

SphtKr commented 2 months ago

Seeing this also with 1.29.4 (snap) on Ubuntu 22.04, single node cluster with Calico (and multus, if relevant). The modules.conf workaround worked for me as well, but I have no idea what changed or why it wasn't required before.

giner commented 2 months ago

We are experiencing the same issue with MicroK8s 1.29.4 running on AWS EC2 with Ubuntu 22.04. The current workaround is to force the module to load at boot, but it is not clear what has changed or why we now have to do this manually.

echo nf_conntrack | sudo tee /etc/modules-load.d/nf_conntrack.conf

pvginkel commented 2 months ago

I suddenly got the same issue on Ubuntu 24.04. The fix solved the issue for me also.

hackstepz commented 1 month ago

I suddenly got the same issue on Ubuntu 24.04 Desktop. The fix solved the issue for me also.