Open Aaron-Ritter opened 1 month ago
While trying to extract inspect information, I discovered the following: after shutting down one of my master nodes, the node was stuck at NotReady, but as soon as I ran microk8s inspect it became Ready.
When I looked at the snap.microk8s.daemon-kubelite.service logs, I found that the service was in an endless restart loop and that running inspect somehow got it out of that loop.
my.log:Jul 20 17:29:36 k8s-test-m3 systemd[1]: snap.microk8s.daemon-kubelite.service: Scheduled restart job, restart counter is at 1.
my.log:Jul 20 17:29:39 k8s-test-m3 systemd[1]: snap.microk8s.daemon-kubelite.service: Scheduled restart job, restart counter is at 2.
my.log:Jul 20 17:29:42 k8s-test-m3 systemd[1]: snap.microk8s.daemon-kubelite.service: Scheduled restart job, restart counter is at 3.
my.log:Jul 20 17:29:44 k8s-test-m3 systemd[1]: snap.microk8s.daemon-kubelite.service: Scheduled restart job, restart counter is at 4.
my.log:Jul 20 17:29:47 k8s-test-m3 systemd[1]: snap.microk8s.daemon-kubelite.service: Scheduled restart job, restart counter is at 5.
my.log:Jul 20 17:29:50 k8s-test-m3 systemd[1]: snap.microk8s.daemon-kubelite.service: Scheduled restart job, restart counter is at 6.
....
my.log:Jul 20 17:36:11 k8s-test-m3 systemd[1]: snap.microk8s.daemon-kubelite.service: Scheduled restart job, restart counter is at 132.
my.log:Jul 20 17:36:18 k8s-test-m3 systemd[1]: snap.microk8s.daemon-kubelite.service: Scheduled restart job, restart counter is at 133.
my.log:Jul 20 17:36:25 k8s-test-m3 systemd[1]: snap.microk8s.daemon-kubelite.service: Scheduled restart job, restart counter is at 134.
my.log:Jul 20 17:36:32 k8s-test-m3 systemd[1]: snap.microk8s.daemon-kubelite.service: Scheduled restart job, restart counter is at 135.
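To see how far such a restart loop has progressed, the counter can be pulled out of the journal. This is a minimal sketch, not part of microk8s: the function name `last_restart_counter` is hypothetical, and it assumes the systemd message format shown in the lines above.

```shell
# Hypothetical helper: print the highest "restart counter" value found on stdin.
# Assumes systemd lines of the form
#   "... Scheduled restart job, restart counter is at N."
last_restart_counter() {
    # Keep only the number N from matching lines, then take the largest one.
    sed -n 's/.*restart counter is at \([0-9][0-9]*\)\..*/\1/p' | sort -n | tail -n 1
}
```

It could be fed from the unit's journal, for example `journalctl -u snap.microk8s.daemon-kubelite.service --no-pager | last_restart_counter`, or from a saved file such as my.log.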
During the whole time it complained about netfilter: Error: open /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory
Is there anything that inspect would influence with regard to that?
Jul 20 17:29:25 k8s-test-m3 microk8s.daemon-kubelite[1148]: + /sbin/modprobe br_netfilter
Jul 20 17:29:25 k8s-test-m3 microk8s.daemon-kubelite[1148]: + echo 'Successfully loaded br_netfilter module.'
Jul 20 17:29:25 k8s-test-m3 microk8s.daemon-kubelite[1148]: Successfully loaded br_netfilter module.
Jul 20 17:29:35 k8s-test-m3 microk8s.daemon-kubelite[1148]: I0720 17:29:35.919600 1148 conntrack.go:119] "Set sysctl" entry="net/netfilter/nf_conntrack_max" value=524288
Jul 20 17:29:35 k8s-test-m3 microk8s.daemon-kubelite[1148]: E0720 17:29:35.919632 1148 server.go:558] "Error running ProxyServer" err="open /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory"
Jul 20 17:29:35 k8s-test-m3 microk8s.daemon-kubelite[1148]: Error: open /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory
Jul 20 17:29:35 k8s-test-m3 microk8s.daemon-kubelite[1148]: F0720 17:29:35.921810 1148 daemon.go:46] Proxy exited open /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory
Jul 20 17:29:38 k8s-test-m3 microk8s.daemon-kubelite[1754]: I0720 17:29:38.863352 1754 conntrack.go:119] "Set sysctl" entry="net/netfilter/nf_conntrack_max" value=524288
Jul 20 17:29:38 k8s-test-m3 microk8s.daemon-kubelite[1754]: E0720 17:29:38.863386 1754 server.go:558] "Error running ProxyServer" err="open /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory"
Jul 20 17:29:38 k8s-test-m3 microk8s.daemon-kubelite[1754]: Error: open /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory
Jul 20 17:29:38 k8s-test-m3 microk8s.daemon-kubelite[1754]: F0720 17:29:38.863906 1754 daemon.go:46] Proxy exited open /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory
....
Jul 20 17:36:25 k8s-test-m3 microk8s.daemon-kubelite[33529]: I0720 17:36:25.108520 33529 conntrack.go:119] "Set sysctl" entry="net/netfilter/nf_conntrack_max" value=524288
Jul 20 17:36:25 k8s-test-m3 microk8s.daemon-kubelite[33529]: E0720 17:36:25.108547 33529 server.go:558] "Error running ProxyServer" err="open /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory"
Jul 20 17:36:25 k8s-test-m3 microk8s.daemon-kubelite[33529]: Error: open /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory
Jul 20 17:36:25 k8s-test-m3 microk8s.daemon-kubelite[33529]: F0720 17:36:25.109051 33529 daemon.go:46] Proxy exited open /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory
Jul 20 17:36:31 k8s-test-m3 microk8s.daemon-kubelite[33869]: I0720 17:36:31.877017 33869 conntrack.go:119] "Set sysctl" entry="net/netfilter/nf_conntrack_max" value=524288
Jul 20 17:36:31 k8s-test-m3 microk8s.daemon-kubelite[33869]: E0720 17:36:31.877041 33869 server.go:558] "Error running ProxyServer" err="open /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory"
Jul 20 17:36:31 k8s-test-m3 microk8s.daemon-kubelite[33869]: Error: open /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory
Jul 20 17:36:31 k8s-test-m3 microk8s.daemon-kubelite[33869]: F0720 17:36:31.877603 33869 daemon.go:46] Proxy exited open /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory
Jul 20 17:36:38 k8s-test-m3 microk8s.daemon-kubelite[34141]: I0720 17:36:38.599899 34141 conntrack.go:119] "Set sysctl" entry="net/netfilter/nf_conntrack_max" value=524288
Jul 20 17:36:38 k8s-test-m3 microk8s.daemon-kubelite[34141]: I0720 17:36:38.599994 34141 conntrack.go:119] "Set sysctl" entry="net/netfilter/nf_conntrack_tcp_timeout_established" value=86400
Jul 20 17:36:38 k8s-test-m3 microk8s.daemon-kubelite[34141]: I0720 17:36:38.600029 34141 conntrack.go:119] "Set sysctl" entry="net/netfilter/nf_conntrack_tcp_timeout_close_wait" value=3600
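The log above shows the proxy failing until /proc/sys/net/netfilter/nf_conntrack_max appears. A quick way to check for that file on an affected node is sketched below. The function name `check_conntrack_sysctl` is hypothetical. The sketch relies on that sysctl entry normally being exposed only once the nf_conntrack kernel module is loaded; whether inspect triggers that load is a guess that has not been confirmed here.

```shell
# Hypothetical helper: report whether the conntrack sysctl file exists.
# The path is a parameter (defaulting to the file from the error message)
# so the check can be tried against any file.
check_conntrack_sysctl() {
    sysctl_file="${1:-/proc/sys/net/netfilter/nf_conntrack_max}"
    if [ -r "$sysctl_file" ]; then
        # File is readable: print its current value.
        echo "present: $sysctl_file = $(cat "$sysctl_file")"
        return 0
    fi
    # File is absent or unreadable: the state kubelite complained about.
    echo "missing: $sysctl_file"
    return 1
}
```

Running `check_conntrack_sysctl` together with `lsmod | grep nf_conntrack` while the node is stuck at NotReady, and again after microk8s inspect, would show whether the module state changes between the two.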
Summary
When stopping (restarting) a node in 1.30, we regularly have the issue that it does not become Ready again in the cluster.
microk8s inspect shows
FAIL: Service snap.microk8s.daemon-kubelite is not running
What Should Happen Instead?
The node should come online without issues.
Reproduction Steps
The reproduction is not consistent, so it is probably related to the start of the node or the shutdown before that.
On both nodes, Kubernetes-related pods just stay in Running status, and all application pods are Terminating.
snap.microk8s.daemon-kubelite.service.txt
After restarting the microk8s worker node manually with
sudo snap stop microk8s
and
sudo snap start microk8s
it recovered and reconnected.
If restarting the affected node does not work, removing and adding the node again is the only thing that helps.
Introspection Report
todo
Can you suggest a fix?
not at this moment
Are you interested in contributing with a fix?
yes, very happy to test and collaborate further on problem finding