canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

Docker crashing MicroK8s kubelet #2269

Closed JakeMHughes closed 3 years ago

JakeMHughes commented 3 years ago

Hi,

I'm in the process of migrating all of my Docker deployments over to a MicroK8s cluster. After successfully deploying an application in K8s, I go to that application's Docker directory and run docker-compose down

This stops and removes the Docker container, but every time I have done this so far (3 or 4 times now) my kubelet changes to a FAIL! state and my node changes to NotReady

So far I've just been restarting the kubelet manually as my solution.

Host: Ubuntu 20.04
Docker: 20.10.6
MicroK8s: 1.20.6

inspection-report-20210517_023626.tar.gz

balchua commented 3 years ago

Can you try adding --advertise-address=127.0.0.1 to the file /var/snap/microk8s/current/args/kube-apiserver?
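For reference, a minimal sketch of applying that suggestion. The real path is the one from the comment above and editing it needs sudo; the /tmp default here is a hypothetical scratch file so the sketch is safe to run anywhere:

```shell
# On a real node the file is /var/snap/microk8s/current/args/kube-apiserver
# (editing it requires sudo); the /tmp default is just a safe scratch file.
ARGS_FILE="${ARGS_FILE:-/tmp/kube-apiserver.args}"

# Append the flag only if no advertise-address line is present yet
grep -q '^--advertise-address=' "$ARGS_FILE" 2>/dev/null || \
  echo '--advertise-address=127.0.0.1' >> "$ARGS_FILE"

# On a real node, restart for the change to take effect:
#   microk8s stop && microk8s start
```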

MrCorncob commented 3 years ago

Same for me. Try switching to MicroK8s v1.21. The microk8s daemon will still go down, but the running pods/deployments/services will keep working.

balchua commented 3 years ago

@MrCorncob if you can, try adding the argument I mentioned here https://github.com/ubuntu/microk8s/issues/2269#issuecomment-842353921 and see if it still restarts the daemons.

JakeMHughes commented 3 years ago

Can you try adding --advertise-address=127.0.0.1 to the file /var/snap/microk8s/current/args/kube-apiserver?

After adding this snippet to the file, I stopped MicroK8s and then started it again to make sure it applied. This seemed to put the kubelet in a crash loop with status '255'.

balchua commented 3 years ago

I can't remember what to put into these values. Can you try replacing advertise-address with bind-address=192.168.0.21 instead? Sorry about that. Right now, I'm guessing that the apiservice kicker is detecting a new network interface and then restarting the apiserver; at least that's what I see in the logs...
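A quick sketch of that swap. The scratch path is hypothetical; on a real node you would run this with sudo against /var/snap/microk8s/current/args/kube-apiserver and restart MicroK8s afterwards, and 192.168.0.21 is just the example address from the comment above:

```shell
# Scratch file stands in for /var/snap/microk8s/current/args/kube-apiserver
ARGS_FILE="${ARGS_FILE:-/tmp/kube-apiserver.args}"

# Seed the line we are replacing if the scratch file does not exist yet
[ -f "$ARGS_FILE" ] || echo '--advertise-address=127.0.0.1' > "$ARGS_FILE"

# Swap the advertise-address flag for a bind-address flag (GNU sed)
sed -i 's/^--advertise-address=.*/--bind-address=192.168.0.21/' "$ARGS_FILE"
```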

JakeMHughes commented 3 years ago

I can't remember what to put into these values. Can you try replacing advertise-address with bind-address=192.168.0.21 instead? Sorry about that. Right now, I'm guessing that the apiservice kicker is detecting a new network interface and then restarting the apiserver; at least that's what I see in the logs...

So I tried this and rebooted MicroK8s; this time everything was running, but I was unable to access the cluster (not even using the kubectl built into MicroK8s).

After resetting the file back to my defaults, I tried to reproduce the issue using the same compose file that had been crashing the kubelet, and it didn't happen this time around. So I'm wondering whether it was actually Docker causing it before, or just a coincidence that the two happened at the same time.

balchua commented 3 years ago

My gut tells me that the apiservice kicker is restarting kubelite. Some users were able to alleviate this by adding --bind-address=0.0.0.0, forcing the apiservice kicker to ignore network changes and hence reducing kubelite restarts. There's also a PR (https://github.com/ubuntu/microk8s/pull/2217) that adds a lock file, ${SNAP_DATA}/var/lock/no-cert-reissue, instead of changing the kube-apiserver args.
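The lock-file approach from that PR can be sketched like this. On a real node ${SNAP_DATA} resolves to /var/snap/microk8s/current and touching the file needs sudo; the /tmp default below is only so the sketch runs anywhere:

```shell
# ${SNAP_DATA} is /var/snap/microk8s/current on a typical node;
# a scratch directory is used here so the sketch is safe to run.
SNAP_DATA="${SNAP_DATA:-/tmp/microk8s-demo}"

# Per the comment above, this lock file makes the apiservice kicker
# skip certificate reissue (and the service restarts that go with it).
mkdir -p "$SNAP_DATA/var/lock"
touch "$SNAP_DATA/var/lock/no-cert-reissue"
```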

andrew-landsverk-win commented 3 years ago

I am also seeing something similar. We use Docker-in-Docker for integration inside pods on MicroK8s, and every time the integration pod calls docker, the kubelet crashes.

andrew-landsverk-win commented 3 years ago

I just tested adding --bind-address=0.0.0.0 to /var/snap/microk8s/current/args/kube-apiserver, and whenever my pod tries to start another container using Docker-in-Docker inside the container, the kubelet still crashes. For the lock-file PR, is that something I need to add to my nodes myself? Is that PR present in latest/stable?

Thanks!

balchua commented 3 years ago

@andrew-landsverk-win which version are you using? Since you are still facing the issue after adding the --bind-address arg, it must be something other than the apiservice-kicker.

Can you share your inspect tarball? And also any steps to reproduce? Thanks

andrew-landsverk-win commented 3 years ago

Sorry for the late response! I discovered that my VMs were not on our fast storage, so I have remedied that. I am using latest/stable which looks like it's tracking 1.21 right now.

I will post back here if I still see issues. Thanks!

JakeMHughes commented 3 years ago

With --bind-address=0.0.0.0 and after updating to 1.21, I haven't run into the issue since (I also have not tried 1.21 without the bind address).

JakeMHughes commented 3 years ago

Gonna go ahead and close this since updating fixed my issue and there hasn't been any activity in almost a month. If any other commenters have a persisting issue and are unable to update, I would recommend creating a new ticket and referencing this one.

ktx-kirtan commented 4 months ago

Just to add, --bind-address=0.0.0.0 also worked for me in the latest v1.30.1