coreos / bugs

Issue tracker for CoreOS Container Linux
https://coreos.com/os/eol/

bootkube kubelet is not resilient #2091

Open tarasglek opened 7 years ago

tarasglek commented 7 years ago

Issue Report

Bug

I tried the sample Kubernetes deployment in the Ignition repo. It seems overly fragile compared to what happens on Ubuntu, where the kubelet is able to recover after dying.

Container Linux Version

ssh core@node2.example.com  cat /etc/os-release
Warning: Permanently added 'node2.example.com,172.17.0.22' (ECDSA) to the list of known hosts.
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1409.7.0
VERSION_ID=1409.7.0
BUILD_ID=2017-07-19-0005
PRETTY_NAME="Container Linux by CoreOS 1409.7.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Environment

What hardware/cloud provider/hypervisor is being used to run Container Linux? kvm

Expected Behavior

master kubelet is able to restart successfully

Actual Behavior

Aug 04 16:07:59 node1.example.com kubelet-wrapper[4443]: I0804 16:07:59.224584    4443 kubelet_node_status.go:82] Attempting to register node node1.example.com
Aug 04 16:07:59 node1.example.com kubelet-wrapper[4443]: E0804 16:07:59.227879    4443 kubelet_node_status.go:106] Unable to register node "node1.example.com" with API server: Post https://node1.example.com:443/api/v1/nodes: dial tcp 172.17.0.21:443: getsockopt: connection refused

Reproduction Steps

  1. sudo CONTAINER_RUNTIME=docker ./scripts/devnet create bootkube
  2. sudo ./scripts/libvirt create-docker
  3. Follow https://github.com/coreos/matchbox/blob/master/Documentation/bootkube.md to start Kubernetes
  4. ssh core@node1.example.com sudo pkill -f kubelet
  5. Verify that the above registration error occurs
  6. ssh core@node1.example.com sudo systemctl start bootkube
  7. Cluster is operational again
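
For convenience, steps 4-7 can be run as a small script (a sketch; it assumes the node hostnames above and that the kubelet runs as the kubelet.service unit used by the matchbox examples):

#!/bin/sh
# Kill the master kubelet and watch it come back up and hit the registration error.
ssh core@node1.example.com sudo pkill -f kubelet
ssh core@node1.example.com sudo journalctl -u kubelet --no-pager -n 20
# Re-running bootkube brings the control plane back.
ssh core@node1.example.com sudo systemctl start bootkube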

Other Information

tarasglek commented 7 years ago

Note: the same applies to the Terraform kubelet.

euank commented 7 years ago

cc @dghubble

dghubble commented 7 years ago

Kubelets will log error messages about being unable to reach a kube-apiserver until an apiserver is bootstrapped. You'll find logs like these before Kubernetes is successfully bootstrapped; they are normal. In a single-master cluster, if you restart or kill the kubelet (or reboot the node, etc.), the kubelet will restart and show these same errors for a time because, indeed, there is no apiserver pod. Checkpointer pods will bring the control plane back, but this may take a minute or so; it is complex. At step 5, can you tail the kubelet logs and report back if it doesn't recover after a few minutes?
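
One way to follow the kubelet logs while waiting for recovery (a sketch; it assumes the kubelet runs as the kubelet.service unit, as in the matchbox examples):

ssh core@node1.example.com sudo journalctl -u kubelet -f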

Even with a single master, our Kubernetes clusters are designed to tolerate the temporary loss of the master (reboot, restarting the kubelet, killing the pid) and to recover without intervention, since Container Linux auto-updates, when enabled, will naturally reboot nodes over time anyway.

tarasglek commented 7 years ago

Sorry, you are right.

tarasglek commented 7 years ago

Sorry, I filed the bug report wrong. Indeed, killing just the master kubelet seems safe.

If I kill just the master kubelet, it seems to recover. However, if I kill all kubelets, the cluster doesn't recover (I waited 10 minutes). This is 100% repeatable.

#!/bin/sh
set -x -e
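# Kill the kubelet on every node in the cluster.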
for node in 'node1' 'node2' 'node3'; do
    ssh core@$node.example.com sudo pkill -f kubelet
done
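
With every kubelet killed, the apiserver on the master stays unreachable. One way to confirm this from the admin machine (a sketch; it uses the apiserver endpoint from the registration error above and assumes curl is available):

# Expect "connection refused" while the control plane is down.
curl -k https://node1.example.com:443/healthz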

Luckily, starting bootkube tickles it into coming back:

ssh core@node1.example.com sudo systemctl start bootkube

dghubble commented 7 years ago

I can retry your specific example soon. Clusters are expected to tolerate all nodes being rebooted, since complete power losses, or simply turning a cluster off for a while (e.g. when load is periodic), are OK. We usually test this with shutdowns rather than process killing, so maybe there's something there.
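
For reference, the two kinds of disruption being compared here look roughly like this (a sketch using the hostnames from the earlier steps):

# Shutdown/reboot-style disruption - the case that is tested regularly and recovers.
ssh core@node1.example.com sudo systemctl reboot
# Process-kill disruption - the case reported here as not recovering.
ssh core@node1.example.com sudo pkill -f kubelet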

dghubble commented 7 years ago

I can reproduce this, and it does seem to be an issue. You can shut down any/all nodes, you can restart any/all kubelets, you can restart any/all docker daemons - all recover.

When the kubelet process is killed, however, kubelet.service starts again, but ~doesn't seem to trigger checkpoint recovery~ the apiserver remains inaccessible. This can be alleviated by rebooting the controller node, which recovers the control plane on a fresh boot, but that isn't ideal.

Should be discussed with https://github.com/kubernetes-incubator/bootkube

aaronlevy commented 7 years ago

Killing the kubelet shouldn't affect running workloads at all. It should just come back up and re-inspect the existing state from docker. So there shouldn't be any checkpoint recovery coming into this at all (you're not killing docker containers) - unless I'm missing some part of the reproduction besides killing the kubelet process.
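
A quick way to sanity-check that on a node (a sketch; it assumes the docker runtime from the reproduction steps and the hostnames above):

# Kill only the kubelet, then confirm the pod containers are still running under docker.
ssh core@node1.example.com sudo pkill -f kubelet
ssh core@node1.example.com docker ps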

We might need a bit more info here (or we can try to reproduce it as well).

dghubble commented 7 years ago

I've reproduced this on QEMU/KVM nodes. Running pkill -f kubelet on the master makes the apiserver inaccessible to users, as the OP described. You can tail the apiserver logs and they stop immediately when the kubelet is killed. There are no useful messages at verbosity 8.
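
One way to dig a bit further after the kill (a sketch; it assumes the docker runtime and the self-hosted apiserver pod from the bootkube setup; <container-id> is a placeholder for the ID shown by the first command):

# List the apiserver container (likely exited) and dump its last log lines.
ssh core@node1.example.com docker ps -a | grep apiserver
ssh core@node1.example.com docker logs --tail 50 <container-id>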