chris-short / rak8s

Stand up a Raspberry Pi based Kubernetes cluster with Ansible
MIT License

TASK [master : Initialize Master v1.14.1] Fails #57

Open · peiman opened this issue 5 years ago

peiman commented 5 years ago

OS running on Ansible host:

macOS 10.14.4

Ansible Version (ansible --version):

```
ansible 2.7.10
  config file = None
  configured module search path = ['/Users/peiman/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/Cellar/ansible/2.7.10/libexec/lib/python3.7/site-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 3.7.3 (default, Mar 29 2019, 15:51:26) [Clang 10.0.1 (clang-1001.0.46.3)]
```

Uploaded logs showing errors (rak8s/.log/ansible.log):

2019-04-20 10:13:47,555 p=21017 u=peiman | TASK [master : Initialize Master v1.14.1] **** 2019-04-20 10:21:26,984 p=21017 u=peiman | fatal: [rak8s000]: FAILED! => {"changed": true, "cmd": "kubeadm init --apiserver-advertise-address=192.168.1.60 --token=udy29x.ugyyk3tumg27atmr --kubernetes-version=v1.14.1 --pod-network-cidr=10.244.0.0/16", "delta": "0:07:38.901933", "end": "2019-04-20 08:21:26.902845", "msg": "non-zero return code", "rc": 1, "start": "2019-04-20 08:13:48.000912", "stderr": "\t[WARNING IsDockerSystemdCheck]: detected \"cgroupfs\" as the Docker cgroup driver. The recommended driver is \"systemd\". Please follow the guide at https://kubernetes.io/docs/setup/cri/\nerror execution phase wait-control-plane: couldn't initialize a Kubernetes cluster", "stderr_lines": ["\t[WARNING IsDockerSystemdCheck]: detected \"cgroupfs\" as the Docker cgroup driver. The recommended driver is \"systemd\". Please follow the guide at https://kubernetes.io/docs/setup/cri/", "error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster"], "stdout": "[init] Using Kubernetes version: v1.14.1\n[preflight] Running pre-flight checks\n[preflight] Pulling images required for setting up a Kubernetes cluster\n[preflight] This might take a minute or two, depending on the speed of your internet connection\n[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'\n[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"\n[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"\n[kubelet-start] Activating the kubelet service\n[certs] Using certificateDir folder \"/etc/kubernetes/pki\"\n[certs] Generating \"etcd/ca\" certificate and key\n[certs] Generating \"etcd/healthcheck-client\" certificate and key\n[certs] Generating \"apiserver-etcd-client\" certificate and key\n[certs] Generating \"etcd/server\" certificate and key\n[certs] etcd/server serving cert is signed for DNS names [rak8s000 localhost] and IPs [192.168.1.60 127.0.0.1 ::1]\n[certs] Generating \"etcd/peer\" certificate and key\n[certs] etcd/peer serving cert is signed for DNS names [rak8s000 localhost] and IPs [192.168.1.60 127.0.0.1 ::1]\n[certs] Generating \"ca\" certificate and key\n[certs] Generating \"apiserver\" certificate and key\n[certs] apiserver serving cert is signed for DNS names [rak8s000 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.1.60]\n[certs] Generating \"apiserver-kubelet-client\" certificate and key\n[certs] Generating \"front-proxy-ca\" certificate and key\n[certs] Generating \"front-proxy-client\" certificate and key\n[certs] Generating \"sa\" key and public key\n[kubeconfig] Using kubeconfig folder \"/etc/kubernetes\"\n[kubeconfig] Writing \"admin.conf\" kubeconfig file\n[kubeconfig] Writing \"kubelet.conf\" kubeconfig file\n[kubeconfig] Writing \"controller-manager.conf\" kubeconfig file\n[kubeconfig] Writing \"scheduler.conf\" kubeconfig file\n[control-plane] Using manifest folder \"/etc/kubernetes/manifests\"\n[control-plane] Creating static Pod manifest for \"kube-apiserver\"\n[control-plane] Creating static Pod manifest for \"kube-controller-manager\"\n[control-plane] Creating static Pod manifest for \"kube-scheduler\"\n[etcd] Creating static Pod manifest for local etcd in \"/etc/kubernetes/manifests\"\n[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory 
\"/etc/kubernetes/manifests\". This can take up to 4m0s\n[kubelet-check] Initial timeout of 40s passed.\n[kubelet-check] It seems like the kubelet isn't running or healthy.\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.\n[kubelet-check] It seems like the kubelet isn't running or healthy.\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.\n[kubelet-check] It seems like the kubelet isn't running or healthy.\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.\n[kubelet-check] It seems like the kubelet isn't running or healthy.\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.\n[kubelet-check] It seems like the kubelet isn't running or healthy.\n[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.\n\nUnfortunately, an error has occurred:\n\ttimed out waiting for the condition\n\nThis error is likely caused by:\n\t- The kubelet is not running\n\t- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)\n\nIf you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:\n\t- 'systemctl status kubelet'\n\t- 'journalctl -xeu kubelet'\n\nAdditionally, a control plane component may have crashed or exited when started by the container runtime.\nTo troubleshoot, list all containers using your preferred container runtimes CLI, e.g. 
docker.\nHere is one example how you may list all Kubernetes containers running in docker:\n\t- 'docker ps -a | grep kube | grep -v pause'\n\tOnce you have found the failing container, you can inspect its logs with:\n\t- 'docker logs CONTAINERID'", "stdout_lines": ["[init] Using Kubernetes version: v1.14.1", "[preflight] Running pre-flight checks", "[preflight] Pulling images required for setting up a Kubernetes cluster", "[preflight] This might take a minute or two, depending on the speed of your internet connection", "[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'", "[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"", "[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"", "[kubelet-start] Activating the kubelet service", "[certs] Using certificateDir folder \"/etc/kubernetes/pki\"", "[certs] Generating \"etcd/ca\" certificate and key", "[certs] Generating \"etcd/healthcheck-client\" certificate and key", "[certs] Generating \"apiserver-etcd-client\" certificate and key", "[certs] Generating \"etcd/server\" certificate and key", "[certs] etcd/server serving cert is signed for DNS names [rak8s000 localhost] and IPs [192.168.1.60 127.0.0.1 ::1]", "[certs] Generating \"etcd/peer\" certificate and key", "[certs] etcd/peer serving cert is signed for DNS names [rak8s000 localhost] and IPs [192.168.1.60 127.0.0.1 ::1]", "[certs] Generating \"ca\" certificate and key", "[certs] Generating \"apiserver\" certificate and key", "[certs] apiserver serving cert is signed for DNS names [rak8s000 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.1.60]", "[certs] Generating \"apiserver-kubelet-client\" certificate and key", "[certs] Generating \"front-proxy-ca\" certificate and key", "[certs] Generating \"front-proxy-client\" certificate and key", "[certs] Generating \"sa\" key and public key", "[kubeconfig] Using kubeconfig folder \"/etc/kubernetes\"", "[kubeconfig] Writing \"admin.conf\" kubeconfig file", "[kubeconfig] Writing \"kubelet.conf\" kubeconfig file", "[kubeconfig] Writing \"controller-manager.conf\" kubeconfig file", "[kubeconfig] Writing \"scheduler.conf\" kubeconfig file", "[control-plane] Using manifest folder \"/etc/kubernetes/manifests\"", "[control-plane] Creating static Pod manifest for \"kube-apiserver\"", "[control-plane] Creating static Pod manifest for \"kube-controller-manager\"", "[control-plane] Creating static Pod manifest for \"kube-scheduler\"", "[etcd] Creating static Pod manifest for local etcd in \"/etc/kubernetes/manifests\"", "[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory \"/etc/kubernetes/manifests\". 
This can take up to 4m0s", "[kubelet-check] Initial timeout of 40s passed.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.", "[kubelet-check] It seems like the kubelet isn't running or healthy.", "[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.", "", "Unfortunately, an error has occurred:", "\ttimed out waiting for the condition", "", "This error is likely caused by:", "\t- The kubelet is not running", "\t- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)", "", "If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:", "\t- 'systemctl status kubelet'", "\t- 'journalctl -xeu kubelet'", "", "Additionally, a control plane component may have crashed or exited when started by the container runtime.", "To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.", "Here is one example how you may list all Kubernetes containers running in docker:", "\t- 'docker ps -a | grep kube | grep -v pause'", "\tOnce you have found the failing container, you can inspect its logs with:", "\t- 'docker logs CONTAINERID'"]}

Raspberry Pi Hardware Version:

5 x Raspberry Pi 3 Model B Rev 1.2

Raspberry Pi OS & Version (cat /etc/os-release):

```
PRETTY_NAME="Raspbian GNU/Linux 9 (stretch)"
NAME="Raspbian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
```

Detailed description of the issue:

Here is my inventory file:

```
[dev]

[prod]
rak8s000 ansible_host=192.168.1.60
rak8s001 ansible_host=192.168.1.61
rak8s002 ansible_host=192.168.1.62
rak8s003 ansible_host=192.168.1.63
rak8s004 ansible_host=192.168.1.64

[master]
rak8s000
```


I ran cleanup.yml and then cluster.yml, and got the error you can see above in the Ansible log.

The rak8s git commit I used: d1b14ec5175c500a89a8ab33b0e834ca63e0014e
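The kubeadm output above already hints at the first things to check on the master; a minimal set of diagnostic commands (standard Docker and systemd tooling, nothing rak8s-specific) would be:

```sh
# Show which cgroup driver Docker is using on the master (the warning above reports "cgroupfs")
docker info 2>/dev/null | grep -i "cgroup driver"

# Check whether the kubelet actually came up, as kubeadm suggests
sudo systemctl status kubelet
sudo journalctl -xeu kubelet --no-pager | tail -n 50
```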

njohnsn commented 5 years ago

I'm having the same issue, but for me it occasionally works (maybe 1 out of 10 tries).

njohnsn commented 5 years ago

To force the use of systemd instead of cgroupfs, edit /lib/systemd/system/docker.service and change the following line:

```
ExecStart=/usr/bin/dockerd -H unix://
```

to:

```
ExecStart=/usr/bin/dockerd -H unix:// --exec-opt native.cgroupdriver=systemd
```

I'm working on an Ansible task to do this. I've got a task to edit the line and another to restart the Docker service, but it seems you need to reboot the whole Pi for the change to take effect.
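A minimal sketch of what that task pair could look like (hypothetical, not the actual rak8s role code; it assumes the stock Raspbian unit file path shown above):

```yaml
# Hypothetical sketch -- adapt names and paths to the rak8s role layout
- name: Force Docker to use the systemd cgroup driver
  lineinfile:
    path: /lib/systemd/system/docker.service
    regexp: '^ExecStart=/usr/bin/dockerd '
    line: 'ExecStart=/usr/bin/dockerd -H unix:// --exec-opt native.cgroupdriver=systemd'
  register: docker_unit

# Restarting Docker alone did not seem to be enough, so reboot the Pi instead
- name: Reboot so the new cgroup driver takes effect
  reboot:
  when: docker_unit is changed
```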

njohnsn commented 5 years ago

Here is the output of the journalctl command:

```
pi@k8s-master-1:~/rak8s $ sudo journalctl -xeu kubelet
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.026676 9470 server.go:141] Starting to listen on 0.0.0.0:10250
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.030917 9470 server.go:343] Adding debug handlers to kubelet server.
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.038128 9470 fs_resource_analyzer.go:64] Starting FS ResourceAnalyzer
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.038314 9470 status_manager.go:152] Starting to sync pod status with apiserver
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.038405 9470 kubelet.go:1806] Starting kubelet main sync loop.
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.038509 9470 kubelet.go:1823] skipping pod synchronization - [container runtime status check may not have completed yet., PLEG
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.039270 9470 volume_manager.go:248] Starting Kubelet Volume Manager
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.039341 9470 desired_state_of_world_populator.go:130] Desired state populator starts to run
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: E0420 11:34:56.046610 9470 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.RuntimeClass: Get https://10
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: E0420 11:34:56.047087 9470 controller.go:115] failed to ensure node lease exists, will retry in 200ms, error: Get https://10.0.3.240:6443/ap
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: E0420 11:34:56.057409 9470 kubelet.go:2170] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:doc
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.116530 9470 clientconn.go:440] parsed scheme: "unix"
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.116629 9470 clientconn.go:440] scheme "unix" not registered, fallback to default scheme
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.116875 9470 asm_arm.s:868] ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0 <nil>}
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.116939 9470 clientconn.go:796] ClientConn switching balancer to "pick_first"
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.117100 9470 balancer_conn_wrappers.go:131] pickfirstBalancer: HandleSubConnStateChange: 0x856d0b0, CONNECTING
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.117926 9470 balancer_conn_wrappers.go:131] pickfirstBalancer: HandleSubConnStateChange: 0x856d0b0, READY
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: W0420 11:34:56.130476 9470 nvidia.go:66] Error reading "/sys/bus/pci/devices/": open /sys/bus/pci/devices/: no such file or directory
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.144079 9470 kubelet.go:1823] skipping pod synchronization - container runtime status check may not have completed yet.
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: E0420 11:34:56.144134 9470 kubelet.go:2244] node "k8s-master-1" not found
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.147640 9470 kubelet_node_status.go:283] Setting node annotation to enable volume controller attach/detach
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.158240 9470 kubelet_node_status.go:72] Attempting to register node k8s-master-1
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: E0420 11:34:56.159527 9470 kubelet_node_status.go:94] Unable to register node "k8s-master-1" with API server: Post https://10.0.3.240:6443/a
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: E0420 11:34:56.245247 9470 kubelet.go:2244] node "k8s-master-1" not found
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: E0420 11:34:56.252227 9470 controller.go:115] failed to ensure node lease exists, will retry in 400ms, error: Get https://10.0.3.240:6443/ap
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.344428 9470 kubelet.go:1823] skipping pod synchronization - container runtime status check may not have completed yet.
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: E0420 11:34:56.345513 9470 kubelet.go:2244] node "k8s-master-1" not found
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.361270 9470 kubelet_node_status.go:283] Setting node annotation to enable volume controller attach/detach
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.364285 9470 kubelet_node_status.go:283] Setting node annotation to enable volume controller attach/detach
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.376494 9470 kubelet_node_status.go:72] Attempting to register node k8s-master-1
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.378024 9470 cpu_manager.go:155] [cpumanager] starting with none policy
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.378080 9470 cpu_manager.go:156] [cpumanager] reconciling every 10s
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: I0420 11:34:56.378120 9470 policy_none.go:42] [cpumanager] none policy: Start
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: E0420 11:34:56.378112 9470 kubelet_node_status.go:94] Unable to register node "k8s-master-1" with API server: Post https://10.0.3.240:6443/a
Apr 20 11:34:56 k8s-master-1 kubelet[9470]: F0420 11:34:56.380489 9470 kubelet.go:1359] Failed to start ContainerManager failed to initialize top level QOS containers: failed to update
Apr 20 11:34:56 k8s-master-1 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Apr 20 11:34:56 k8s-master-1 systemd[1]: kubelet.service: Unit entered failed state.
Apr 20 11:34:56 k8s-master-1 systemd[1]: kubelet.service: Failed with result 'exit-code'.
```

njohnsn commented 5 years ago

I can reproduce the issue from the command line, so I think it might be a k8s issue, not rak8s.

peiman commented 5 years ago

> To force the use of systemd instead of cgroupfs, edit /lib/systemd/system/docker.service and change the following line:
>
> ExecStart=/usr/bin/dockerd -H unix://
>
> to:
>
> ExecStart=/usr/bin/dockerd -H unix:// --exec-opt native.cgroupdriver=systemd
>
> I'm working on an Ansible task to do this. I've got a task to edit the line and another to restart the Docker service, but it seems you need to reboot the whole Pi for the change to take effect.

I did as you suggested manually.

Now I get this from journalctl:

```
Apr 20 18:14:08 rak8s000 systemd[1]: libcontainer-5100-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:09 rak8s000 systemd[1]: libcontainer-5100-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:09 rak8s000 systemd[1]: Created slice libcontainer_5100_systemd_test_default.slice.
Apr 20 18:14:09 rak8s000 systemd[1]: Removed slice libcontainer_5100_systemd_test_default.slice.
Apr 20 18:14:09 rak8s000 systemd[1]: libcontainer-5108-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:09 rak8s000 systemd[1]: libcontainer-5108-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:09 rak8s000 systemd[1]: Created slice libcontainer_5108_systemd_test_default.slice.
Apr 20 18:14:09 rak8s000 kubelet[3360]: W0420 18:14:09.096478 3360 container.go:523] Failed to update stats for container "/libcontainer_5108_systemd_test_default.slice": failed to parse memory.usage_in_bytes - open /sys/fs/cgroup/memory/libcontainer_5108_systemd_test_default.slice/memory.usage_in_bytes: no such file or directory, continuing to push stats
Apr 20 18:14:09 rak8s000 systemd[1]: Removed slice libcontainer_5108_systemd_test_default.slice.
Apr 20 18:14:12 rak8s000 kubelet[3360]: W0420 18:14:12.903479 3360 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 20 18:14:13 rak8s000 kubelet[3360]: E0420 18:14:13.635837 3360 qos_container_manager_linux.go:329] [ContainerManager]: Failed to update QoS cgroup configuration
Apr 20 18:14:13 rak8s000 kubelet[3360]: W0420 18:14:13.635904 3360 qos_container_manager_linux.go:139] [ContainerManager] Failed to reserve QoS requests: failed to set supported cgroup subsystems for cgroup [kubepods burstable]: Failed to find subsystem mount for required subsystem: pids
Apr 20 18:14:13 rak8s000 kubelet[3360]: E0420 18:14:13.843958 3360 kubelet.go:2170] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Apr 20 18:14:17 rak8s000 kubelet[3360]: W0420 18:14:17.903951 3360 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 20 18:14:18 rak8s000 systemd[1]: libcontainer-5126-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:18 rak8s000 systemd[1]: libcontainer-5126-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:18 rak8s000 systemd[1]: Created slice libcontainer_5126_systemd_test_default.slice.
Apr 20 18:14:18 rak8s000 systemd[1]: Removed slice libcontainer_5126_systemd_test_default.slice.
Apr 20 18:14:18 rak8s000 systemd[1]: libcontainer-5133-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:18 rak8s000 systemd[1]: libcontainer-5133-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:18 rak8s000 systemd[1]: Created slice libcontainer_5133_systemd_test_default.slice.
Apr 20 18:14:18 rak8s000 systemd[1]: Removed slice libcontainer_5133_systemd_test_default.slice.
Apr 20 18:14:18 rak8s000 systemd[1]: libcontainer-5151-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:18 rak8s000 systemd[1]: libcontainer-5151-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:18 rak8s000 systemd[1]: Created slice libcontainer_5151_systemd_test_default.slice.
Apr 20 18:14:18 rak8s000 systemd[1]: Removed slice libcontainer_5151_systemd_test_default.slice.
Apr 20 18:14:18 rak8s000 kubelet[3360]: E0420 18:14:18.847741 3360 kubelet.go:2170] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Apr 20 18:14:18 rak8s000 systemd[1]: libcontainer-5176-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:18 rak8s000 systemd[1]: libcontainer-5176-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:19 rak8s000 systemd[1]: Created slice libcontainer_5176_systemd_test_default.slice.
Apr 20 18:14:19 rak8s000 systemd[1]: Removed slice libcontainer_5176_systemd_test_default.slice.
Apr 20 18:14:19 rak8s000 systemd[1]: libcontainer-5183-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:19 rak8s000 systemd[1]: libcontainer-5183-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:19 rak8s000 systemd[1]: Created slice libcontainer_5183_systemd_test_default.slice.
Apr 20 18:14:19 rak8s000 systemd[1]: Removed slice libcontainer_5183_systemd_test_default.slice.
Apr 20 18:14:19 rak8s000 kubelet[3360]: W0420 18:14:19.085643 3360 raw.go:87] Error while processing event ("/sys/fs/cgroup/devices/libcontainer_5183_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/libcontainer_5183_systemd_test_default.slice: no such file or directory
Apr 20 18:14:19 rak8s000 kubelet[3360]: W0420 18:14:19.087468 3360 container.go:409] Failed to create summary reader for "/libcontainer_5183_systemd_test_default.slice": none of the resources are being tracked.
Apr 20 18:14:22 rak8s000 kubelet[3360]: W0420 18:14:22.904522 3360 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 20 18:14:23 rak8s000 kubelet[3360]: E0420 18:14:23.854038 3360 kubelet.go:2170] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Apr 20 18:14:27 rak8s000 kubelet[3360]: W0420 18:14:27.904971 3360 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 20 18:14:27 rak8s000 systemd[1]: Starting Cleanup of Temporary Directories...
Apr 20 18:14:27 rak8s000 systemd[1]: Started Cleanup of Temporary Directories.
Apr 20 18:14:28 rak8s000 systemd[1]: libcontainer-5211-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:28 rak8s000 systemd[1]: libcontainer-5211-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:28 rak8s000 systemd[1]: Created slice libcontainer_5211_systemd_test_default.slice.
Apr 20 18:14:28 rak8s000 systemd[1]: Removed slice libcontainer_5211_systemd_test_default.slice.
Apr 20 18:14:28 rak8s000 systemd[1]: libcontainer-5219-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:28 rak8s000 systemd[1]: libcontainer-5219-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:28 rak8s000 systemd[1]: Created slice libcontainer_5219_systemd_test_default.slice.
Apr 20 18:14:28 rak8s000 systemd[1]: Removed slice libcontainer_5219_systemd_test_default.slice.
Apr 20 18:14:28 rak8s000 systemd[1]: libcontainer-5236-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:28 rak8s000 systemd[1]: libcontainer-5236-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:28 rak8s000 systemd[1]: Created slice libcontainer_5236_systemd_test_default.slice.
Apr 20 18:14:28 rak8s000 systemd[1]: Removed slice libcontainer_5236_systemd_test_default.slice.
Apr 20 18:14:28 rak8s000 kubelet[3360]: E0420 18:14:28.857418 3360 kubelet.go:2170] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Apr 20 18:14:29 rak8s000 systemd[1]: libcontainer-5260-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:29 rak8s000 systemd[1]: libcontainer-5260-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:29 rak8s000 systemd[1]: Created slice libcontainer_5260_systemd_test_default.slice.
Apr 20 18:14:29 rak8s000 systemd[1]: Removed slice libcontainer_5260_systemd_test_default.slice.
Apr 20 18:14:29 rak8s000 systemd[1]: libcontainer-5267-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:29 rak8s000 systemd[1]: libcontainer-5267-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Apr 20 18:14:29 rak8s000 systemd[1]: Created slice libcontainer_5267_systemd_test_default.slice.
Apr 20 18:14:29 rak8s000 systemd[1]: Removed slice libcontainer_5267_systemd_test_default.slice.
```
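The "Failed to find subsystem mount for required subsystem: pids" line above suggests the kubelet isn't seeing all the cgroup controllers it expects. One way to see what the kernel on the Pi actually exposes (plain Linux interfaces, shown here only as a diagnostic sketch):

```sh
# List the cgroup controllers the kernel knows about and whether each is enabled
cat /proc/cgroups

# Show which controller hierarchies are actually mounted
ls /sys/fs/cgroup
```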

njohnsn commented 5 years ago

I wiped and completely reinstalled Raspbian Lite on my k8s master. When I ran the playbook I got:

```
TASK [kubeadm : Install k8s 1.14.1-00 Y'all] ***********************************
ok: [k8s-node-2] => (item=[u'kubelet=1.14.1-00', u'kubeadm=1.14.1-00', u'kubectl=1.14.1-00'])
ok: [k8s-node-4] => (item=[u'kubelet=1.14.1-00', u'kubeadm=1.14.1-00', u'kubectl=1.14.1-00'])
ok: [k8s-node-3] => (item=[u'kubelet=1.14.1-00', u'kubeadm=1.14.1-00', u'kubectl=1.14.1-00'])
ok: [k8s-node-1] => (item=[u'kubelet=1.14.1-00', u'kubeadm=1.14.1-00', u'kubectl=1.14.1-00'])
ok: [k8s-node-5] => (item=[u'kubelet=1.14.1-00', u'kubeadm=1.14.1-00', u'kubectl=1.14.1-00'])
ok: [k8s-node-6] => (item=[u'kubelet=1.14.1-00', u'kubeadm=1.14.1-00', u'kubectl=1.14.1-00'])
failed: [k8s-master-1] (item=[u'kubelet=1.14.1-00', u'kubeadm=1.14.1-00', u'kubectl=1.14.1-00']) => {"failed": true, "item": ["kubelet=1.14.1-00", "kubeadm=1.14.1-00", "kubectl=1.14.1-00"], "msg": "No package matching 'kubelet' is available"}
```

It appears that rak8s is not installing the dependencies (or is removing them when the cleanup playbook runs).

So I installed kubeadm manually with apt-get, which pulled in all the dependencies.

Rerunning the playbook got past the master initialization part, but it is now hanging on joining the workers to the cluster.
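For reference, the manual install was essentially the standard upstream apt setup; a rough sketch of it (assuming the usual Kubernetes apt repository and the same version pins the playbook uses):

```sh
# Add the upstream Kubernetes apt repo and install the pinned packages
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet=1.14.1-00 kubeadm=1.14.1-00 kubectl=1.14.1-00
```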

njohnsn commented 5 years ago

My guess is that it is a k8s issue because I get the same error running the kubeadm join command by hand.
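If it helps anyone retrying the join by hand, a fresh join command (with a new token) can be printed on the master using standard kubeadm; this isn't rak8s-specific and may not fix the underlying error:

```sh
# Print a ready-to-run `kubeadm join ...` command with a fresh token
sudo kubeadm token create --print-join-command
```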

At this point I'm going to take a break, then reinstall Raspbian on all my nodes and just follow the manual instructions here.

Good Luck!

PostlMC commented 5 years ago

Hit this issue this evening -- still debugging. In the meantime, I just submitted a PR that fixes the "No package matching 'kubelet' is available" issue for me.

chris-short commented 5 years ago

I merged your changes in, @PostlMC. Please test on clean installs if you can.

chris-short commented 4 years ago

Any updates here?