
canonical-kubernetes fails to deploy with "Waiting for kube-system pods to start" #230

Closed: iatrou closed this issue 5 years ago

iatrou commented 5 years ago

Deploying canonical-kubernetes on localhost (lxd) fails/stalls. Here is the relevant info:

ubuntu@kubernetes:~$ snap list
Name        Version              Rev   Tracking  Publisher   Notes
conjure-up  2.6.1-20181018.1610  1031  stable    canonical✓  classic
core        16-2.35.4            5662  stable    canonical✓  core
lxd         3.0.2                8715  3.0       canonical✓  -

ubuntu@kubernetes:~$ juju --version
2.4.3-xenial-amd64

ubuntu@kubernetes:~$ lxc --version
3.0.2

ubuntu@kubernetes:~$ lxc storage list
+------------+-------------+--------+-----------------------------------------------+---------+
|    NAME    | DESCRIPTION | DRIVER |                    SOURCE                     | USED BY |
+------------+-------------+--------+-----------------------------------------------+---------+
| default    |             | zfs    | default                                       | 11      |
+------------+-------------+--------+-----------------------------------------------+---------+
| juju-btrfs |             | btrfs  | /var/snap/lxd/common/lxd/disks/juju-btrfs.img | 0       |
+------------+-------------+--------+-----------------------------------------------+---------+
| juju-zfs   |             | zfs    | /var/snap/lxd/common/lxd/disks/juju-zfs.img   | 0       |
+------------+-------------+--------+-----------------------------------------------+---------+

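For context, the pool that new containers actually use is determined by the root disk device of whichever LXD profile Juju applies; inspecting the default profile is one way to see it (the profile name "default" is an assumption and may differ in a conjure-up setup):

lxc profile show default   # profile name is an assumption; the root device's "pool:" entry shows which storage pool backs new containers
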
conjure-up leaves the cluster in the following state:

ubuntu@kubernetes:~$ juju status
Model                         Controller                Cloud/Region         Version  SLA          Timestamp
conjure-canonical-kubern-764  conjure-up-localhost-b6b  localhost/localhost  2.4.3    unsupported  19:19:16Z

App                    Version  Status   Scale  Charm                  Store       Rev  OS      Notes
easyrsa                3.0.1    active       1  easyrsa                jujucharms  117  ubuntu  
etcd                   3.2.10   active       3  etcd                   jujucharms  209  ubuntu  
flannel                0.10.0   active       2  flannel                jujucharms  146  ubuntu  
kubeapi-load-balancer  1.14.0   active       1  kubeapi-load-balancer  jujucharms  162  ubuntu  exposed
kubernetes-master      1.12.1   waiting      1  kubernetes-master      jujucharms  219  ubuntu  
kubernetes-worker      1.12.1   waiting      1  kubernetes-worker      jujucharms  239  ubuntu  exposed

Unit                      Workload  Agent      Machine  Public address  Ports           Message
easyrsa/0*                active    idle       0        10.202.248.222                  Certificate Authority connected.
etcd/0*                   active    idle       1        10.202.248.56   2379/tcp        Healthy with 3 known peers
etcd/1                    active    idle       2        10.202.248.187  2379/tcp        Healthy with 3 known peers
etcd/2                    active    idle       3        10.202.248.220  2379/tcp        Healthy with 3 known peers
kubeapi-load-balancer/0*  active    idle       4        10.202.248.57   443/tcp         Loadbalancer ready.
kubernetes-master/0*      waiting   idle       5        10.202.248.186  6443/tcp        Waiting for kube-system pods to start
  flannel/0*              active    idle                10.202.248.186                  Flannel subnet 10.1.80.1/24
kubernetes-worker/0*      waiting   executing  6        10.202.248.160  80/tcp,443/tcp  (config-changed) Container runtime not available.
  flannel/1               active    idle                10.202.248.160                  Flannel subnet 10.1.5.1/24

Entity  Meter status  Message
model   amber         user verification pending  

Machine  State    DNS             Inst id        Series  AZ  Message
0        started  10.202.248.222  juju-32908b-0  bionic      Running
1        started  10.202.248.56   juju-32908b-1  bionic      Running
2        started  10.202.248.187  juju-32908b-2  bionic      Running
3        started  10.202.248.220  juju-32908b-3  bionic      Running
4        started  10.202.248.57   juju-32908b-4  bionic      Running
5        started  10.202.248.186  juju-32908b-5  bionic      Running
6        started  10.202.248.160  juju-32908b-6  bionic      Running

The logs in kubernetes-worker/0 show:

dockerd[12663]: time="2018-10-24T18:49:46.333413062Z" level=info msg="libcontainerd: started new docker-containerd process" pid=12671
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="starting containerd" module=containerd revision=9b55aab90508bd389d7654c4baf173a981477d55 version=docker-17.
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="loading plugin "io.containerd.content.v1.content"..." module=containerd type=io.containerd.content.v1 
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.btrfs"..." module=containerd type=io.containerd.snapshotter.v1
dockerd[12663]: time="2018-10-24T18:49:46Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.btrfs" error="path /var/lib/docker/containerd/daemon/
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.overlayfs"..." module=containerd type=io.containerd.snapshotte
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="loading plugin "io.containerd.metadata.v1.bolt"..." module=containerd type=io.containerd.metadata.v1
dockerd[12663]: time="2018-10-24T18:49:46Z" level=warning msg="could not use snapshotter btrfs in metadata plugin" error="path /var/lib/docker/containerd/daemon/io.con
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="loading plugin "io.containerd.differ.v1.walking"..." module=containerd type=io.containerd.differ.v1
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="loading plugin "io.containerd.gc.v1.scheduler"..." module=containerd type=io.containerd.gc.v1
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="loading plugin "io.containerd.grpc.v1.containers"..." module=containerd type=io.containerd.grpc.v1
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="loading plugin "io.containerd.grpc.v1.content"..." module=containerd type=io.containerd.grpc.v1
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="loading plugin "io.containerd.grpc.v1.diff"..." module=containerd type=io.containerd.grpc.v1
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="loading plugin "io.containerd.grpc.v1.events"..." module=containerd type=io.containerd.grpc.v1
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="loading plugin "io.containerd.grpc.v1.healthcheck"..." module=containerd type=io.containerd.grpc.v1
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="loading plugin "io.containerd.grpc.v1.images"..." module=containerd type=io.containerd.grpc.v1
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="loading plugin "io.containerd.grpc.v1.leases"..." module=containerd type=io.containerd.grpc.v1
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="loading plugin "io.containerd.grpc.v1.namespaces"..." module=containerd type=io.containerd.grpc.v1
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="loading plugin "io.containerd.grpc.v1.snapshots"..." module=containerd type=io.containerd.grpc.v1
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="loading plugin "io.containerd.monitor.v1.cgroups"..." module=containerd type=io.containerd.monitor.v1 
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="loading plugin "io.containerd.runtime.v1.linux"..." module=containerd type=io.containerd.runtime.v1 
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="loading plugin "io.containerd.grpc.v1.tasks"..." module=containerd type=io.containerd.grpc.v1
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="loading plugin "io.containerd.grpc.v1.version"..." module=containerd type=io.containerd.grpc.v1
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="loading plugin "io.containerd.grpc.v1.introspection"..." module=containerd type=io.containerd.grpc.v1
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg=serving... address="/var/run/docker/containerd/docker-containerd-debug.sock" module="containerd/debug"
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg=serving... address="/var/run/docker/containerd/docker-containerd.sock" module="containerd/grpc"
dockerd[12663]: time="2018-10-24T18:49:46Z" level=info msg="containerd successfully booted in 0.001984s" module=containerd
dockerd[12663]: time="2018-10-24T18:49:46.349069375Z" level=error msg="There are no more loopback devices available."
dockerd[12663]: time="2018-10-24T18:49:46.349107675Z" level=error msg="[graphdriver] prior storage driver devicemapper failed: loopback attach failed"
dockerd[12663]: Error starting daemon: error initializing graphdriver: loopback attach failed
systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: docker.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Docker Application Container Engine.
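
This looks like Docker falling back to the devicemapper graphdriver (overlayfs generally cannot sit on a ZFS-backed container filesystem) and then failing because an unprivileged LXD container typically exposes no loop devices for it to attach. A few checks one might run inside the worker (e.g. via juju ssh kubernetes-worker/0) to confirm this; the commands are illustrative, not taken from the original report:

df -T /var/lib/docker      # shows the backing filesystem type (zfs in this setup)
ls -l /dev/loop*           # loop devices are usually absent in unprivileged containers
sudo losetup -f            # should print the first free loop device; errors if none is available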

iatrou commented 5 years ago

ubuntu@kubernetes:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.5 LTS
Release:        16.04
Codename:       xenial

ktsakalozos commented 5 years ago

Hi @iatrou,

The LXD storage driver needs to be dir. Even though this deployment is done with conjure-up, you might also want to take a look at https://github.com/juju-solutions/bundle-canonical-kubernetes/wiki/Deploying-on-LXD.
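
For reference, a minimal sketch of pointing new containers at a dir-backed pool (not the exact steps from the wiki; the pool name is illustrative, and it assumes the default profile already defines a root disk device):

lxc storage create k8s-dir dir                    # "k8s-dir" is an arbitrary name for a new dir-backed pool
lxc profile device set default root pool k8s-dir  # assumes the default profile has a root disk device to repoint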

If this deployment is for experimenting, we have some (not recommended) LXD profiles that might work with ZFS but need more testing.

Cheers

iatrou commented 5 years ago

Thanks for the prompt reply @ktsakalozos! Using 'dir' for storage fixed the issue. There is no need to manually go through the steps described in the linked documentation; the process is already integrated into the conjure-up spell. I would love to see the experimental workaround profile for ZFS, if you have it handy.

ktsakalozos commented 5 years ago

We had some success in starting microk8s with the profile found here: https://github.com/ubuntu/microk8s/issues/65#issuecomment-417354097. That profile is wide open and thus not recommended. The issue with ZFS should benefit from the work we are doing on strict confinement.
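
For anyone wanting to try that route, a rough sketch of applying a permissive profile like the one linked above (the actual profile YAML lives in the linked comment; the profile and container names here are illustrative):

lxc profile create k8s-permissive
lxc profile edit k8s-permissive                                  # opens an editor; paste the YAML from the linked comment
lxc launch ubuntu:18.04 test-node -p default -p k8s-permissive   # container name and image are assumptions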

iatrou commented 5 years ago

@ktsakalozos awesome, thanks!