geerlingguy closed this issue 4 years ago
One solution might be setting the storage driver to vfs
... if, indeed, that's the problem. It might not be. But it's worth checking by running docker info both in Travis and in the started test container.
Also, see: https://docs.docker.com/storage/storagedriver/select-storage-driver/
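A quick way to compare the two drivers is to ask each daemon directly (a sketch; the container name molecule-test is a placeholder, not from this thread):

```shell
# Storage driver used by the Travis VM's Docker daemon.
docker info --format '{{.Driver}}'

# Storage driver used by the Docker daemon running inside the test
# container ('molecule-test' is an assumed container name).
docker exec molecule-test docker info --format '{{.Driver}}'
```

If the two commands print different drivers (e.g. overlay2 on the host and aufs inside), that mismatch is a likely suspect.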
Hi - I've recently run into this same error when trying to run molecule test on Travis (Docker in Docker). Did you manage to overcome this issue somehow?
Same here testing in Travis.
Travis VM uses storage driver: overlay2
Molecule docker uses storage driver: aufs
I am wondering why the molecule Docker container ends up with aufs. I'm using @geerlingguy's Docker images for testing in Travis, and his roles for the Docker installation.
Over in https://github.com/geerlingguy/ansible-for-kubernetes/issues/5, I posited it might help to upgrade Docker CE inside the Travis CI environment first... attempting that now in https://github.com/geerlingguy/raspberry-pi-dramble/commit/ca3f2964d8dc99a8d5f7011b688c7fddc54e2987
Interesting - when I run with -vvvv and have the kubeadm init command output returned, I see the following stderr output:
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 19.03.1. Latest validated version: 18.09
[WARNING SystemVerification]: failed to parse kernel config: unable to load kernel module: "configs", output: "modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.15.0-1028-gcp/modules.dep.bin'
modprobe: FATAL: Module configs not found in directory /lib/modules/4.15.0-1028-gcp
", err: exit status 1
"error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster"
Also, the stdout has some interesting info that led me to print the output of journalctl -u kubelet:
[init] Using Kubernetes version: v1.15.7
[preflight] Running pre-flight checks
[preflight] The system verification failed. Printing the output from the verification:
KERNEL_VERSION: 4.15.0-1028-gcp
DOCKER_VERSION: 19.03.1
OS: Linux
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kube1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.17.0.2]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kube1 localhost] and IPs [172.17.0.2 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kube1 localhost] and IPs [172.17.0.2 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
Upgrading Docker CE didn't work. So trying to configure overlayfs as the default driver instead... we'll see if that makes a difference.
In my case, what worked and didn't work:
| vm storage driver | nested container storage driver | worked |
|---|---|---|
| overlay2 | aufs | no* |
| overlay2 | overlay2 | no |
| aufs | aufs | yes |

*The first row is the default config.
So... the VM and the nested container both using overlay2 just now worked. To get this working in Travis CI, here's what I did:
# If on Travis CI, update Docker's configuration.
if [ "$TRAVIS" == "true" ]; then
  mkdir /tmp/docker
  echo '{
    "experimental": true,
    "storage-driver": "overlay2"
  }' | sudo tee /etc/docker/daemon.json
  sudo service docker restart
fi
Then, in the docker run command, mount the Docker daemon config into the container, and add a bind mount to /var/lib/docker so overlay2 doesn't error out:

docker run [...] \
  --volume=/etc/docker/daemon.json:/etc/docker/daemon.json:ro \
  --mount type=bind,src=/tmp/docker,dst=/var/lib/docker
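Putting the two pieces together, the whole Travis setup might look like this sketch (the image name and the --detach/--privileged flags are my assumptions, not taken from this thread):

```shell
# 1. Point the host daemon at overlay2 (Travis CI only).
if [ "$TRAVIS" == "true" ]; then
  mkdir /tmp/docker
  echo '{"experimental": true, "storage-driver": "overlay2"}' \
    | sudo tee /etc/docker/daemon.json
  sudo service docker restart
fi

# 2. Start the test container, sharing the daemon config read-only and
#    giving /var/lib/docker a real backing directory on the host, so the
#    nested daemon isn't layering overlay2 on top of overlay2.
#    'geerlingguy/docker-ubuntu1804-ansible' is an assumed image name.
docker run --detach --privileged \
  --volume=/etc/docker/daemon.json:/etc/docker/daemon.json:ro \
  --mount type=bind,src=/tmp/docker,dst=/var/lib/docker \
  geerlingguy/docker-ubuntu1804-ansible
```

The bind mount is the key part: the inner daemon's /var/lib/docker then lives on the host filesystem rather than inside the outer overlay, which is what lets overlay2-in-overlay2 work.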
First successful build: https://travis-ci.org/geerlingguy/raspberry-pi-dramble/builds/625489080
Woohoo, now I can finally stop getting weekly 'your tests are still failing' emails again :)
Just wanted to note that my awx build on GitHub Actions is now giving a similar error:
ERROR: for awx_redis Cannot create container for service redis: error creating aufs mount
Over in https://github.com/moby/moby/issues/13742, I saw a comment mentioning:
Starting the daemon with storage-driver: vfs (/usr/bin/dockerd --storage-driver=vfs) solved the problem.
So a fix for a GH Actions workflow is adding a step like:
- name: Force GitHub Actions' docker daemon to use vfs.
  run: |
    sudo systemctl stop docker
    echo '{"cgroup-parent":"/actions_job","storage-driver":"vfs"}' | sudo tee /etc/docker/daemon.json
    sudo systemctl start docker
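After the restart you can sanity-check that the daemon actually picked up the new config (a sketch; assumes the step above ran successfully):

```shell
# Should print 'vfs' once the daemon has reloaded /etc/docker/daemon.json.
docker info --format '{{.Driver}}'
```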
Right before:
I see a lot of:
More debug info on the running docker daemon: