We spoke a bit on the new Deis slack! (Shameless plug, join us at https://slack.deis.io)
I was able to reproduce on a CoreOS 983.0.0 cluster out of the box. May be related to userns + selinux + Docker 1.10.2 but I haven't pinned down a specific change just yet.
I was able to successfully boot the postgres component by specifying a volume mounted at /var/lib/postgres.
@MaxenceAdnot if you could modify the deis-database-rc.yaml with the changes in https://github.com/deis/charts/pull/160/files that would be awesome.
That can be accomplished by:
helm uninstall deis-dev      <-- will remove your deis install
helm edit deis-dev           <-- hand edit tpl/deis-database-rc.yaml
helm generate deis-dev
helm install deis-dev
OR:
kubectl --namespace=deis edit rc deis-database            <-- hand-edit the volume information from the PR
kubectl --namespace=deis delete pod deis-database-XYZ123  <-- delete the database pod
Edit: clarity
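For completeness, a rough way to double-check that the edited volume actually landed on the replacement pod. The XYZ123 suffix is a placeholder for the real pod name, and the grep pattern is just my guess at the relevant output:
# List the database pods, then inspect the new pod's volumes/mounts
kubectl --namespace=deis get pods | grep deis-database
kubectl --namespace=deis describe pod deis-database-XYZ123 | grep -A 3 -i volume
# Tail the database logs to confirm postgres comes up cleanly
kubectl --namespace=deis logs -f deis-database-XYZ123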
Thank you for the quick fix. I will try this in a few hours and let you know.
Your patch is working fine! Thank you @slack
Excellent, thanks for testing! Will get this merged in the morning.
The upstream issue for this is https://github.com/docker/docker/issues/7952. It seems that Docker + btrfs with SELinux enabled causes this issue. Can you try running docker on a non-btrfs backend and see if you see the same issue?
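For anyone following along, a quick way to see which backend a node is actually on (the field names match the docker info output below):
# Print only the storage driver and backing filesystem lines from docker info
docker info 2>/dev/null | grep -E 'Storage Driver|Backing Filesystem'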
This did not necessarily resolve the issue, it just circumvented it. Volume mounts are likely non-btrfs, so SELinux plays nicely with them. This project is also not designed around being backed by a persistent disk, as demonstrated in the end-to-end tests at https://ci.deis.io/.
You'll likely see this issue again in another form in the future, so I don't feel this problem is resolved. I'm going to revert this change and further debug this issue.
core@ip-10-0-0-175 ~ $ docker info
Containers: 43
Running: 7
Paused: 0
Stopped: 36
Images: 10
Server Version: 1.10.2
Storage Driver: overlay
Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
Volume: local
Network: null host bridge
Kernel Version: 4.4.4-coreos
Operating System: CoreOS 983.0.0 (Coeur Rouge)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.679 GiB
Name: ip-10-0-0-175.us-west-2.compute.internal
ID: OP6Q:UNHM:BTT4:FCYX:3QTL:HZUQ:ET7E:EOXC:RYJO:L5ZK:VLQO:YYCH
I don't think CoreOS is using btrfs at all (from memory, Docker on CoreOS uses overlay as its storage driver).
Some concrete evidence supporting this would be this comment: https://github.com/docker/docker/issues/7952#issuecomment-54989852
@slack I think this includes overlayfs as well, judging from that comment:
Kernel engineers are working on a fix for this and potentially Overlayfs if it gets merged into the container.
Reading further down the page, the temporary fix seems to be removing --selinux-enabled from the docker daemon's option flag list. Overlay is indeed broken as well.
Source for fix: https://github.com/docker/docker/issues/7952#issuecomment-56435657
@MaxenceAdnot how did you deploy this cluster on AWS? Just by following the Kubernetes documentation and running KUBERNETES_PROVIDER=aws ./cluster/kube-up.sh?
I used the kube-aws tool provided by CoreOS to deploy Kubernetes on AWS.
Also note that Docker has disabled SELinux support on v1.9.1 due to the above bug. This is what I get on Fedora 23 with SELinux enabled and with the overlay driver:
[vagrant@localhost ~]$ docker version -f '{{.Client.Version}}' 2>/dev/null
1.9.1
[vagrant@localhost ~]$ cat /etc/sysconfig/docker | grep OPTIONS=
OPTIONS='--selinux-enabled --log-driver=journald'
[vagrant@localhost ~]$ cat /etc/sysconfig/docker-storage | grep DOCKER_STORAGE_OPTIONS=
DOCKER_STORAGE_OPTIONS="-s overlay"
[vagrant@localhost ~]$ sudo journalctl -u docker --no-pager
-- Logs begin at Sat 2015-12-26 10:24:57 UTC, end at Tue 2016-03-15 20:02:41 UTC. --
Mar 15 20:01:08 localhost.localdomain systemd[1]: Starting Docker Application Container Engine...
Mar 15 20:01:08 localhost.localdomain docker[19914]: time="2016-03-15T20:01:08.973114867Z" level=fatal msg="Error starting daemon: SELinux is not supported with the overlay graph driver"
Mar 15 20:01:08 localhost.localdomain systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Mar 15 20:01:08 localhost.localdomain systemd[1]: Failed to start Docker Application Container Engine.
Mar 15 20:01:08 localhost.localdomain systemd[1]: docker.service: Unit entered failed state.
Mar 15 20:01:08 localhost.localdomain systemd[1]: docker.service: Failed with result 'exit-code'.
Setting --selinux-enabled=false in the options list allows me to start Docker and fixes the issue noted in the OP.
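For reference, a minimal sketch of that workaround on Fedora, assuming the stock /etc/sysconfig/docker shown above (edit the file by hand if you'd rather not run sed against it):
# Remove the --selinux-enabled flag from the daemon OPTIONS and restart docker
sudo sed -i 's/--selinux-enabled //' /etc/sysconfig/docker
sudo systemctl restart docker
# Confirm the daemon comes back up without the SELinux/overlay error
docker info >/dev/null && echo "docker restarted cleanly"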
@MaxenceAdnot can you confirm that disabling SELinux support on your CoreOS cluster fixes the issue?
This is something we should bring up with CoreOS as well so their Kubernetes clusters work OOTB.
I'm seeing this as well on kube-aws. Details:
core@ip-10-0-0-50 ~ $ uname -a
Linux ip-10-0-0-50.us-west-2.compute.internal 4.4.3-coreos #2 SMP Thu Mar 3 23:21:00 UTC 2016 x86_64 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz GenuineIntel GNU/Linux
core@ip-10-0-0-50 ~ $ cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=976.0.0
VERSION_ID=976.0.0
BUILD_ID=2016-03-03-2324
PRETTY_NAME="CoreOS 976.0.0 (Coeur Rouge)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
core@ip-10-0-0-50 ~ $ docker --version
Docker version 1.10.2, build eb1bdb1
I'm using the latest kube-aws release as well as @slack's modified chart (using the latest from deis-dev) to no avail.
Investigating the Docker journal:
$ journalctl -u docker --no-pager
The following seems to be relevant:
Mar 16 05:50:34 ip-10-0-0-50.us-west-2.compute.internal dockerd[1241]: time="2016-03-16T05:50:34.686537706Z" level=warning msg="DEPRECATED: Setting host configuration options when the container starts is deprecated and will be removed in Docker 1.12"
Mar 16 05:50:34 ip-10-0-0-50.us-west-2.compute.internal dockerd[1241]: time="2016-03-16T05:50:34.800687111Z" level=warning msg="HostsPath set to \"/var/lib/docker/containers/4b28d46ca43b7b6752761225e3d1ec129653b4dce07769386a59a2a5d464eb21/hosts\", but can't stat this filename (err = stat /var/lib/docker/containers/4b28d46ca43b7b6752761225e3d1ec129653b4dce07769386a59a2a5d464eb21/hosts: no such file or directory); skipping"
Mar 16 05:50:34 ip-10-0-0-50.us-west-2.compute.internal dockerd[1241]: time="2016-03-16T05:50:34.956670266Z" level=warning msg="signal: killed"
Mar 16 05:50:35 ip-10-0-0-50.us-west-2.compute.internal dockerd[1241]: time="2016-03-16T05:50:35.012629347Z" level=warning msg="HostsPath set to \"/var/lib/docker/containers/4b28d46ca43b7b6752761225e3d1ec129653b4dce07769386a59a2a5d464eb21/hosts\", but can't stat this filename (err = stat /var/lib/docker/containers/4b28d46ca43b7b6752761225e3d1ec129653b4dce07769386a59a2a5d464eb21/hosts: no such file or directory); skipping"
Mar 16 05:50:35 ip-10-0-0-50.us-west-2.compute.internal dockerd[1241]: time="2016-03-16T05:50:35.019676616Z" level=error msg="error locating sandbox id 29c8df24230e8e7f732b3eeff8c02150198fb212b015d44857cf4ab8383f58c2: sandbox 29c8df24230e8e7f732b3eeff8c02150198fb212b015d44857cf4ab8383f58c2 not found"
Mar 16 05:50:35 ip-10-0-0-50.us-west-2.compute.internal dockerd[1241]: time="2016-03-16T05:50:35.019723965Z" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/4b28d46ca43b7b6752761225e3d1ec129653b4dce07769386a59a2a5d464eb21/shm: invalid argument"
Mar 16 05:50:35 ip-10-0-0-50.us-west-2.compute.internal dockerd[1241]: time="2016-03-16T05:50:35.019745608Z" level=error msg="Error unmounting container 4b28d46ca43b7b6752761225e3d1ec129653b4dce07769386a59a2a5d464eb21: not mounted"
Mar 16 05:50:35 ip-10-0-0-50.us-west-2.compute.internal dockerd[1241]: time="2016-03-16T05:50:35.019837256Z" level=warning msg="HostsPath set to \"/var/lib/docker/containers/4b28d46ca43b7b6752761225e3d1ec129653b4dce07769386a59a2a5d464eb21/hosts\", but can't stat this filename (err = stat /var/lib/docker/containers/4b28d46ca43b7b6752761225e3d1ec129653b4dce07769386a59a2a5d464eb21/hosts: no such file or directory); skipping"
Mar 16 05:50:35 ip-10-0-0-50.us-west-2.compute.internal dockerd[1241]: time="2016-03-16T05:50:35.019898767Z" level=error msg="Handler for POST /containers/4b28d46ca43b7b6752761225e3d1ec129653b4dce07769386a59a2a5d464eb21/start returned error: Container command not found or does not exist."
Note that SELinux is enabled by the /usr/lib/coreos/dockerd wrapper on CoreOS. There is a configuration directive, ARG_SELINUX, which can be added to /run/flannel_docker_opts.env:
ARG_SELINUX="nowayjose"
Restarting dockerd with sudo systemctl restart docker resolves this for the main Docker daemon; however, early-docker.service is unaffected. That one is harder to patch, it seems.
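To spell that out as commands (the ARG_SELINUX value and env file path are taken from the comment above; I haven't dug into how the wrapper consumes them, so treat this as a sketch):
# Turn SELinux off for the main Docker daemon via the wrapper's env file
# (note this lives in /run, so it won't survive a reboot)
echo 'ARG_SELINUX="nowayjose"' | sudo tee -a /run/flannel_docker_opts.env
# Restart the main daemon; early-docker.service is NOT covered by this
sudo systemctl restart docker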
dmesg also shows:
[ 153.677041] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
[ 162.133212] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
[ 163.069155] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
[ 168.453115] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
[ 212.856207] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
[ 218.218618] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
[ 219.433233] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
[ 237.072790] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
[ 237.242488] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
[ 237.377854] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
[ 237.770179] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
With the beta channel, @carmstrong and I experienced some issues with the kube-aws tool: no running containers in docker ps, Kubernetes is not initialized...
Just for posterity, I did notice in their documentation that the only Release Channel value supported at this time is alpha, so that seems to be the cause of the issues you're having on the beta channel.
Yep, Beta worked without issue until the recent kubelet-wrapper merges (which don't exist in beta). That is why k8s fails to boot using v0.4.1 of kube-aws on Beta.
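If it helps anyone debugging channel mismatches, the update group a node ended up on can usually be read off the box; the file paths here are my assumption and aren't confirmed anywhere in this thread:
# Prints GROUP=alpha / beta / stable on a CoreOS node (paths assumed, not verified here)
grep GROUP /etc/coreos/update.conf /usr/share/coreos/update.conf 2>/dev/null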
Hi, it seems I have the same error, as I explain in https://github.com/deis/workflow/issues/210. I am using Kubernetes 1.2, CoreOS alpha (1010.1.0), Deis beta2, and kube-aws 0.6.0. Should this error already be fixed?
@arkkanoid the root issue is https://github.com/coreos/coreos-kubernetes/issues/317. There's nothing we can do on our end other than contribute upstream.
FYI, I ran into what I think was the same thing on GKE running Kubernetes 1.2.6 with Workflow 2.6.0. I have to run 1.2.6 for my own reasons; this doesn't seem to be a problem in 1.3.x or 1.4.0 from what I can tell.
I'm trying to deploy the deis-dev chart on AWS and it seems that the database refuses to start...
CoreOS version is 976.0.0 (alpha)
docker logs 6ece7bde9d:
I also noticed that line in the kernel logs:
I don't really like this SELinux message :/
Any ideas?